Using a per-commit filename diff could dramatically reduce JSON file size

Around 80% of the JSON (by bytes) is the all_files list within each commit, i.e. the full list of filenames at that commit. This scales poorly: O(|files| * |commits|).
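As a rough formalization (the notation is introduced here, not taken from TypeV): with $n$ commits and $F_c$ the set of filenames at commit $c$, storing all_files for every commit costs

$$
S_{\text{full}} = \sum_{c=1}^{n} |F_c| \approx |\text{files}| \cdot |\text{commits}|
$$

whenever the file count stays roughly constant over the history, no matter how few files each individual commit actually touches.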

Instead, for each commit, we can record which files were added and removed relative to the previous commit in some sequence. The choice of sequence is arbitrary, but ideally it would minimize the size of each diff. Ordering by commit date works well until branches are involved (consecutive commits by date may sit on different branches, so diffing them produces larger diffs than a parent-child diff would), but that is probably not a big deal.
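A minimal sketch of the proposed encoding in Python, assuming the commits have already been put in the chosen sequence (e.g. by date) and that each commit's all_files is a list of unique filename strings. The function names and the `added`/`removed` keys are hypothetical, not part of TypeV. The first commit is diffed against an empty set, so it stores its full file list once.

```python
import json
from typing import Dict, List, Set


def encode_file_diffs(file_lists: List[List[str]]) -> List[Dict[str, List[str]]]:
    """Turn per-commit full file lists into per-commit added/removed diffs.

    `file_lists` must already be in the chosen sequence; each entry is
    diffed against the entry before it (the first against an empty set).
    """
    diffs = []
    previous: Set[str] = set()
    for files in file_lists:
        current = set(files)
        diffs.append({
            "added": sorted(current - previous),    # present now, absent before
            "removed": sorted(previous - current),  # present before, absent now
        })
        previous = current
    return diffs


def decode_file_diffs(diffs: List[Dict[str, List[str]]]) -> List[List[str]]:
    """Rebuild the file list at each commit by replaying the diffs in order.

    Lists are returned sorted; the original order within all_files is not kept.
    """
    file_lists = []
    current: Set[str] = set()
    for diff in diffs:
        current = (current - set(diff["removed"])) | set(diff["added"])
        file_lists.append(sorted(current))
    return file_lists


if __name__ == "__main__":
    # Toy history: three commits, each changing only one file.
    history = [
        ["a.g4", "b.java", "c.java"],
        ["a.g4", "b.java", "c.java", "d.java"],
        ["a.g4", "c.java", "d.java"],
    ]
    diffs = encode_file_diffs(history)
    print(json.dumps(diffs))
    # prints [{"added": ["a.g4", "b.java", "c.java"], "removed": []}, {"added": ["d.java"], "removed": []}, {"added": [], "removed": ["b.java"]}]
    assert decode_file_diffs(diffs) == [sorted(f) for f in history]
```

One trade-off of this design: getting the file list at a given commit now means replaying every earlier diff. If random access matters, one option (not part of the proposal above) is to also store a full snapshot every k commits, giving up a little of the size reduction for faster lookups.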

Showin' ma work

$ http :57442/projects/antlr4/get_project type==Types | json > parsed.json                                                                                          
$ json dates < parsed.json | wc
  292127  467739 7479044
$ <parsed.json json commits | wc   
 1238030 1301928 80724731
$ <parsed.json json commits | json -a all_files | wc
 1194860 1194860 73953759
$ wc parsed.json
 1532796 1772460 91378688 parsed.json
$ dc
100 73953759 * 91378688 / p
80

mdfeist / TypeV #39