eddieantonio opened 8 years ago
Yeesh! I re-did the calculation using the raw JSON instead of the parsed file, and found that the filenames take up 92% of the entire JSON file! That's like... 70.53 MiB (73953759 bytes)!
```
$ dc -e "100 $(<raw.json json commits | json -a all_files | wc -c) * $(wc -c < raw.json) / p"
92
```
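For a cross-check that doesn't need the `json` CLI, here's a minimal Python sketch of the same ratio. It assumes the layout implied by the `json commits | json -a all_files` pipeline, i.e. `{"commits": [{"all_files": [...]}, ...]}`. The function name `all_files_share` and the script name `share.py` are made up for illustration. Because it re-serializes each list without whitespace, the result approximates the bytes the original file spends on filenames rather than matching them exactly.

```python
import json
import sys


def all_files_share(path: str) -> float:
    """Percentage of the file's bytes spent on per-commit `all_files` lists.

    Assumes the (hypothetical) layout {"commits": [{"all_files": [...]}, ...]}.
    """
    with open(path, "rb") as f:
        raw = f.read()
    data = json.loads(raw)
    # Re-serialize each list compactly. This approximates the bytes the original
    # file uses; indentation or string escaping in the file may differ.
    listed = sum(
        len(json.dumps(commit["all_files"], separators=(",", ":")).encode("utf-8"))
        for commit in data["commits"]
    )
    return 100 * listed / len(raw)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python share.py raw.json
    print(f"{all_files_share(sys.argv[1]):.1f}%")
```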
Around 80% of the JSON is made up of `all_files` within each commit: the list of all filenames at that commit. This scales poorly: O(|files| × |commits|).

Instead, for each commit, we can record which files were added and removed relative to the previous commit in some sequence. Any ordering works, but ideally it would minimize the size of each diff. Iterating by commit date works well until branches come into play, but that's probably not a big deal.
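Here's a rough Python sketch of that add/remove encoding, assuming each commit's full file list is already available and the commits are already in the chosen order (e.g. by date). The names `encode`, `decode`, and `Delta` are hypothetical, not part of any existing code here. The first commit is diffed against an empty tree, so its delta lists every file as added.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical delta type: (added, removed), each a sorted list of paths.
Delta = Tuple[List[str], List[str]]


def encode(snapshots: Iterable[Iterable[str]]) -> List[Delta]:
    """Turn full per-commit file lists into (added, removed) deltas.

    `snapshots` must already be in the chosen commit order.
    """
    deltas: List[Delta] = []
    previous: Set[str] = set()  # the first commit is diffed against an empty tree
    for files in snapshots:
        current = set(files)
        deltas.append((sorted(current - previous), sorted(previous - current)))
        previous = current
    return deltas


def decode(deltas: Iterable[Delta]) -> List[List[str]]:
    """Rebuild the sorted file list at each commit by replaying the deltas in order."""
    snapshots: List[List[str]] = []
    files: Set[str] = set()
    for added, removed in deltas:
        files.difference_update(removed)
        files.update(added)
        snapshots.append(sorted(files))
    return snapshots
```

Storage then grows with the number of file changes rather than with |files| × |commits|, at the cost of replaying deltas to reconstruct any one commit's list. A poor ordering (e.g. commit dates that interleave two long-lived branches) only makes the deltas bigger; decoding still reproduces every snapshot, since it replays them in the same order they were encoded.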
Showin' ma work