Closed NMerz closed 2 years ago
Currently processing, will merge later today assuming everything looks fine when done on my machine.
In hindsight, this is a fairly lazy and inefficient way to accomplish this, but considering it only needs to run once per machine, I'm not sure it is worth the time to rewrite. If I were to do so, for all files but amazon, I would mark the cut point in the file and then either delete the unwanted section or do a memory copy of the rest, instead of iterating. This might necessitate a lower-level language, but it would be much faster.
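For reference, a rough sketch of the cut-point idea described above, written in Python rather than a lower-level language. The function names (`find_offset_after_lines`, `keep_head`, `keep_tail`) and the choice of a line count as the cut criterion are assumptions made for illustration; they are not taken from the code in this PR.

```python
import shutil


def find_offset_after_lines(path, n_lines, chunk_size=1 << 20):
    """Return the byte offset just past the first n_lines newlines in path.

    Hypothetical helper: scans raw chunks for newline bytes instead of
    decoding and iterating over lines.
    """
    if n_lines <= 0:
        return 0
    seen = 0
    offset = 0  # bytes consumed by fully scanned chunks
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Fewer than n_lines lines in the file: keep everything.
                return offset
            pos = -1
            while True:
                pos = chunk.find(b"\n", pos + 1)
                if pos == -1:
                    break
                seen += 1
                if seen == n_lines:
                    return offset + pos + 1
            offset += len(chunk)


def keep_head(path, n_lines):
    """Truncate path in place so only its first n_lines lines remain."""
    cut = find_offset_after_lines(path, n_lines)
    with open(path, "r+b") as f:
        f.truncate(cut)


def keep_tail(src, dst, skip_lines):
    """Copy everything after the first skip_lines lines of src into dst."""
    cut = find_offset_after_lines(src, skip_lines)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(cut)
        shutil.copyfileobj(fin, fout)  # chunked copy, no per-line work
```

`keep_head` corresponds to the "delete that section" option and `keep_tail` to the "copy the rest" option. Whether this is fast enough without dropping to a lower-level language would need to be measured on the actual files.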
Results from the latest commit look good. Should be ready for review (if you want) and then merge.
Perfect! I went through your code and added the file processing logic based on that (create adjacency list and stuff). You can take a look at that PR and lemme know if it looks good. Merging this one.
Remove any header lines and comments for easy, standardized parsing later.
Shrink the AGATHA data source, since it is too large to fit in memory on our largest compute device.
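The header and comment removal described above could look roughly like the following. This is only a sketch under stated assumptions: the function name `strip_headers_and_comments`, the `header_lines` count, and the `#` comment prefix are hypothetical and would need to be set per data source.

```python
def strip_headers_and_comments(lines, header_lines=0, comment_prefixes=("#",)):
    """Yield only data lines from an iterable of text lines.

    Assumptions (not from the PR): headers are the first `header_lines`
    lines, and a comment is any line whose first non-whitespace
    characters match one of `comment_prefixes`.
    """
    for index, line in enumerate(lines):
        if index < header_lines:
            continue  # drop fixed-size header
        if line.lstrip().startswith(comment_prefixes):
            continue  # drop comment line
        yield line
```

Because it is a generator, it can wrap an open file handle and stream the result to a new file without loading the whole data source into memory.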