pandeyshubham25 / pagerank


Remove data headers and resize large data #3

Closed: NMerz closed this 2 years ago.

NMerz commented 2 years ago

Remove any header lines and comments for easy, standardized parsing later.

Shrink the AGATHA data source, since it is too large to fit in memory on our largest compute device.
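For illustration, here is a minimal sketch of both steps, assuming the raw files are whitespace-separated edge lists whose header/comment lines start with `#` or `%`. The function names `strip_headers` and `clean_file` and the `max_lines` cap are hypothetical. Capping the line count is just one simple way to shrink a file; the thread does not say how AGATHA was actually reduced.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, Tuple


def strip_headers(lines: Iterable[str],
                  comment_prefixes: Tuple[str, ...] = ("#", "%")) -> Iterator[str]:
    """Yield only data lines: skip blank lines and lines starting with a comment prefix."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefixes):
            continue
        # Normalise so every emitted line ends with exactly one newline.
        yield line if line.endswith("\n") else line + "\n"


def clean_file(src: str, dst: str, max_lines: Optional[int] = None) -> int:
    """Copy src to dst without headers/comments.

    If max_lines is given (hypothetical shrinking strategy), keep only the
    first max_lines data lines. Returns the number of data lines written.
    """
    written = 0
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        data = strip_headers(fin)
        if max_lines is not None:
            data = islice(data, max_lines)  # stop reading once the cap is hit
        for line in data:
            fout.write(line)
            written += 1
    return written
```

Because `strip_headers` is a generator, the file is streamed line by line and never loaded into memory in full, which matters for a source that is already too large for RAM.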

NMerz commented 2 years ago

Currently processing; will merge later today, assuming everything looks fine once it finishes on my machine.

NMerz commented 2 years ago

In hindsight this is a fairly lazy and inefficient way to accomplish this, but since it only needs to be run once per machine, I'm not sure it is worth the time to rewrite. If I were to do so, then for every file except amazon I would record the offset in the file where the data to keep begins, and use it either to delete the preceding section or to bulk-copy the rest in memory, instead of iterating line by line. This might require a lower-level language, but it would be much faster.
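The offset-based idea can be sketched even in Python, assuming the header and comments form one contiguous block at the top of the file (the comment excludes the amazon file, so this assumption may not hold there). Only the header is scanned line by line; everything after the recorded offset is copied in large chunks with `shutil.copyfileobj`. The names `data_start_offset` and `copy_from_offset`, the comment prefixes, and the 16 MiB chunk size are hypothetical choices.

```python
import shutil
from typing import Tuple


def data_start_offset(path: str,
                      comment_prefixes: Tuple[bytes, ...] = (b"#", b"%")) -> int:
    """Return the byte offset of the first line that is neither blank nor a comment."""
    offset = 0
    with open(path, "rb") as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith(comment_prefixes):
                return offset  # first data line starts here
            offset += len(line)  # skip past this header/comment line
    return offset  # the whole file is header: nothing to keep


def copy_from_offset(src: str, dst: str, offset: int) -> None:
    """Bulk-copy src[offset:] to dst without parsing the data lines."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        shutil.copyfileobj(fin, fout, length=16 * 1024 * 1024)
```

Usage would be `copy_from_offset(src, dst, data_start_offset(src))`. Working in binary mode keeps the offsets exact byte counts, so `seek` lands precisely on the first data line.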

NMerz commented 2 years ago

Results from the latest commit look good. It should be ready for review (if you want) and then merge.

pandeyshubham25 commented 2 years ago

Perfect! I went through your code and added the file-processing logic based on it (creating the adjacency list, etc.). You can take a look at that PR and let me know if it looks good. Merging this one.
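The follow-up PR's code is not shown here, but a minimal sketch of building an adjacency list from the cleaned files could look like this, assuming one directed edge per line as two whitespace-separated integer node IDs. `build_adjacency` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_adjacency(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Build a directed adjacency list {source: [targets]} from 'src dst' lines."""
    adj: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines defensively
        src, dst = int(parts[0]), int(parts[1])
        adj[src].append(dst)
        adj.setdefault(dst, [])  # register sink nodes (no out-edges) as well
    return dict(adj)
```

Registering every target as a key matters for PageRank: nodes with no outgoing edges (dangling nodes) still need a rank entry, and their mass has to be redistributed explicitly.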