Closed by adinapoli-mndc 5 years ago
The Rubygem dataset is now fully processed.
The data folder is now 39MB, which is perhaps enough to warrant moving it into a separate folder and rewriting this repo's history so that it can be cloned swiftly.
For now, though, I don't think this is a burning issue; it may become one in the future if the size starts to become taxing on things like CI.
This is incomplete for 2 reasons:
importers::csv chokes: it consumes 13GB of RAM before being killed by the OS. The problem is that our matrix transposition ops convert the matrices into dense ones, and this doesn't scale. I am working on it.
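
For reference, here is a minimal sketch of one way a transposition can stay sparse end to end. It assumes a compressed sparse row (CSR) layout; the `Csr` type and its field names are hypothetical, made up for illustration, and are not the matrix types this repo actually uses. The idea is a counting sort over column indices, so memory stays proportional to the number of non-zero entries rather than to rows × cols.

```rust
/// Hypothetical CSR matrix for illustration only; not this repo's matrix type.
/// Only non-zero entries are stored.
#[derive(Debug, Clone, PartialEq)]
pub struct Csr {
    pub rows: usize,
    pub cols: usize,
    /// Entries of row `r` live at indices `row_ptr[r]..row_ptr[r + 1]`; length is `rows + 1`.
    pub row_ptr: Vec<usize>,
    /// Column index of each stored entry.
    pub col_idx: Vec<usize>,
    /// Value of each stored entry.
    pub values: Vec<f64>,
}

impl Csr {
    /// Transpose via a counting sort over column indices.
    /// Uses O(rows + cols + nnz) time and memory; no dense buffer is allocated.
    pub fn transpose(&self) -> Csr {
        let nnz = self.values.len();

        // Count the entries in each column; these are the row lengths of the transpose.
        let mut row_ptr = vec![0usize; self.cols + 1];
        for &c in &self.col_idx {
            row_ptr[c + 1] += 1;
        }
        // A prefix sum turns the counts into start offsets.
        for c in 0..self.cols {
            row_ptr[c + 1] += row_ptr[c];
        }

        // `next[c]` is the next free slot in output row `c`.
        let mut next = row_ptr.clone();
        let mut col_idx = vec![0usize; nnz];
        let mut values = vec![0.0; nnz];

        // Scatter each entry (r, c, v) of the input to (c, r, v) in the output.
        for r in 0..self.rows {
            for k in self.row_ptr[r]..self.row_ptr[r + 1] {
                let c = self.col_idx[k];
                let dst = next[c];
                col_idx[dst] = r;
                values[dst] = self.values[k];
                next[c] += 1;
            }
        }

        Csr { rows: self.cols, cols: self.rows, row_ptr, col_idx, values }
    }
}
```

Because the input rows are visited in order, the column indices within each output row come out sorted, so the result is itself valid CSR and can be transposed again without any cleanup step.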