Open rebecca312 opened 6 years ago
I'm surprised it's so big. I didn't catalog file sizes, but I don't remember anything being even close to 1T in size. IIRC, I was able to store everything on a machine with only 500G. It's been a while though, so I may be misremembering.
You could try modifying the code that writes the file to compress it first. I think pandas supports writing in compressed formats via extra kwargs.
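A minimal sketch of that suggestion, assuming the file is written with `DataFrame.to_csv`: the `compression` keyword is a real pandas argument, but the helper name, column names, and data below are hypothetical stand-ins, not the project's actual code.

```python
import os
import tempfile

import pandas as pd


def write_compressed_csv(df: pd.DataFrame, path: str) -> None:
    """Write df as a gzip-compressed CSV (hypothetical helper, not from the project)."""
    df.to_csv(path, index=False, compression="gzip")


# Hypothetical stand-in for the real paper data.
df = pd.DataFrame({
    "paper_id": range(1000),
    "title": ["example title"] * 1000,
})

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "paper.csv")
    gz_path = os.path.join(tmp, "paper.csv.gz")

    df.to_csv(plain_path, index=False)   # uncompressed, for size comparison
    write_compressed_csv(df, gz_path)    # gzip-compressed

    plain_size = os.path.getsize(plain_path)
    gz_size = os.path.getsize(gz_path)

    # read_csv infers gzip compression from the .gz extension by default.
    round_trip = pd.read_csv(gz_path)
```

How much space this saves depends on how repetitive the real data is, and any code that later reads paper.csv would need to point at the compressed file instead.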
When I tried to run the pipeline, paper.csv was generated from Miner-Papertxt (about 2.2G). The resulting paper.csv grew past 1.7T, but my computer has only about 2T of storage space, so the run fails every time I run the project. Do you know how to fix this?