macks22 / dblp

Parse the dblp data into a structured format for experimentation.
MIT License

paper.csv is too large to save in my computer #20

Open rebecca312 opened 6 years ago

rebecca312 commented 6 years ago

When I ran the pipeline, paper.csv was generated from Miner-Papertxt (about 2.2G). The resulting paper.csv grew past 1.7T, but my computer has only about 2T of storage space, so the run fails every time. Do you know how to fix this?

macks22 commented 1 year ago

I'm surprised it's so big. I didn't catalog file sizes, but I don't remember anything being even close to 1T in size. IIRC, I was able to store everything on a machine with only 500G. It's been a while though, so I may be misremembering.

You could try modifying the code that writes the file to compress it first. I think pandas supports writing in compressed formats via extra kwargs.
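
As a rough sketch of that suggestion, and assuming the file is written from a pandas DataFrame (which may not match what the pipeline actually does), `DataFrame.to_csv` accepts a `compression` argument and `read_csv` can read the result back. The helper names, the `paper.csv.gz` path, and the sample rows below are hypothetical stand-ins, not code or data from this repo:

```python
import os
import tempfile

import pandas as pd


def write_compressed_csv(df: pd.DataFrame, path: str) -> None:
    """Write a DataFrame as a gzip-compressed CSV (hypothetical helper)."""
    df.to_csv(path, index=False, compression="gzip")


def read_compressed_csv(path: str) -> pd.DataFrame:
    """Read a gzip-compressed CSV back into a DataFrame (hypothetical helper)."""
    return pd.read_csv(path, compression="gzip")


# Made-up rows standing in for the real paper records.
papers = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "title": ["Paper A", "Paper B", "Paper C"],
        "year": [1999, 2005, 2012],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "paper.csv.gz")
    write_compressed_csv(papers, out_path)
    restored = read_compressed_csv(out_path)

print(restored)
```

Compression only shrinks the file on disk. If the output really is approaching 1.7T, it may also be worth checking whether rows are being duplicated somewhere upstream, since a 2.2G input would not normally expand that much.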