hotpotqa / hotpot

Apache License 2.0

MemoryError still persists with n_jobs=3 #29

Open mominali12 opened 4 years ago

mominali12 commented 4 years ago

[Screenshot: memoryerrorBaseline]

qipeng commented 4 years ago

Try n_jobs=1?

mominali12 commented 4 years ago

Still not working... This time it gets stuck even earlier. [Screenshot: Capture]

I am trying to replicate the baseline model, and for that I need the files generated by preprocessing. Could anyone share a link to those files so that I can proceed with the remaining steps? Thanks in advance.

qipeng commented 4 years ago

@mominali12 not sure I'm seeing it in the screenshot -- warnings can be safely ignored, and if you're missing the ujson package, try pip install ujson

mominali12 commented 4 years ago

The system stops at Done 9 and won't progress any further. I believe I should be getting a train_record.pkl file, but that file is never created, and preprocessing makes no further progress.

qipeng commented 4 years ago

@mominali12 That's strange. Could you try running the code with a machine with more RAM?

AndresFRJ98 commented 4 years ago

I am experiencing a similar issue. I am running the preprocessing step on my local machine which has 16GB of RAM, and nothing is printed after 9 out of 9.

My memory consumption also keeps growing until it reaches 100%. What would you advise?


tbonus-neurosys commented 4 years ago

Same problem for me on a Linux machine with 32GB of RAM. It gets stuck after job 9, and RAM usage keeps climbing until the process crashes.

qipeng commented 4 years ago

Most of our work is done on server machines with 64GB of memory or more, so it's possible that more RAM will resolve the issue. But if people are consistently stuck at example 9, it might also be worth investigating that particular example and skipping or truncating it if necessary.
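
For anyone who wants to try the skip/truncate route, here is a rough sketch of how one might find unusually large examples and trim them before preprocessing. It is not code from this repo: the helper names (`context_chars`, `find_large_examples`, `truncate_example`) and the `max_chars` budget are made up for illustration, and it assumes each example stores its paragraphs under a `context` key as a list of `[title, [sentences]]` pairs.

```python
def context_chars(example):
    """Total characters across all context sentences of one example.

    Assumes example["context"] is a list of [title, [sentences]] pairs.
    """
    return sum(len(s) for _title, sentences in example["context"] for s in sentences)


def find_large_examples(examples, max_chars):
    """Return (index, size) pairs for examples whose context exceeds max_chars."""
    large = []
    for i, example in enumerate(examples):
        size = context_chars(example)
        if size > max_chars:
            large.append((i, size))
    return large


def truncate_example(example, max_chars):
    """Return a copy of the example keeping whole paragraphs within max_chars.

    Paragraphs are kept in order; the first one that would exceed the
    budget and everything after it are dropped.
    """
    kept, used = [], 0
    for title, sentences in example["context"]:
        size = sum(len(s) for s in sentences)
        if used + size > max_chars:
            break
        kept.append([title, sentences])
        used += size
    return {**example, "context": kept}


if __name__ == "__main__":
    # Toy data standing in for the real training file (hypothetical values).
    toy = [
        {"_id": "a", "context": [["T1", ["abc", "de"]]]},
        {"_id": "b", "context": [["T1", ["x" * 10]], ["T2", ["y" * 20]]]},
    ]
    print(find_large_examples(toy, max_chars=8))
    print(truncate_example(toy[1], max_chars=15)["context"])
```

Note that dropping paragraphs can remove the ones an answer depends on, so it is probably safer to use this to locate and inspect the suspicious example first, and only skip or truncate once you know what is wrong with it.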