Some simulations don't finish

maciej-sypetkowski / autoascend

The first-place solution for the NeurIPS 2021 NetHack Challenge: https://www.aicrowd.com/challenges/neurips-2021-the-nethack-challenge

MIT License


Some simulations don't finish #5

Status: Open. jens321 opened this issue 1 year ago.

jens321 commented 1 year ago

Hi,

When I run the command python3 ./bin/main.py simulate -n 100 --panic-on-errors --no-plot, the code seems to hang on the last few simulations (e.g., about 98 of them finish, and then it hangs). As far as I can tell, it hangs on this line: https://github.com/maciej-sypetkowski/autoascend/blob/master/bin/main.py#L309. Is this a known issue? I'm happy to dig deeper, but I wanted to check here first.

Thank you!

maciej-sypetkowski commented 1 year ago

Are you sure it's hanging? The code runs episodes in parallel and reports each result as soon as it arrives, so it may look like it's hanging when in reality the last reported episodes are just the longest ones. Execution times vary enormously: the 99th percentile is far above the mean and median. Depending on your CPU, you may need to wait around an hour for the longest episodes; farming in the Gnomish Mines takes a long time when the agent cannot find a way out :)
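
To illustrate why the tail of a parallel run can look like a hang, here is a minimal sketch of the reporting pattern described above. It is not autoascend's actual code: run_episode and run_all are hypothetical, episodes are faked with sleeps, and a thread pool is used only to keep the example short. Results are printed in completion order via concurrent.futures.as_completed, so a single long outlier is always the last one reported.

```python
from __future__ import annotations

import concurrent.futures as cf
import statistics
import time


def run_episode(episode_id: int, duration: float) -> tuple[int, float]:
    """Hypothetical stand-in for one episode: it only sleeps for `duration` seconds."""
    start = time.monotonic()
    time.sleep(duration)
    return episode_id, time.monotonic() - start


def run_all(durations: list[float], workers: int = 4) -> list[tuple[int, float]]:
    """Run all episodes in parallel and return results in completion order."""
    finished: list[tuple[int, float]] = []
    with cf.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_episode, i, d) for i, d in enumerate(durations)]
        # as_completed yields futures as they finish, not in submission order,
        # so the slowest episodes are always reported last.
        for fut in cf.as_completed(futures):
            episode_id, elapsed = fut.result()
            finished.append((episode_id, elapsed))
            print(f"episode {episode_id} done in {elapsed:.2f}s ({len(finished)}/{len(durations)})")
    return finished


if __name__ == "__main__":
    # Nine short episodes and one long outlier: a heavy-tailed workload.
    results = run_all([0.05] * 9 + [1.0])
    times = [t for _, t in results]
    print(f"median {statistics.median(times):.2f}s, max {max(times):.2f}s")
```

With a heavy-tailed workload like this, the median stays small while the maximum dominates the total wall-clock time, which matches the behavior described above.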

jens321 commented 1 year ago

Thanks for the reply! That's a good point. I tried again, and the 99th run has now been going for more than 3 hours. Is that still normal? I wonder whether it's related to the NLE version, since I modified the Dockerfile to use NLE 0.9 instead of 0.7.3.
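
When comparing behavior across NLE versions, it can help to confirm which version is actually installed inside the container. A minimal sketch using the standard library's importlib.metadata, assuming NLE is installed under the distribution name "nle"; the helper installed_version is hypothetical.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "nle" is assumed to be the distribution name of the NetHack Learning Environment.
    print("nle:", installed_version("nle"))
```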

maciej-sypetkowski commented 1 year ago

I'm not sure. It's possible that updating NLE to a newer version breaks something (e.g., causes an infinite loop). We never tried that.
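
One way to tell a genuine infinite loop from a very long episode is to give each run a hard wall-clock limit. The sketch below is hypothetical and not part of autoascend: it runs a snippet of Python code (standing in for whatever launches one episode) in a fresh interpreter with subprocess.run and its timeout parameter, which kills the child if the limit is exceeded.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run `code` in a fresh Python interpreter and classify the outcome.

    `code` is a hypothetical stand-in for the command that plays one episode.
    """
    try:
        result = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising TimeoutExpired,
        # so a looping episode does not keep running in the background.
        return "timed out"
    if result.returncode == 0:
        return "finished"
    return f"failed (exit code {result.returncode})"


if __name__ == "__main__":
    print(run_with_timeout("import time; time.sleep(0.1)", timeout=10))  # finished
    print(run_with_timeout("while True: pass", timeout=1))  # timed out
```

Setting the limit well above the slowest expected episode (an hour or more, per the discussion above) keeps legitimate long runs from being cut off.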

jens321 commented 1 year ago

Have you ever encountered segmentation faults when running the code? I sometimes get them when running a large batch of simulations.
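
For narrowing down where a segmentation fault happens, a minimal sketch under POSIX semantics: a child process killed by a signal reports a negative return code equal to minus the signal number, and starting Python with -X faulthandler makes it dump a Python traceback on fatal signals such as SIGSEGV. The helpers describe_exit and run_child are hypothetical and the crashing snippet is only a demonstration, not autoascend code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a child's return code into a readable description (POSIX semantics)."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # A negative return code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def run_child(code: str) -> str:
    # -X faulthandler makes the child print a Python traceback to stderr
    # on fatal signals such as SIGSEGV, showing which call crashed.
    result = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    return describe_exit(result.returncode)


if __name__ == "__main__":
    print(run_child("print('hello')"))
    # Deliberately crash the child by reading from a NULL pointer;
    # on Linux this is typically reported as SIGSEGV.
    print(run_child("import ctypes; ctypes.string_at(0)"))
```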