hardmaru / WorldModelsExperiments

World Models Experiments

Out of memory when running extract.bash due to multiple extract.py using DoomTakeCoverWrapper #16


xiaoschannel commented 5 years ago

I am able to run a single instance of extract.py, but when I run extract.bash, I get an out-of-memory error:

```
2019-01-17 03:20:10.196729: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 9.98G (10713064960 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
... (this goes on for a while) ...
2019-01-17 03:20:10.208612: E tensorflow/stream_executor/cuda/cuda_driver.cc:936] failed to allocate 534.69M (560665856 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
```

My best guess, after having a look at the code, is that extract.bash runs multiple workers in parallel to speed up data generation, and each worker creates its own TensorFlow instance because extract.py uses DoomTakeCoverWrapper rather than DoomTakeCoverEnv.
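For context, TensorFlow 1.x (the version this repo targets) by default reserves nearly all free GPU memory for each process, so several parallel workers on one card collide. A minimal sketch of the two standard session configurations, not code from this repo:

```python
import tensorflow as tf

# Default: a plain Session reserves nearly all free GPU memory for the
# process, so a second parallel worker on the same GPU quickly fails
# with CUDA_ERROR_OUT_OF_MEMORY.
# sess = tf.Session()

# Common mitigation: let the allocator grow on demand instead of
# grabbing everything up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```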

For the purpose of replicating the paper itself, this doesn't seem necessary: no model is needed during data generation, since the paper uses the pretraining scheme.

That said, I can see why it's there: since you are planning to switch to the iterative scheme, which uses a trained model to collect better samples for the next iteration, this would be useful.

Adding a "export CUDA_VISIBLE_DEVICES=""" would prevent this from happening without removing the potential to upgrade to an iterative scheme later. Should I make a pull request?

hardmaru commented 5 years ago

Sure, feel free to streamline the code if it is more memory-efficient. I'd be happy to accept any PR.