hardmaru / WorldModelsExperiments

World Models Experiments

MemoryError in vae_train.py #20

Open Chazzz opened 5 years ago

Chazzz commented 5 years ago

Running python vae_train.py raises a MemoryError on my system. I felt bad about reporting this at first, but after running the numbers it turns out vae_train.py needs to allocate ~123 GB of memory for this single array!

>>> import numpy as np
>>> M = 1000
>>> N = 10000
>>> data = np.zeros((M*N, 64, 64, 3), dtype=np.uint8)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError
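
For reference, the arithmetic behind that figure (plain unit conversion, nothing assumed):

>>> M * N * 64 * 64 * 3 / 1e9  # frames x height x width x channels, one byte each
122.88
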
xiaoschannel commented 5 years ago

Hmm, this looks like #19. I am trying the solution suggested there. Thanks for crunching the numbers; I had a measly 16 GB when it happened to me.

Chazzz commented 5 years ago

@zuoanqh Not tremendously surprising that memory limitations show up in both experiments. A more dynamic loading scheme would probably fix both issues.
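
As a rough illustration of what "more dynamic loading" could look like: a generator that reads one episode file at a time, so peak memory is one episode rather than the whole dataset. This is a sketch, not code from the repo; it assumes episodes are stored as per-episode .npz files containing an "obs" array of (T, 64, 64, 3) uint8 frames (the layout extract.py writes), and the function name is illustrative.

import os
import numpy as np

def frame_batches(data_dir, batch_size):
    # Stream batches from one episode file at a time instead of
    # materializing all 10M frames in a single array.
    for fname in sorted(os.listdir(data_dir)):
        obs = np.load(os.path.join(data_dir, fname))["obs"]  # (T, 64, 64, 3) uint8
        for i in range(0, len(obs), batch_size):
            yield obs[i:i + batch_size].astype(np.float32) / 255.0
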

Chazzz commented 5 years ago

@zuoanqh Not sure how far you got on this, but in my fork's atari/vae_train.py I have loading that never holds the full dataset in memory down to 1.25 hours (8 min per epoch), not counting training. I convert the episodes into uncompressed (10x speedup), individual images (100x), which are then loaded in parallel (10x) before being fed into TensorFlow. Working in black and white (Atari only) is another 3x improvement, which doesn't carry over to Doom or CarRacing. The only faster alternative I can think of is to convert to BMP and let TensorFlow manage the entire batching process with parallel prefetching.
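
That alternative might look roughly like the tf.data pipeline below, which decodes individual BMP frames in parallel and prefetches batches. This is a sketch under assumptions, not code from the fork: the file pattern is illustrative, and the exact names vary by TF version (on TF 1.x, AUTOTUNE lives under tf.data.experimental and decode_bmp under tf.image).

import tensorflow as tf

def make_dataset(file_pattern, batch_size):
    # Let tf.data list the frame files, decode them in parallel,
    # and keep the trainer fed with prefetched batches.
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)

    def load_frame(path):
        img = tf.io.decode_bmp(tf.io.read_file(path), channels=3)
        img.set_shape([64, 64, 3])
        return tf.cast(img, tf.float32) / 255.0

    return (files
            .map(load_frame, num_parallel_calls=tf.data.AUTOTUNE)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))
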

Note that 10M uncompressed frames come to about 80 GB for single-channel and 240 GB for tri-channel images, and the conversion takes several hours. VAE training itself (not including loading) takes about 5 hours on my system.

xiaoschannel commented 5 years ago

@Chazzz My experiment requires transitions rather than single frames, so it's taking a bit more work to upgrade without doubling disk/memory usage -- I got it working with about 1k episodes, though.
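
For what it's worth, one way to get transitions without doubling storage is to keep each episode's frames in a single array and address a transition as a pair of adjacent row indices. A minimal sketch under that assumption (function name illustrative):

import numpy as np

def sample_transitions(frames, batch_size, rng=np.random):
    # frames: (T, 64, 64, 3) uint8 array for one episode, stored once.
    # A transition (s_t, s_{t+1}) is just the index pair (t, t+1),
    # so no frame is duplicated on disk or in memory.
    idx = rng.randint(0, len(frames) - 1, size=batch_size)
    return frames[idx], frames[idx + 1]
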

Chazzz commented 5 years ago

@zuoanqh Yikes, that's a lot of channels. Then again, you don't really need 10k episodes unless you're establishing a baseline.