google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow
https://google.github.io/seq2seq/
Apache License 2.0

Memory usage when training on CPU vs GPU #212

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi

I noticed (by checking top on Linux) that when I train a model on the CPU, the CPU is used quite heavily (which is normal, of course), while memory usage only goes up to 3-5% and fluctuates a lot.

When training on the GPU, however, the CPU memory usage rises steadily (I'm training a small model on 2,000,000 training sentences with about 600,000 training steps and batch size 32; CPU memory usage has already reached about 60% while I'm only at training step 300,000 or so). Furthermore, the memory doesn't seem to be released during training, so the process gradually occupies all available memory.

Although this hasn't resulted in any errors yet, I am afraid there will be some kind of out-of-memory error once CPU memory usage gets close to 100%. Does anyone know if this is normal behaviour, or what could cause it?
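
In the meantime, one way to pin down whether host memory really keeps growing step over step is to log the training process's resident set size periodically. A minimal sketch, assuming psutil is installed (it is not part of seq2seq) and that you can call a helper from wherever the training loop already reports progress:

```python
# Minimal host-memory logging sketch (assumes psutil is installed; it is not
# part of seq2seq). Call log_host_memory() from wherever the training loop
# already reports progress, e.g. every few thousand steps.
import os
import psutil

_process = psutil.Process(os.getpid())

def log_host_memory(step):
    # Resident set size of this process, in megabytes.
    rss_mb = _process.memory_info().rss / (1024.0 ** 2)
    print("step %d: resident memory %.1f MB" % (step, rss_mb))

if __name__ == "__main__":
    # Stand-in for a real training loop, just to show the call pattern.
    for step in range(0, 5000, 1000):
        log_host_memory(step)
```

If the logged RSS grows roughly linearly with the step count, that confirms a leak rather than normal buffering.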

Thanks in advance

mohgh commented 7 years ago

I've also got this problem. Excessive RAM use at evaluation time with the GPU causes my process to get killed by the OS. Here is the log:

May 11 20:40:04 paratech-pc kernel: [199450.406160] Out of memory: Kill process 25393 (python3) score 822 or sacrifice child
May 11 20:40:04 paratech-pc kernel: [199450.406266] Killed process 25393 (python3) total-vm:80431720kB, anon-rss:26623524kB, file-rss:58516kB, shmem-rss:263168kB
May 11 20:40:05 paratech-pc kernel: [199451.564611] oom_reaper: reaped process 25393 (python3), now anon-rss:0kB, file-rss:60528kB, shmem-rss:263168kB

Here you can see that it had allocated about 26 gigabytes of RAM (anon-rss) before being killed.
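
One common cause of unbounded host-memory growth in TensorFlow 1.x training and evaluation loops (in general, not confirmed as the cause here) is new ops being added to the graph on every step or every evaluation pass. A way to rule that out is to finalize the graph before entering the loop, so any later op creation fails loudly instead of silently growing the graph. A small self-contained sketch:

```python
# General TensorFlow 1.x diagnostic sketch, not a confirmed fix for this issue:
# finalize the graph before the run loop so that any op accidentally created
# inside the loop raises a RuntimeError instead of silently leaking memory.
import numpy as np
import tensorflow as tf

graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 4], name="x")
    w = tf.Variable(tf.random_normal([4, 1]), name="w")
    y = tf.matmul(x, w)
    init = tf.global_variables_initializer()

graph.finalize()  # from here on, adding ops to `graph` raises RuntimeError

with tf.Session(graph=graph) as sess:
    sess.run(init)
    for step in range(3):
        # Only runs existing ops; nothing new is added to the graph.
        out = sess.run(y, feed_dict={x: np.zeros((8, 4), dtype=np.float32)})
        print("step %d, output shape %s" % (step, out.shape))
```

If finalizing triggers an error inside the loop, the graph is growing and that explains the memory climb; if not, the growth more likely comes from the input pipeline or from Python-side buffering of results.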