weinman / cnn_lstm_ctc_ocr

Tensorflow-based CNN+LSTM trained with CTC-loss for OCR
GNU General Public License v3.0

Memory soon used up when running train step #24

Closed xray1111 closed 5 years ago

xray1111 commented 6 years ago

After running 'make mjsynth-download' and 'make mjsynth-tfrecord', I went to the third step to train the model by running 'make train', but the machine's memory (32G) was used up within 2 seconds and the host hung and restarted. What could be the cause of this issue?

weinman commented 6 years ago

I'm not entirely sure; I've never had memory issues except when I set the batch size too large. I've only tested on GPUs with 12GB or less, but never had a problem with the default parameters. You might need to do some diagnostics to determine where in the pipeline the memory bloat is happening. (I'm not entirely sure how to do that in tensorflow.)
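One low-tech way to narrow it down is to log the training process's own memory use over time, so you can see whether the jump happens during graph construction, while the input queues fill, or at the first training step. Here's a rough sketch (hypothetical helper, not part of this repo) using psutil, which you'd need to install separately:

```python
# Rough memory-watcher sketch (hypothetical helper, not in this repo).
# Requires psutil: pip install psutil
import threading
import time

import psutil


def log_host_memory(interval_secs=1.0):
    """Periodically print this process's resident and virtual memory."""
    proc = psutil.Process()
    while True:
        mem = proc.memory_info()
        print('rss=%.2f GiB  vms=%.2f GiB' % (mem.rss / 2.0**30, mem.vms / 2.0**30))
        time.sleep(interval_secs)


# Start the watcher before building the input pipeline in train.py, then note
# which stage (graph build, queue filling, first sess.run) the jump coincides with.
watcher = threading.Thread(target=log_host_memory)
watcher.daemon = True
watcher.start()
```

If the growth coincides with the input queues filling, smaller batch sizes or fewer reader threads should help; if it happens at graph construction, it's more likely a TF-version or model-size issue.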

Also, the master branch is only tested through TF 1.2, though I am working to update for the latest TF.

weinman commented 6 years ago

@xray1111 if 32G is your HOST memory (not GPU), that's not enough for the default parameters (on my system the python training process uses 0.548t of virtual memory and 3.396g of resident memory). You could try using a smaller batch size and fewer input readers.
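A sketch of what I mean (the flag names and defaults below are assumptions, not copied from this repo; check the tf.app.flags definitions in train.py for the actual names before relying on them):

```python
# Sketch: how these tunables are typically exposed in TF 1.x code like train.py.
# The flag names and defaults here are assumptions, not copied from this repo.
import tensorflow as tf

tf.app.flags.DEFINE_integer('batch_size', 16,
                            'Mini-batch size; smaller values use less memory')
tf.app.flags.DEFINE_integer('num_input_threads', 2,
                            'Number of parallel input reader threads')
FLAGS = tf.app.flags.FLAGS

# With flags like these, a lower-memory run would be launched as:
#   python train.py --batch_size=16 --num_input_threads=2
```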

I'm not sure it would help, but I've updated master to use tensorflow 1.8, so you could try that.