santhoshkolloju / Abstractive-Summarization-With-Transfer-Learning

Abstractive summarization using BERT as the encoder and a Transformer decoder

Taking way too long for Training #31

Closed: rmrfcurr closed this issue 5 years ago

rmrfcurr commented 5 years ago

I am trying to train on 10k documents, with an additional 1k documents for the eval cycle.

Even with this small number of documents, it is projecting around 4 days of training time on a Tesla M60 GPU.

I have changed the config to use 10 docs per step, with max steps set to 10,000 for 10 epochs. It takes around 34 seconds per step, which works out to roughly 4 days of training time.

Am I doing something wrong?
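For reference, the 4-day projection follows directly from the numbers above; a minimal sanity-check sketch (the figures are the ones quoted in this comment, not values taken from the repo's config):

```python
# Back-of-the-envelope projection from the numbers reported above.
seconds_per_step = 34   # observed time per training step
max_steps = 10_000      # configured maximum number of steps

total_seconds = seconds_per_step * max_steps
print(f"Projected training time: {total_seconds / 86400:.1f} days")  # ~3.9 days
```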

rmrfcurr commented 5 years ago

Solved: TensorFlow wasn't using the GPU. Creating a new conda environment with tf_gpu worked.
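For anyone hitting the same symptom: before retraining, it is worth confirming that TensorFlow can actually see the GPU. A minimal check, assuming a TF 1.x environment like the one this repo appears to target, could look like:

```python
# Sanity check: does TensorFlow detect a usable GPU? (TF 1.x style APIs)
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())                          # should print True on a working GPU setup
print([d.name for d in device_lib.list_local_devices()])   # expect '/device:GPU:0' in the list
```

If no GPU device shows up, recreating the environment with the GPU build (for example `conda create -n tf_gpu tensorflow-gpu`), as described above, usually resolves it.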

Tesla-jiang commented 5 years ago

Did you encounter the Batch X problem?