neulab / guided_summarization

GSum: A General Framework for Guided Neural Abstractive Summarization
MIT License
113 stars, 27 forks

Training stuck at step 72900/200000 #5

Open bhuvanakundumani opened 3 years ago

bhuvanakundumani commented 3 years ago

Hi,

I noticed that training on the CNN dataset gets stuck at step 72900/200000, even though GPU utilization shows 100%. I tried training three times, and every time it got stuck at the same step. I also tried different datasets, and training still gets stuck at the same step (with GPU utilization at 100%). I have attached an image for reference. I would appreciate your input on this. Thanks.

[Image: GSUM]

zdou0830 commented 3 years ago

Hi, thanks for opening the issue! This is a problem with the PreSumm code (https://github.com/nlpyang/PreSumm/issues/135). One workaround is to reload checkpoint-72000.pt and resume training.
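
For anyone applying this workaround, here is a minimal sketch of picking the newest checkpoint saved before the step where training hangs. It assumes the checkpoint file names end in the step number followed by `.pt` (as in `checkpoint-72000.pt`); the function name `latest_checkpoint` and the `max_step` parameter are hypothetical and not part of this repo.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(model_dir: str, max_step: Optional[int] = None) -> Optional[Path]:
    """Return the checkpoint with the highest step number, optionally capped at max_step.

    Assumption: every checkpoint file name ends in `<step>.pt`,
    e.g. `checkpoint-72000.pt`. Files without a trailing number are skipped.
    """
    pattern = re.compile(r"(\d+)\.pt$")
    best_step, best_path = -1, None
    for path in Path(model_dir).glob("*.pt"):
        match = pattern.search(path.name)
        if match is None:
            continue  # not a step-numbered checkpoint
        step = int(match.group(1))
        if max_step is not None and step > max_step:
            continue  # saved after the point we want to resume from
        if step > best_step:
            best_step, best_path = step, path
    return best_path
```

Calling `latest_checkpoint(model_dir, max_step=72900)` on the model directory would return the path to hand to the training script's resume option. PreSumm's `train.py` has a `-train_from` argument for this; I am assuming this repo keeps it, so please check the training script's argument list before relying on it.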

bhuvanakundumani commented 3 years ago

Thanks @zdou0830

maheshmylavarapu0057 commented 3 years ago

Hi @bhuvanakundumani, may I know how you are passing the data path? I tried different variants, but every time it picks up only bert_output/cnndm,train.0.pt. By the way, bert_output is my output directory.
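
A possibly relevant detail: in PreSumm, the data path is treated as a file-name prefix (directory plus dataset name), and the loader globs for numbered shards that start with that prefix. The sketch below imitates that lookup so you can see which files a given prefix would match. It assumes this repo's loader follows the same pattern, which should be verified against its `data_loader.py`; `list_shards` is a hypothetical helper, not a function from the repo.

```python
import glob
import os
import tempfile


def list_shards(data_path_prefix: str, corpus_type: str = "train") -> list:
    """List shard files matching <prefix>.<corpus_type>.<number>*.pt, sorted by name.

    Assumption: the pattern is modelled on PreSumm's loader and may differ here.
    """
    pattern = data_path_prefix + "." + corpus_type + ".[0-9]*.pt"
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        # Create three empty placeholder shards for the demo.
        for i in range(3):
            open(os.path.join(out_dir, f"cnndm.train.{i}.bert.pt"), "wb").close()
        # Prefix = output directory + dataset name: matches all three shards.
        print(list_shards(os.path.join(out_dir, "cnndm")))
        # Directory alone: matches nothing, because no file starts with "<dir>.train."
        print(list_shards(out_dir))
```

If the loader does work this way, the path to pass would be the output directory joined with the dataset name (no shard number and no `.pt` extension), and the loader would then iterate over every matching shard.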

gaozhiguang commented 3 years ago

Hi, I followed the steps, but my accuracy is too low. [image] Can I know how you ran it, @bhuvanakundumani?

bhuvanakundumani commented 3 years ago

Hi @gaozhiguang, you should probably check your input data. I tried it on biomedical data and it worked fine. Thanks.

gaozhiguang commented 3 years ago

Thanks @bhuvanakundumani

git-ekeh commented 2 years ago

Hi @bhuvanakundumani and @zdou0830, I'm sure you've moved on by now, but I was wondering if you remember how you got the model to run. I am getting stuck on the first training example. How did you get the model to continue past cnndm.train.0.bert.pt and move on to the next files in the data_path directory? I'm currently getting an EOFError: Ran out of input error.
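
One thing that may be worth ruling out: Python's pickle raises `EOFError: Ran out of input` when it is asked to load an empty byte stream, so a zero-byte shard (for example, from an interrupted preprocessing run) would produce this message. That is only a guess at the cause here, not a confirmed diagnosis. A minimal sketch for finding empty shard files follows; `find_empty_shards` is a hypothetical helper.

```python
import pickle
from pathlib import Path


def find_empty_shards(data_dir: str, suffix: str = ".pt") -> list:
    """Return the zero-byte files in data_dir whose names end with suffix."""
    return sorted(
        p for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(suffix) and p.stat().st_size == 0
    )


def raises_eof_on_empty() -> bool:
    """Show that unpickling an empty byte string raises EOFError."""
    try:
        pickle.loads(b"")
    except EOFError:
        return True
    return False
```

Running `find_empty_shards` on the directory that holds the `.pt` shards lists any files that would need to be regenerated. If it returns nothing, the shards are at least non-empty and the cause lies elsewhere.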