luo3300612 / image-captioning-DLCT

Official pytorch implementation of paper "Dual-Level Collaborative Transformer for Image Captioning" (AAAI 2021).

RL training stage, out of memory #13

Open · TBI805 opened this issue 3 years ago

TBI805 commented 3 years ago

After switching to RL training, memory usage increases by about 2 GB after every epoch. Eventually the system runs out of memory and kills the process. Could you please give me some tips to solve it? Thank you very much!
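
A common cause of this kind of steady, per-epoch growth in PyTorch is accumulating loss (or reward) tensors across iterations, which keeps every iteration's autograd graph alive. Below is a minimal, generic sketch of the pattern and its fix; the model and loop are illustrative, not DLCT's actual training code.

```python
import torch

# Illustrative model/optimizer, not DLCT's; the leak pattern is generic.
model = torch.nn.Linear(10, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0
for step in range(100):
    x = torch.randn(32, 10)
    loss = model(x).pow(2).mean()

    optim.zero_grad()
    loss.backward()
    optim.step()

    # Leak: `running_loss += loss` keeps each iteration's autograd graph
    # alive, so memory grows every step (and every epoch).
    # Fix: convert to a Python float so the graph can be freed.
    running_loss += loss.item()
```

The same applies to any logging list that stores tensors (rewards, log-probs, sampled captions): store `.item()` values or `.detach().cpu()` copies instead.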

amazingYX commented 2 years ago

Hello, have you solved this problem? Even before switching to RL, while training the model with the XE (cross-entropy) loss, memory usage keeps increasing.

aichisuancai commented 2 years ago

> After switching to RL training, memory usage increases by about 2 GB after every epoch. Eventually the system runs out of memory and kills the process. Could you please give me some tips to solve it? Thank you very much!

Hello, I am facing the same problem. Have you overcome it? Could you please tell me how to solve it? Thank you!

haijie945 commented 2 years ago

Have you solved it? I hope to receive your reply as soon as possible. Thank you!

Baixiaobai201619707 commented 2 years ago

> Hello, have you solved this problem? Even before switching to RL, while training the model with the XE (cross-entropy) loss, memory usage keeps increasing.

Hello, excuse me. I have the same problem. Have you solved it? My cache keeps increasing as training runs, and eventually the model gets stuck. Thanks a lot.
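
If the leak persists after fixing tensor accumulation, it can help to log the process's resident memory at each epoch boundary to see whether the growth comes from the training loop or from evaluation/CIDEr scoring. A diagnostic sketch follows; `psutil` and both helper names are my own, not part of this repository.

```python
import gc
import os

import psutil
import torch

def log_host_memory(tag: str) -> None:
    """Print the process's resident set size to make per-epoch growth visible."""
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1e9
    print(f"[{tag}] RSS: {rss_gb:.2f} GB")

def end_of_epoch_cleanup(epoch: int) -> None:
    # Drop any large per-epoch lists (generated captions, rewards, ...)
    # before calling this, then force a collection pass.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    log_host_memory(f"epoch {epoch}")
```

Comparing the logged RSS before and after the evaluation step should narrow down which stage is responsible for the ~2 GB per-epoch growth.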