jiweil / Neural-Dialogue-Generation


out of memory #13

Open: piekey1994 opened this issue 6 years ago

piekey1994 commented 6 years ago

Run:
th train_atten.lua -train_file ../data/weibo.qa.train.txt -dev_file ../data/weibo.qa.dev.txt -test_file ../data/weibo.qa.test.txt -saveFolder ../result -dictPath ../data/word2num.weibo.qa -gpu_index 2

My GPU is a single K80 and my vocabulary size is 25,000. There are only 50,000 training examples and only 1,000 examples each in the dev and test sets. All remaining parameters are left at their defaults. In the second round, training runs out of memory:

iter 0 perp 25028.443593628
iter 1 Fri May 11 10:16:20 2018 perp 516.06309105302 445.2495508194
iter 2 Fri May 11 10:23:46 2018 perp 316.19450563604 441.42551279068
iter 3 Fri May 11 10:31:07 2018
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6971/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/opt/torch/install/bin/luajit: /opt/torch/install/share/lua/5.1/cutorch/Tensor.lua:14: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-6971/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
  [C]: in function 'resize'
  /opt/torch/install/share/lua/5.1/cutorch/Tensor.lua:14: in function 'cuda'
  .../liupq/Neural-Dialogue-Generation-master/Atten/atten.lua:370: in function 'model_backward'
  .../liupq/Neural-Dialogue-Generation-master/Atten/atten.lua:573: in function 'train'
  train_atten.lua:11: in main chunk
  [C]: in function 'dofile'
  /opt/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
  [C]: at 0x00406670
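One way to see where the memory goes is to log free GPU memory around the backward pass. A minimal diagnostic sketch (not code from this repo; it assumes cutorch is already loaded, and the insertion point inside the training loop in Atten/atten.lua is up to you):

```lua
-- Diagnostic sketch only; not part of the repository.
require 'cutorch'

-- Print how much GPU memory is free on the current device.
local function report_gpu_memory(tag)
  local free, total = cutorch.getMemoryUsage(cutorch.getDevice())
  print(string.format('[%s] GPU memory: %.0f MB free of %.0f MB',
                      tag, free / 2^20, total / 2^20))
end

-- For example, call report_gpu_memory('iter ' .. iter) before and after
-- the model_backward call to see whether free memory shrinks every iteration.
```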

yoyoyanyu commented 5 years ago

I have the same problem. Watching GPU memory with "nvidia-smi -l", I found it continually increases during training. I suspect that Torch is not freeing memory from one iteration to the next, so it ends up consuming all the available GPU memory. But I don't know how to fix it.
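A possible mitigation, as a sketch rather than a verified fix for this repo: LuaJIT's garbage collector does not account for the GPU memory held by CUDA tensors, so unreferenced tensors can pile up on the device between collections. Forcing a collection periodically inside the training loop often caps that growth. The loop and function names below are placeholders, not symbols from the repo:

```lua
-- Sketch only: num_batches and train_one_batch are placeholder names.
for iter = 1, num_batches do
  train_one_batch()      -- stands in for the real forward/backward step
  if iter % 10 == 0 then
    collectgarbage()     -- free unreferenced Lua objects and their CUDA storages
    collectgarbage()     -- second pass runs finalizers queued by the first
  end
end
```

The stack trace above also points at a fresh :cuda() allocation inside model_backward (atten.lua:370); reusing a preallocated buffer there instead of allocating a new tensor every iteration would be another thing to try.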