Closed — xiapengchng closed this issue 5 years ago
The out-of-memory issue was tracked in #8.
BTW, does the problem still happen in the current master branch? I added two commits 6 hours ago.
Thank you for your quick reply! With the current master branch, batch_size=6 reproduces the same error; with batch size 4, everything works fine. PS: Is it possible to train with multiple GPUs?
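As an aside, when memory forces the batch size down (here from 6 to 4), gradient accumulation can emulate the larger effective batch without extra memory. A minimal, self-contained sketch — `model`, `loader`, `optimizer`, and `criterion` are toy placeholders, not names from the lcnn codebase:

```python
import torch
import torch.nn as nn

# toy stand-ins for the real model and DataLoader
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()
loader = [(torch.randn(4, 4), torch.randn(4, 2)) for _ in range(4)]

w0 = model.weight.detach().clone()  # snapshot to show the update happened

accum_steps = 2  # two batches of 4 behave like one batch of 8
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()  # gradients accumulate across backward() calls
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

The division by `accum_steps` keeps the accumulated gradient equal (in expectation) to that of the larger batch.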
It is not possible with the released version right now. We have a multi-GPU version, but it is not stable enough to be released. Also, ShanghaiTech is a relatively small dataset; it only takes several hours to get a reasonable result.
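For readers who want to experiment anyway, stock PyTorch can split batches across visible GPUs with `nn.DataParallel`; whether lcnn's model is compatible is not guaranteed by this thread, so treat this as a generic sketch with a toy module:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # toy stand-in for the real network
# DataParallel replicates the module and splits each batch across all
# visible GPUs; with zero GPUs it simply runs the wrapped module as-is.
parallel_model = nn.DataParallel(model)
out = parallel_model(torch.randn(8, 4))
print(out.shape)  # torch.Size([8, 2])
```

Note that `DataParallel` only parallelizes the forward/backward pass; losses computed inside custom `forward` methods (as some detectors do) may need care when gathering results.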
```
progress | sum | jmap | lmap | joff | lpos | lneg | speed
Running validation...
Traceback (most recent call last):
  File "./train.py", line 180, in <module>
    main()
  File "./train.py", line 172, in main
    trainer.train()
  File "/mnt/lustre/xiapengcheng/LSD/lcnn/lcnn/trainer.py", line 290, in train
    self.train_epoch()
  File "/mnt/lustre/xiapengcheng/LSD/lcnn/lcnn/trainer.py", line 202, in train_epoch
    self.validate()
  File "/mnt/lustre/xiapengcheng/LSD/lcnn/lcnn/trainer.py", line 121, in validate
    result = self.model(input_dict)
  File "/mnt/lustre/share/spring/envs/r0.3.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/lustre/xiapengcheng/LSD/lcnn/lcnn/models/line_vectorizer.py", line 84, in forward
```
I am trying to reproduce the paper's results on a 1080 Ti. Training works fine, but it fails during validation even after changing the batch size to 1. I have even tried a V100 (16 GB) and got the same error.
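Since the failure here happens in validation rather than training, one common cause is running the validation forward pass with autograd enabled, which keeps the whole graph in memory. A generic sketch of a memory-lean validation loop (the `validate`, `model`, and `loader` names are illustrative, not lcnn's actual API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def validate(model, loader):
    # eval() freezes dropout/batchnorm statistics; no_grad() skips building
    # the autograd graph, which is usually the bulk of validation memory.
    model.eval()
    total = 0.0
    with torch.no_grad():
        for x, y in loader:
            total += F.mse_loss(model(x), y).item()  # .item() drops tensors
    return total / len(loader)

# toy usage
net = nn.Linear(3, 1)
batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(2)]
avg_loss = validate(net, batches)
```

If the loop already uses `no_grad` and still OOMs, the remaining suspects are activations held by the model itself or results accumulated on the GPU instead of being moved to the CPU.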