microsoft / Oscar

Oscar and VinVL

Segmentation fault (core dumped) when I fine-tune the VQA task #39

Open yichao96 opened 4 years ago

yichao96 commented 4 years ago

Hi~ When I fine-tune the VQA task, I get the error "Segmentation fault (core dumped)". I suspect the reason is that my 128G of memory is not enough. Could you give me some suggestions?

weiyx16 commented 4 years ago

Hello! I tried to reproduce the result using a 4*32G node, and it works. I don't think this is caused by memory capacity: running out of memory usually raises an OOM error, not a segmentation fault.
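To see where the crash actually originates, one option is Python's standard `faulthandler` module, which dumps the Python traceback when the process receives a fatal signal such as SIGSEGV. The following is only a minimal sketch: the helper name `enable_crash_traceback` is hypothetical and is not part of Oscar.

```python
import faulthandler
import sys


def enable_crash_traceback(stream=None):
    """Dump Python tracebacks of all threads on fatal signals (e.g. SIGSEGV).

    `stream` must be an open file with a real file descriptor and must stay
    open for as long as the handler is enabled. Hypothetical helper name.
    """
    if stream is None:
        # Fall back to the interpreter's original stderr.
        stream = sys.__stderr__
    faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_traceback()
    # Start the fine-tuning code after this point; if the process segfaults,
    # the traceback shows which Python call was active at the time.
```

The same effect is available without touching the code by starting the interpreter with `python -X faulthandler`. A traceback ending inside a compiled extension would point to a binary mismatch rather than to a memory limit.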

vyskocj commented 3 years ago

I solved the problem by following this thread (https://discuss.pytorch.org/t/segmentation-fault/23489).

You have to:

  1. install a newer version of torch:
     conda install pytorch==1.7.1 torchvision==0.8.2 cudatoolkit=10.1 -c pytorch

And if you followed the installation guide (https://github.com/microsoft/Oscar/blob/master/INSTALL.md), you also need to:

  1. remove the build directory for both Oscar and Apex:
     cd Oscar (or cd apex)
     rm -rf build/
  2. install Oscar and Apex again:
     Apex: python setup.py install --cuda_ext --cpp_ext
     Oscar: python setup.py build develop
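After reinstalling, it may help to confirm that the environment really picked up the newer torch before rebuilding the extensions. This is a small sketch, not part of Oscar: the helper names `version_tuple` and `is_at_least` are hypothetical, and the minimum of 1.7.1 is simply the version used in the steps above.

```python
import re


def version_tuple(version):
    """Turn a string such as '1.7.1+cu101' into (1, 7, 1)."""
    # Drop the local build suffix after '+', then keep the first three numbers.
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def is_at_least(installed, minimum="1.7.1"):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
    else:
        print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
        print("at least 1.7.1:", is_at_least(torch.__version__))
```

If the reported version is still the old one, the rebuilt Apex and Oscar extensions would again be compiled against it, so the check is worth running before `python setup.py`.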