jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".
MIT License

segmentation fault #32

Closed KechenQin closed 4 years ago

KechenQin commented 4 years ago

Thank you for the code. I want to fine-tune this model on the RefCOCO dataset, but I get a segmentation fault when I run the non-distributed sh file. Please help.

[Partial Load] non pretrain keys: ['final_mlp.2.weight', 'final_mlp.2.bias']
PROGRESS: 0.00%
Segmentation fault

jackroos commented 4 years ago

@KechenQin You can try to reduce NUM_WORKERS_PER_GPU in the yaml file.

KechenQin commented 4 years ago

Thanks for the reply!

I reduced NUM_WORKERS_PER_GPU from 4 to 1, but I still get the same error. It happens when loss.backward() runs. I am working with a Tesla V100 GPU (16G). Please let me know if you have any other ideas.
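Because the crash is reported at loss.backward(), one generic way to narrow it down is Python's standard-library `faulthandler`, which prints the Python-level traceback when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch, not part of VL-BERT; the helper name `enable_crash_traceback` is hypothetical, and it assumes you can add a call at the top of the training entry point.

```python
import faulthandler
import sys


def enable_crash_traceback():
    """Dump Python tracebacks for all threads on fatal signals (e.g. SIGSEGV).

    Hypothetical helper for debugging; returns True once the handler is active.
    """
    if faulthandler.is_enabled():
        return True
    try:
        faulthandler.enable(file=sys.stderr, all_threads=True)
    except (AttributeError, ValueError, OSError):
        # sys.stderr may have been replaced by an object without a real
        # file descriptor; fall back to the interpreter's original stderr.
        faulthandler.enable(file=sys.__stderr__, all_threads=True)
    return faulthandler.is_enabled()


# Call once, before the model and data loaders are built.
crash_tracing_on = enable_crash_traceback()
```

The same effect is available without touching the code by starting the interpreter with `python -X faulthandler` or by setting the `PYTHONFAULTHANDLER=1` environment variable. The traceback only shows which Python call was active at the crash; the fault itself may still originate in a native extension or driver.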

jackroos commented 4 years ago

Could you provide more details about your environment, including the system version, CUDA version, Python version, PyTorch version, etc.? How many V100 GPUs are you using to run the code, and which config yaml do you use?
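The details requested above can be gathered with a short script. This is a minimal sketch rather than anything shipped with VL-BERT: the function name `collect_env` and the dictionary keys are illustrative, and the PyTorch fields are only filled in if `torch` can be imported.

```python
import platform
import sys


def collect_env():
    """Return the version details typically requested in a bug report."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    try:
        import torch
    except ImportError:
        info["pytorch"] = "not installed"
        return info
    info["pytorch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds);
    # this can differ from the toolkit or driver installed on the machine.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    info["visible_gpus"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

PyTorch also provides a more thorough report via `python -m torch.utils.collect_env`, which may be preferable when it is available in the installed version.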

KechenQin commented 4 years ago

I am working on Linux in a conda virtual environment, with CUDA 9.0 and Python 3.6.5. I have 8 GPUs in total, but I only tested VL-BERT with one GPU. I also tried 4 GPUs following the default setup and got the same error. I am using cfgs/refcoco/base_detected_regions_4x16G.yaml as the config file.

By the way, I did not install TensorFlow in this environment, and I did not see any dependency errors. I am not sure whether that is the cause of this issue.

KechenQin commented 4 years ago

I solved the problem by switching to a different AWS AMI.