Segmentation Fault whenever running the model #117

faysalhossain2007 closed 2 years ago

faysalhossain2007 commented 2 years ago

Whenever I try to run the following command, it ends in a segmentation fault:

$ python \
--do_train \
--do_eval \
--model_type roberta \
--model_name_or_path $pretrained_model \
--config_name roberta-base \
--tokenizer_name roberta-base \
--train_filename ../data/,../data/ \
--dev_filename ../data/,../data/ \
--output_dir $output_dir \
--max_source_length 512 \
--max_target_length 512 \
--beam_size 5 \
--train_batch_size 32 \
--eval_batch_size 32 \
--learning_rate 5e-5 \
--train_steps 100 \
--eval_steps 50
Segmentation fault (core dumped)

My transformers and PyTorch versions are:

>>> transformers.__version__
>>> torch.__version__

celbree commented 2 years ago

A segmentation fault can have many causes. One possibility is that your PyTorch version is too old to support the newest transformers. (We will also update the README; it should say pytorch>=1.4.0.)

faysalhossain2007 commented 2 years ago

>>> torch.__version__

python -c "import torch; print(torch.__version__)"

Shouldn't my version of PyTorch be compatible with the requirement, since it is 1.4?

celbree commented 2 years ago

I think each transformers release needs a compatible PyTorch version. For example, if you use pytorch==1.4.0, you should use an older transformers release such as 2.5.0, which is the environment we ran when we created this repo. If you are using transformers==4.18.0, I recommend a newer PyTorch version such as 1.10.0. We don't know the exact dependency between transformers and PyTorch either, so you may need to try different combinations.
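The pairing advice above can be turned into a quick sanity check. This is only a sketch: the table holds the two combinations mentioned in this comment, not an official compatibility matrix, and the names `SUGGESTED_PAIRS`, `major_minor`, and `matches_suggested_pair` are made up for illustration.

```python
# transformers version -> PyTorch version suggested in this thread.
# Assumption: this is NOT an official compatibility table.
SUGGESTED_PAIRS = {
    "2.5.0": "1.4.0",    # the repo's original environment
    "4.18.0": "1.10.0",  # suggested for a newer transformers
}


def major_minor(version):
    """Return (major, minor) as ints, ignoring the patch level and
    local build suffixes such as '+cu113'."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def matches_suggested_pair(transformers_version, torch_version):
    """Return True or False when the transformers version is in the table,
    and None when the thread says nothing about that version."""
    suggested = SUGGESTED_PAIRS.get(transformers_version)
    if suggested is None:
        return None
    return major_minor(torch_version) == major_minor(suggested)
```

In a live environment you would call it as `matches_suggested_pair(transformers.__version__, torch.__version__)`. A `False` result only means the installed pair differs from the ones named here; it does not prove that the pair is broken.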

faysalhossain2007 commented 2 years ago

Still no luck. I tried transformers 2.5.0 (installed via pip, as this specific version is not available in conda), but I still got the segmentation fault.

Can you share the complete list of dependencies, with versions, with which you ran the code successfully?

conda list 
# packages in environment at /home/faysal/anaconda3/envs/codexglue:
# Name                    Version                   Build  Channel

celbree commented 2 years ago

I tried python3.6, pytorch==1.4.0 and transformers==2.5.0, and the code ran successfully on my machine.

04/22/2022 01:50:01 - INFO - transformers.modeling_utils - loading weights file from cache at /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349
Segmentation fault (core dumped)

Judging by where your error occurs, I can't tell whether the problem is in transformers itself. You could raise an issue in their repo.
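Since the log stops right after the weights file is loaded, one way to narrow down where the crash happens is Python's standard `faulthandler` module, which dumps the Python traceback when the interpreter receives SIGSEGV. Nobody in this thread used it, so treat the following as a sketch: the helper names are hypothetical, and `script_args` stands for whatever script and flags you normally pass to `python`.

```python
import signal
import subprocess
import sys


def build_command(script_args):
    """Prefix the usual arguments with '-X faulthandler' so that the
    interpreter prints a Python traceback if it receives SIGSEGV."""
    return [sys.executable, "-X", "faulthandler"] + list(script_args)


def describe_exit(returncode):
    """Translate a subprocess return code into a short diagnosis.
    On POSIX, a negative code -N means the child was killed by signal N."""
    if returncode == 0:
        return "ok"
    if returncode == -signal.SIGSEGV:
        return "segmentation fault"
    if returncode < 0:
        return "killed by signal {}".format(-returncode)
    return "exited with status {}".format(returncode)


def run_and_diagnose(script_args):
    """Run the script in a child process and return (diagnosis, stderr).
    If the child segfaults, stderr carries faulthandler's traceback."""
    proc = subprocess.run(
        build_command(script_args),
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return describe_exit(proc.returncode), proc.stderr
```

The traceback shows which Python call was active when the native code crashed, which is useful detail to attach when reporting the issue upstream. Setting the environment variable `PYTHONFAULTHANDLER=1` before running the original command has the same effect as `-X faulthandler`.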

faysalhossain2007 commented 2 years ago

Thanks! I have created the issue:

nativexie commented 1 year ago

I encountered the same problem and solved it after some trial and error.

Hardware: NVIDIA A100-PCIE-40GB

Primary packages and versions:

1. NVIDIA Driver 525.85.12, CUDA Version 12.0
2. python 3.6.13
3. pytorch 1.10.1 (CUDA 11.3), installed with: conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
4. transformers 4.18.0
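To compare your own environment with the list above, a short report like the following may help. It is a sketch, not part of the repo: `environment_report` is a hypothetical helper, and packages that are not installed are reported as `None` instead of raising an error.

```python
import importlib
import sys


def environment_report():
    """Collect the versions discussed in this thread.
    Missing packages are reported as None."""
    report = {
        "python": sys.version.split()[0],
        "torch": None,
        "torch_cuda": None,
        "cuda_available": None,
        "transformers": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        torch = None
    if torch is not None:
        report["torch"] = torch.__version__
        # CUDA version this PyTorch build was compiled against
        # (None for CPU-only builds).
        report["torch_cuda"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    try:
        transformers = importlib.import_module("transformers")
        report["transformers"] = transformers.__version__
    except ImportError:
        pass
    return report
```

Printing the result, for example with `for k, v in environment_report().items(): print(k, v)`, gives a compact summary to paste into a bug report. Note that `torch_cuda` is the CUDA version PyTorch was built with (11.3 in the list above), which can differ from the CUDA version shown by the driver (12.0 above).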