
Segmentation Fault whenever running the model #117

Closed faysalhossain2007 closed 2 years ago

faysalhossain2007 commented 2 years ago

Whenever I try to run the following command, it ends in a segmentation fault:

```
$ python run.py \
  --do_train \
  --do_eval \
  --model_type roberta \
  --model_name_or_path $pretrained_model \
  --config_name roberta-base \
  --tokenizer_name roberta-base \
  --train_filename ../data/train.java-cs.txt.java,../data/train.java-cs.txt.cs \
  --dev_filename ../data/valid.java-cs.txt.java,../data/valid.java-cs.txt.cs \
  --output_dir $output_dir \
  --max_source_length 512 \
  --max_target_length 512 \
  --beam_size 5 \
  --train_batch_size 32 \
  --eval_batch_size 32 \
  --learning_rate 5e-5 \
  --train_steps 100 \
  --eval_steps 50
Segmentation fault (core dumped)
```

My transformers and torch versions are:

>>> transformers.__version__
'4.18.0'
>>> torch.__version__
'1.4.0'
celbree commented 2 years ago

I think a segmentation fault is a complex error that can have many causes. One possible reason is that your PyTorch version is too old to support the newest transformers. (We will also update the README; it should be pytorch>=1.4.0.)

faysalhossain2007 commented 2 years ago
>>> torch.__version__
'1.4.0'

python -c "import torch; print(torch.__version__)"
1.4.0

Shouldn't my version of PyTorch be compatible with the requirement, since it is 1.4?

celbree commented 2 years ago

Each transformers release needs a compatible PyTorch version. For example, if you use pytorch==1.4.0, you should use an older transformers release such as 2.5.0, which was our environment when we created this repo. Conversely, if you have installed transformers==4.18.0, I recommend a newer PyTorch such as 1.10.0. Since we also don't know the exact dependency between transformers and PyTorch, you may need to try different combinations.
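As a quick sanity check (a minimal sketch, not part of run.py or this repo), you could first verify that the installed torch/transformers pair works at all on its own:

```python
# env_check.py -- trivial environment check (illustrative only)
import torch
import transformers

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)

# A tiny tensor op; if even this segfaults, the torch install itself is broken.
x = torch.randn(2, 3)
print((x @ x.t()).shape)
```

If this already crashes, the problem is in the environment rather than anything in CodeXGLUE.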

faysalhossain2007 commented 2 years ago

Still no luck. I tried with transformers 2.5.0 (installed via pip, since this specific version is not available in conda), but I still get the segmentation fault. The relevant part of the log:

  "architectures": [
    "RobertaForMaskedLM"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "do_sample": false,
  "eos_token_id": 2,
  "eos_token_ids": 0,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-05,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_labels": 2,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 1,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 1,
  "use_bfloat16": false,
  "vocab_size": 50265
}

04/22/2022 01:49:21 - INFO - transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-vocab.json from cache at /home/faysal/.cache/torch/transformers/d0c5776499adc1ded22493fae699da0971c1ee4c2587111707a4d177d20257a2.ef00af9e673c7160b4d41cfda1f48c5f4cba57d5142754525572a846a1ab1b9b
04/22/2022 01:49:21 - INFO - transformers.tokenization_utils -   loading file https://s3.amazonaws.com/models.huggingface.co/bert/roberta-base-merges.txt from cache at /home/faysal/.cache/torch/transformers/b35e7cd126cd4229a746b5d5c29a749e8e84438b14bcdb575950584fe33207e8.70bec105b4158ed9a1747fea67a43f5dee97855c64d62b6ec3742f4cfdb5feda
04/22/2022 01:49:21 - INFO - filelock -   Lock 139774984058640 acquired on /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349.lock
04/22/2022 01:49:21 - INFO - transformers.file_utils -   https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/codebert-base/pytorch_model.bin not found in cache or force_download set to True, downloading to /home/faysal/.cache/torch/transformers/tmpexngjf6w
Downloading: 100%|███████████████████████████| 499M/499M [00:39<00:00, 12.6MB/s]
04/22/2022 01:50:01 - INFO - transformers.file_utils -   storing https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/codebert-base/pytorch_model.bin in cache at /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349
04/22/2022 01:50:01 - INFO - transformers.file_utils -   creating metadata file for /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349
04/22/2022 01:50:01 - INFO - filelock -   Lock 139774984058640 released on /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349.lock
04/22/2022 01:50:01 - INFO - transformers.modeling_utils -   loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/codebert-base/pytorch_model.bin from cache at /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349
Segmentation fault (core dumped)

Can you tell me the complete list of dependencies, with versions, that you used to run the code successfully? Right now, these are the packages installed in my environment:

conda list 
# packages in environment at /home/faysal/anaconda3/envs/codexglue:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
ca-certificates           2021.10.8            ha878542_0    conda-forge
certifi                   2021.5.30        py36h5fab9bb_0    conda-forge
cudatoolkit               10.1.243            h036e899_10    conda-forge
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libblas                   3.9.0           14_linux64_openblas    conda-forge
libcblas                  3.9.0           14_linux64_openblas    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 11.2.0              h1d223b6_15    conda-forge
libgfortran-ng            11.2.0              h69a702a_15    conda-forge
libgfortran5              11.2.0              h5c6108e_15    conda-forge
liblapack                 3.9.0           14_linux64_openblas    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.20          pthreads_h78a6416_0    conda-forge
libstdcxx-ng              11.2.0              he4da1e4_15    conda-forge
libzlib                   1.2.11            h166bdaf_1014    conda-forge
llvm-openmp               13.0.1               he0ac6c6_1    conda-forge
mkl                       2022.0.1           h8d4b97c_803    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
ninja                     1.10.2               h4bd325d_1    conda-forge
numpy                     1.19.5           py36hfc0c790_2    conda-forge
openssl                   1.1.1n               h166bdaf_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
python                    3.6.15          hb7a2778_0_cpython    conda-forge
python_abi                3.6                     2_cp36m    conda-forge
pytorch                   1.4.0           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
readline                  8.1                  h46c0cb4_0    conda-forge
setuptools                58.0.4           py36h5fab9bb_2    conda-forge
sqlite                    3.38.2               h4ff8645_0    conda-forge
tbb                       2021.5.0             h924138e_1    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tokenizers                0.5.0                    pypi_0    pypi
transformers              2.5.0                    pypi_0    pypi
tree-sitter               0.20.0                   pypi_0    pypi
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zlib                      1.2.11            h166bdaf_1014    conda-forge
celbree commented 2 years ago

I tried python 3.6, pytorch==1.4.0 and transformers==2.5.0, and the code ran successfully on my machine.

04/22/2022 01:50:01 - INFO - transformers.modeling_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/microsoft/codebert-base/pytorch_model.bin from cache at /home/faysal/.cache/torch/transformers/3416309b564f60f87c1bc2ce8d8a82bb7c1e825b241c816482f750b48a5cdc26.96251fe4478bac0cff9de8ae3201e5847cee59aebbcafdfe6b2c361f9398b349 Segmentation fault (core dumped)

Based on where your error happens, I am not sure whether it is a problem in transformers itself. Maybe you can raise an issue in their repo.
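If you want to narrow it down further before filing upstream (a minimal sketch, not part of this repo), you could reproduce only the step where your log stops, i.e. loading the microsoft/codebert-base weights, with Python's faulthandler enabled so a native crash still prints a traceback:

```python
# repro_load.py -- isolates the weight-loading step from the log above (illustrative only)
import faulthandler
faulthandler.enable()  # print a Python traceback even on a native crash such as SIGSEGV

from transformers import RobertaConfig, RobertaModel, RobertaTokenizer

config = RobertaConfig.from_pretrained("roberta-base")
tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("microsoft/codebert-base", config=config)
print("loaded OK:", sum(p.numel() for p in model.parameters()), "parameters")
```

If this small script also segfaults, the crash is in the torch/transformers stack rather than in run.py, and the faulthandler traceback would be useful to attach to the upstream issue.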

faysalhossain2007 commented 2 years ago

Thanks! I have created the issue: https://github.com/huggingface/transformers/issues/16939

nativexie commented 1 year ago

I encountered the same problem; after some trial and error, I solved it.

Hardware: NVIDIA A100-PCIE-40GB

Primary packages and versions:

1. NVIDIA driver 525.85.12, CUDA version 12.0
2. python 3.6.13
3. pytorch 1.10.1 (CUDA 11.3): `conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge` (ref: https://pytorch.org/get-started/previous-versions/)
4. transformers 4.18.0