Closed li-muz closed 10 months ago
Hi @li-muz ,
I apologize for the delay in responding. The following are my packages' versions:
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.19.5
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
spacy == 3.2.4
scispacy == 0.2.4
tensorflow-gpu == 2.6.2
If you got an error message, you could post it here. Thanks!
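For anyone comparing their own environment against the versions listed above, a minimal sketch follows. It is not part of the BioRED code: the helper names (`parse_pins`, `check_pins`, `installed_versions`) are hypothetical, and it assumes `importlib.metadata` is available (Python 3.8+) or that the `importlib_metadata` backport is installed on older interpreters.

```python
import re

try:
    from importlib import metadata           # Python 3.8+
except ImportError:                           # assumption: backport installed on older Pythons
    import importlib_metadata as metadata

# The version constraints quoted in the message above.
PINNED = """
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.19.5
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
spacy == 3.2.4
scispacy == 0.2.4
tensorflow-gpu == 2.6.2
"""

PIN_RE = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(==|!=)\s*([^\s#]+)")


def parse_pins(text):
    """Turn 'name == version' / 'name != version' lines into (name, op, version) tuples."""
    pins = []
    for line in text.splitlines():
        match = PIN_RE.match(line)
        if match:
            pins.append((match.group(1).lower(), match.group(2), match.group(3)))
    return pins


def installed_versions(names):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_pins(pins, installed):
    """Return human-readable descriptions of every constraint that is not satisfied."""
    problems = []
    for name, op, wanted in pins:
        have = installed.get(name)
        if have is None:
            problems.append("{}: not installed (wanted {} {})".format(name, op, wanted))
        elif op == "==" and have != wanted:
            problems.append("{}: found {}, wanted == {}".format(name, have, wanted))
        elif op == "!=" and have == wanted:
            problems.append("{}: found excluded version {}".format(name, have))
    return problems


if __name__ == "__main__":
    pins = parse_pins(PINNED)
    for problem in check_pins(pins, installed_versions(name for name, _, _ in pins)):
        print(problem)
```

Running it inside the conda environment prints one line per mismatch and nothing when every constraint is met.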
@ptlai I tried with the versions that you used. This is the error I get:
_pickle.UnpicklingError: invalid load key, 'v'.
Traceback (most recent call last):
File "src/utils/run_biored_eval.py", line 923, in <module>
labels = labels)
File "src/utils/run_biored_eval.py", line 884, in run_test_eval
labels = labels)
File "src/utils/run_biored_eval.py", line 189, in dump_pred_2_pubtator_file
pmids = sorted(list(pmid_2_rel_pairs_dict.keys()), reverse=True)
Would love to get some help
Hi @berkekavak ,
We appreciate your interest in our work. The error message suggests that the prediction files were never generated, which then caused the crash during evaluation. Can you check whether you have the following files after scripts/run_biored_exp.sh finishes?
You should have received an error message while running "python src/run_biored_exp.py" in "scripts/run_biored_exp.sh". For example, did you put the PubMedBERT model at "biored_re/"?
Po-Ting
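A quick way to run the check asked for above is sketched here. The list of files did not survive in the message, so the paths are an assumption taken from the `cp` errors reported later in this thread (`out_model_<task>/test_results.tsv` for the tasks `biored_all_mul` and `biored_novelty`); the function name `missing_predictions` is hypothetical.

```python
from pathlib import Path

# Assumption: task names and file layout are inferred from the `cp` error
# messages later in this thread, not from the BioRED documentation.
TASKS = ("biored_all_mul", "biored_novelty")


def missing_predictions(base_dir=".", tasks=TASKS):
    """Return the expected prediction files that are absent or empty under base_dir."""
    base = Path(base_dir)
    missing = []
    for task in tasks:
        path = base / "out_model_{}".format(task) / "test_results.tsv"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(str(path))
    return missing


if __name__ == "__main__":
    for path in missing_predictions("."):
        print("missing or empty:", path)
```

If anything is printed, the training and prediction step failed before the evaluation script ran, so the evaluation traceback is only a follow-on error.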
Hi,
Thanks for the fast response. Unfortunately the specified files are not created after running the script. The model is located under the microsoft folder here: /Users/berkekavak/biored/biored_re/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract.
Here is the complete output:
(species3) ➜ biored_re git:(master) ✗ bash run_biored_exp.sh 0
in shell script task name: biored_all_mul
run_biored_exp.sh: line 6: 84714 Illegal instruction: 4 cuda_visible_devices=$cuda_visible_devices python src/run_biored_exp.py --task_name $task_name --train_file $in_data_dir/train.tsv --dev_file $in_data_dir/dev.tsv --test_file $in_data_dir/test.tsv --use_balanced_neg false --to_add_tag_as_special_token true --no_neg_for_train_dev $no_neg_for_train_dev --model_name_or_path "${pre_trained_model}" --output_dir outmodel${task_name} --num_train_epochs 10 --learning_rate 1e-5 --per_device_train_batch_size 16 --per_device_eval_batch_size 32 --do_train --do_predict --logging_steps 10 --evaluation_strategy steps --save_steps 10 --overwrite_output_dir --max_seq_length 512
cp: out_model_biored_all_mul/test_results.tsv: No such file or directory
in shell script task name: biored_novelty
run_biored_exp.sh: line 6: 84721 Illegal instruction: 4 cuda_visible_devices=$cuda_visible_devices python src/run_biored_exp.py --task_name $task_name --train_file $in_data_dir/train.tsv --dev_file $in_data_dir/dev.tsv --test_file $in_data_dir/test.tsv --use_balanced_neg false --to_add_tag_as_special_token true --no_neg_for_train_dev $no_neg_for_train_dev --model_name_or_path "${pre_trained_model}" --output_dir outmodel${task_name} --num_train_epochs 10 --learning_rate 1e-5 --per_device_train_batch_size 16 --per_device_eval_batch_size 32 --do_train --do_predict --logging_steps 10 --evaluation_strategy steps --save_steps 10 --overwrite_output_dir --max_seq_length 512
cp: out_model_biored_novelty/test_results.tsv: No such file or directory
Traceback (most recent call last):
File "src/utils/run_biored_eval.py", line 923, in <module>
labels = labels)
File "src/utils/run_biored_eval.py", line 884, in run_test_eval
labels = labels)
File "src/utils/run_biored_eval.py", line 189, in dump_pred_2_pubtator_file
pmids = sorted(list(pmid_2_rel_pairs_dict.keys()), reverse=True)
AttributeError: 'NoneType' object has no attribute 'keys'
datasets/biored/BioRED/Test.PubTator biored_pred_mul.txt
datasets/biored/BioRED/Test.PubTator biored_pred_mul.txt
datasets/biored/BioRED/Test.PubTator biored_pred_mul.txt
datasets/biored/BioRED/Test.PubTator biored_pred_mul.txt
If you are available, we can do a short zoom session. It would be great since I worked a lot to run this experiment.
Best, Berke.
Hi @berkekavak ,
The command below appears to have failed.
cuda_visible_devices=$cuda_visible_devices python src/run_biored_exp.py ...
Did you modify "run_biored_exp.sh"? If so, could you post it here? Thank you!
I tried the code on both UNIX (Mac) and Linux. I did not modify it, but I guess the main issue is here:
I also attached the full error log
File "/home/berkekavak/miniconda3/envs/species/lib/python3.6/site-packages/torch/serialization.py", line 762, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
cp: cannot stat 'out_model_biored_novelty/test_results.tsv': No such file or directory
Traceback (most recent call last):
File "src/utils/run_biored_eval.py", line 923, in
Help would be appreciated.
Berke.
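One common cause of `_pickle.UnpicklingError: invalid load key, 'v'` when loading a checkpoint is that the weights file is not a real checkpoint but a small Git LFS pointer, which is a text file beginning with the word "version". Whether that is what happened here is not confirmed anywhere in this thread, so the following is only a sketch of how one might rule it out. The function names are hypothetical, and the default path is the relative model path that appears in the logs.

```python
import os

# First line of a Git LFS pointer file (per the Git LFS pointer format).
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(head):
    """True if the given leading bytes look like the start of a Git LFS pointer file."""
    return head.startswith(LFS_PREFIX)


def file_is_lfs_pointer(path):
    """Read only the first bytes of `path` and test them against the LFS pointer prefix."""
    with open(path, "rb") as handle:
        return is_lfs_pointer(handle.read(len(LFS_PREFIX)))


if __name__ == "__main__":
    # Assumption: run from the biored_re directory, as in the logs in this thread.
    path = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin"
    if not os.path.isfile(path):
        print(path, "not found")
    elif file_is_lfs_pointer(path):
        print(path, "is a Git LFS pointer, not the actual weights")
    else:
        print(path, "is not an LFS pointer; size in bytes:", os.path.getsize(path))
```

A pointer file is only a few hundred bytes, so an unexpectedly tiny `pytorch_model.bin` points in the same direction.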
(species) @.***:/mnt/c/Users/berke/Documents/boun/biored/biored_re$ bash run_biored_exp.sh 1
2023-01-25 02:16:41.862711: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2023-01-25 02:16:41.862837: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
[INFO|training_args.py:804] 2023-01-25 02:16:44,767 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2023-01-25 02:16:44,767 >> PyTorch: setting up devices
[INFO|training_args.py:886] 2023-01-25 02:16:44,770 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2023-01-25 02:16:44,771 >> Tensorflow: setting up strategy
2023-01-25 02:16:44.772490: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-01-25 02:16:44.772599: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-25 02:16:44.772671: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-A21DDKP): /proc/driver/nvidia/version does not exist
2023-01-25 02:16:44.773617: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
01/25/2023 02:16:44 - INFO - main - n_replicas: 1, distributed training: False, 16-bits training: False
01/25/2023 02:16:44 - INFO - main - Training/evaluation parameters TFTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:44,787 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/tokenizer.json. We won't load it.
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:44,788 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:44,788 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:44,788 >> loading file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/vocab.txt
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:44,788 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:44,788 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:44,788 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:44,788 >> loading file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/tokenizer_config.json
[INFO|configuration_utils.py:652] 2023-01-25 02:16:44,789 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:44,790 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,843 >> Adding @ChemicalEntitySrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,843 >> Adding @ChemicalEntityTgt$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,844 >> Adding @DiseaseOrPhenotypicFeatureSrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,844 >> Adding @DiseaseOrPhenotypicFeatureTgt$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,844 >> Adding @GeneOrGeneProductSrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:44,844 >> Adding @GeneOrGeneProductTgt$ to the vocabulary
[WARNING|logging.py:279] 2023-01-25 02:16:44,844 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:652] 2023-01-25 02:16:44,845 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:44,846 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[WARNING|logging.py:279] 2023-01-25 02:16:44,880 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
[INFO|configuration_utils.py:652] 2023-01-25 02:16:44,883 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:44,884 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|modeling_tf_utils.py:1776] 2023-01-25 02:16:44,921 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:119] 2023-01-25 02:16:45,133 >> Loading PyTorch weights from /mnt/c/Users/berke/Documents/boun/biored/biored_re/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
Traceback (most recent call last):
File "src/run_biored_exp.py", line 795, in
logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2023-01-25 02:16:51,246 >> PyTorch: setting up devices
[INFO|training_args.py:886] 2023-01-25 02:16:51,248 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2023-01-25 02:16:51,249 >> Tensorflow: setting up strategy
2023-01-25 02:16:51.250884: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2023-01-25 02:16:51.250988: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2023-01-25 02:16:51.251045: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (DESKTOP-A21DDKP): /proc/driver/nvidia/version does not exist
2023-01-25 02:16:51.252089: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
01/25/2023 02:16:51 - INFO - main - n_replicas: 1, distributed training: False, 16-bits training: False
01/25/2023 02:16:51 - INFO - main - Training/evaluation parameters TFTrainingArguments(
_n_gpu=0,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:51,264 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/tokenizer.json. We won't load it.
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:51,264 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1698] 2023-01-25 02:16:51,265 >> Didn't find file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:51,265 >> loading file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/vocab.txt
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:51,265 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:51,265 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:51,265 >> loading file None
[INFO|tokenization_utils_base.py:1776] 2023-01-25 02:16:51,265 >> loading file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/tokenizer_config.json
[INFO|configuration_utils.py:652] 2023-01-25 02:16:51,266 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:51,267 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,317 >> Adding @ChemicalEntitySrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,318 >> Adding @ChemicalEntityTgt$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,318 >> Adding @DiseaseOrPhenotypicFeatureSrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,318 >> Adding @DiseaseOrPhenotypicFeatureTgt$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,318 >> Adding @GeneOrGeneProductSrc$ to the vocabulary
[INFO|tokenization_utils.py:425] 2023-01-25 02:16:51,318 >> Adding @GeneOrGeneProductTgt$ to the vocabulary
[WARNING|logging.py:279] 2023-01-25 02:16:51,319 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:652] 2023-01-25 02:16:51,320 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:51,321 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[WARNING|logging.py:279] 2023-01-25 02:16:51,354 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
=======================>label2id {'None': 0, 'No': 1, 'Novel': 2}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
[INFO|configuration_utils.py:652] 2023-01-25 02:16:51,357 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:690] 2023-01-25 02:16:51,358 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "No", "2": "Novel" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "No": 1, "None": 0, "Novel": 2 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|modeling_tf_utils.py:1776] 2023-01-25 02:16:51,391 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:119] 2023-01-25 02:16:51,609 >> Loading PyTorch weights from /mnt/c/Users/berke/Documents/boun/biored/biored_re/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
Traceback (most recent call last):
File "src/run_biored_exp.py", line 795, in
Hi @berkekavak ,
Thanks.
There is another problem I found.
Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
You can try the following commands if you are unable to access the GPU.
conda install -c conda-forge cudatoolkit=11.1
conda install -c conda-forge cudnn=8.2.1
However, the GPU error does not appear to be the cause of the error below.
Traceback (most recent call last):
File "src/run_biored_exp.py", line 795, in <module>
main()
File "src/run_biored_exp.py", line 624, in main
cache_dir = model_args.cache_dir,
File "/home/berkekavak/miniconda3/envs/species/lib/python3.6/site-packages/transformers/models/auto/auto_factory.py", line 446, in from_pretrained
Could you please share the Python packages you have installed? Thank you.
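To see whether the CUDA libraries named in the TensorFlow warnings can be loaded at all, independently of TensorFlow, something like the following sketch could be used. It relies only on the standard-library `ctypes` module; the library names are the ones from the log above, and the helper name `can_load` is hypothetical.

```python
import ctypes

# Shared libraries that the TensorFlow log in this thread reports as missing.
CUDA_LIBS = ("libcudart.so.11.0", "libcuda.so.1")


def can_load(lib_name):
    """Return True if the dynamic loader can open `lib_name`, False otherwise."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be found or opened.
        return False


if __name__ == "__main__":
    for name in CUDA_LIBS:
        print(name, "->", "loadable" if can_load(name) else "NOT loadable")
```

If both libraries are reported as not loadable, TensorFlow falls back to the CPU, which matches the `_n_gpu=0` value in the training arguments shown in the log.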
I tried to execute the code:
1) on Windows (WSL, Ubuntu 20.04 LTS)
2) on a MacBook Pro M1
3) on an Intel Mac (2017)
I got similar errors on all of these devices, with and without a GPU, and they are about the labels. I guess the problem is related to the pretrained packages: compatibility issues with the latest PubMedBERT model might be the cause. Please find the package versions (requirements.txt) of my environment attached.
Best, Berke.
absl-py==0.15.0
accelerate==0.9.0
aiohttp==3.8.3
aiosignal==1.2.0
astunparse==1.6.3
async-timeout==4.0.2
asynctest==0.13.0
attrs==22.2.0
awscli==1.24.10
blis==0.7.9
botocore==1.26.10
cached-property==1.5.2
cachetools==4.2.4
catalogue==2.0.8
certifi==2021.5.30
charset-normalizer==2.0.12
clang==5.0
click==8.0.4
colorama==0.4.4
conllu==4.5.2
contextvars==2.4
cymem==2.0.7
dataclasses==0.8
datasets==2.3.2
dill==0.3.4
docutils==0.16
en-core-sci-md @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.0/en_core_sci_md-0.5.0.tar.gz
filelock==3.4.1
flatbuffers==1.12
frozenlist==1.2.0
fsspec==2022.1.0
gast==0.4.0
google-auth==1.35.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.48.2
h5py==3.1.0
huggingface-hub==0.4.0
idna==3.4
idna-ssl==1.1.0
immutables==0.19
importlib-metadata==4.8.3
importlib-resources==5.4.0
Jinja2==3.0.3
jmespath==0.10.0
joblib==1.1.1
keras==2.6.0
Keras-Preprocessing==1.1.2
langcodes==3.3.0
Markdown==3.3.7
MarkupSafe==2.0.1
multidict==5.2.0
multiprocess==0.70.12.2
murmurhash==1.0.9
nmslib==2.1.1
numpy==1.19.5
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
pandas==1.1.5
pathy==0.10.1
preshed==3.0.8
protobuf==3.19.4
psutil==5.9.4
pyarrow==6.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.6.1
pydantic==1.8.2
pyparsing==3.0.9
pysbd==0.3.4
python-dateutil==2.8.2
pytz==2022.7
PyYAML==5.4.1
regex==2022.10.31
requests==2.27.1
requests-oauthlib==1.3.1
responses==0.17.0
rsa==4.7.2
s3transfer==0.5.2
sacremoses==0.0.53
scikit-learn==0.24.2
scipy==1.5.4
scispacy==0.2.4
sentencepiece==0.1.97
six==1.15.0
smart-open==6.3.0
spacy==3.2.4
spacy-legacy==3.0.11
spacy-loggers==1.0.4
srsly==2.4.5
tensorboard==2.6.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.6.2
tensorflow-estimator==2.6.0
tensorflow-gpu==2.6.2
termcolor==1.1.0
thinc==8.0.17
threadpoolctl==3.1.0
tokenizers==0.12.1
torch==1.8.0
tqdm==4.64.1
transformers==4.18.0
typer==0.4.2
typing-extensions==3.7.4.3
urllib3==1.26.14
wasabi==0.10.1
Werkzeug==2.0.3
wrapt==1.12.1
xxhash==3.2.0
yarl==1.7.2
zipp==3.6.0
Hi @berkekavak ,
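Your packages and the latest version of https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/tree/main worked for me. Below are the environment and the message I received.
Packages: [image: screenshot, 2023-01-25 11:34:49 PM] https://user-images.githubusercontent.com/61985809/214606748-1f135b21-833b-4df6-8ad5-888202d3f8e5.png
Snippet: [image: screenshot, 2023-01-25 11:39:20 PM] https://user-images.githubusercontent.com/61985809/214607141-eefe6283-2cf6-49f0-8adb-5816505bc615.png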
I am unable to reproduce your error message, however. The code is still compatible with the latest pre-trained model. The code was tested on CentOS Linux release 7.5.1804 (Core); I didn't test it on your OS, so I'm not sure whether that is the problem.
BTW, can you run the code at https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification? If you can, you should also be able to run our code. Please let me know what version of Python and which packages you use to run it. It should be possible for me to test with them and then update the biored code to support your versions. I have another version of the biored code that supports Python 3.9 and recent versions of tensorflow and transformers.
Maybe there is an incompatibility between Python 3.6 and Transformers 4.18.0. I tried the example transformers code you mentioned:
(species) @.***:/mnt/c/Users/berke/Documents/transformers/examples/tensorflow/text-classification$ python run_text_classification.py
Traceback (most recent call last):
File "run_text_classification.py", line 41, in
I am using Python 3.6.13 to run the code with the requirements.txt mentioned above.
Could you please send your requirements.txt and python version so that I can try creating a conda env? Could you also please send the code that supports Python 3.9?
I am attaching a screenshot of my files in case you want to check the directories. Best, Berke.
Hi @berkekavak ,
I tested the requirements.txt that you mentioned, but I used Python 3.6.15.
However, as I mentioned earlier, it works on our server. The error may be caused by CUDA or the GPU driver rather than requirements.txt, but I am not sure. I would appreciate it if you could also try the Hugging Face Transformers sample code and let me know if it works. I am traveling now and will return next Monday. After that, I should be able to send you the Python 3.9 version of biored_re.
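One way to rule out a package mismatch is to compare the installed versions against the pins listed earlier in this thread. The following is only a minimal sketch: `PINS`, `parse_pin`, `installed_versions`, and `check_pins` are hypothetical names, only the `==` and `!=` operators that appear in the thread are handled, and `importlib.metadata` needs Python 3.8 or newer, so it would not run in a Python 3.6 environment.

```python
from importlib import metadata

# Pins copied from the version list posted earlier in this thread.
PINS = [
    "transformers == 4.18.0",
    "accelerate == 0.9.0",
    "pandas == 1.1.5",
    "numpy == 1.19.5",
    "datasets == 2.3.2",
    "sentencepiece != 0.1.92",
    "protobuf == 3.19.4",
    "spacy == 3.2.4",
    "scispacy == 0.2.4",
    "tensorflow-gpu == 2.6.2",
]


def parse_pin(pin):
    """Split 'name == 1.2.3' or 'name != 1.2.3' into (name, op, version)."""
    for op in ("==", "!="):
        if op in pin:
            name, version = pin.split(op, 1)
            return name.strip(), op, version.strip()
    raise ValueError(f"unsupported pin: {pin!r}")


def installed_versions(names):
    """Return {name: version or None} for the current environment."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def check_pins(pins, installed):
    """Return a list of readable mismatches (empty if every pin holds)."""
    problems = []
    for pin in pins:
        name, op, wanted = parse_pin(pin)
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (wanted {op} {wanted})")
        elif op == "==" and have != wanted:
            problems.append(f"{name}: have {have}, wanted == {wanted}")
        elif op == "!=" and have == wanted:
            problems.append(f"{name}: have excluded version {have}")
    return problems


if __name__ == "__main__":
    names = [parse_pin(p)[0] for p in PINS]
    report = check_pins(PINS, installed_versions(names))
    for line in report or ["all pins satisfied"]:
        print(line)
```

Running the script inside the activated conda environment prints one line per mismatch, which makes it easier to tell whether the environment that produced the error is really the pinned one.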
The biored code currently available in the repo has some control-character issues and needs a slight modification. I copied the script (run_biored_exp.sh) into the biored_re directory (instead of biored_re/scripts) and then executed it with:
bash run_biored_exp.sh 0 (for my Mac)
I also tested the example code that you sent. I have been working on this issue for almost 3 weeks. I wish we could schedule a short Zoom session. Alternatively, I can try the new biored code for Python 3.9 and contact you again after that.
I wish you a safe journey. Many thanks for your answers.
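If the control characters mentioned above are Windows-style carriage returns (an assumption; the thread does not say which characters were involved), they can be stripped from the script before running it. A minimal sketch, where `strip_carriage_returns` is a hypothetical helper name:

```python
from pathlib import Path


def strip_carriage_returns(path):
    """Rewrite `path` with CRLF and lone CR line endings converted to LF.

    Returns the number of carriage-return bytes found in the original file.
    """
    p = Path(path)
    data = p.read_bytes()
    cleaned = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    found = data.count(b"\r")
    if found:
        # Only touch the file when there is something to fix.
        p.write_bytes(cleaned)
    return found
```

For example, `strip_carriage_returns("scripts/run_biored_exp.sh")` would clean the script in place and report how many carriage returns it contained.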
On Wed, Jan 25, 2023 at 8:35 PM ptlai @.***> wrote:
Hi @berkekavak ,
I tested the requirements.txt that you mentioned, but I used python 3.6.15.
(tensorflow) ➜ biored_re git:(master) ✗ bash run_biored_exp.sh 0
in shell script task name: biored_all_mul
[INFO|training_args.py:1094] 2023-01-26 17:01:28,717 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1230] 2023-01-26 17:01:28,717 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:190] 2023-01-26 17:01:28,718 >> Tensorflow: setting up strategy
Metal device set to: Apple M1 Pro
systemMemory: 16.00 GB maxCacheSize: 5.33 GB
2023-01-26 17:01:28.718880: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-01-26 17:01:28.718898: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id:
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:28,723 >> loading file vocab.txt
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:28,723 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:28,723 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:28,724 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:28,724 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:658] 2023-01-26 17:01:28,724 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:28,724 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @ChemicalEntitySrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @ChemicalEntityTgt$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @DiseaseOrPhenotypicFeatureSrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @DiseaseOrPhenotypicFeatureTgt$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @GeneOrGeneProductSrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:28,738 >> Adding @GeneOrGeneProductTgt$ to the vocabulary
[WARNING|logging.py:281] 2023-01-26 17:01:28,738 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:658] 2023-01-26 17:01:28,739 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:28,739 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[WARNING|logging.py:281] 2023-01-26 17:01:28,748 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
[INFO|configuration_utils.py:658] 2023-01-26 17:01:28,750 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:28,751 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|modeling_tf_utils.py:2694] 2023-01-26 17:01:28,776 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:168] 2023-01-26 17:01:28,859 >> Loading PyTorch weights from /Users/berkekavak/boun/biored/biored_re/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
Traceback (most recent call last):
File "/Users/berkekavak/boun/biored/biored_re/src/run_biored_exp.py", line 795, in <module>
[...] logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1230] 2023-01-26 17:01:31,849 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:190] 2023-01-26 17:01:31,850 >> Tensorflow: setting up strategy
Metal device set to: Apple M1 Pro
systemMemory: 16.00 GB maxCacheSize: 5.33 GB
2023-01-26 17:01:31.850860: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:306] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2023-01-26 17:01:31.850876: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:272] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id:
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:31,855 >> loading file vocab.txt
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:31,855 >> loading file tokenizer.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:31,855 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:31,855 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:1800] 2023-01-26 17:01:31,855 >> loading file tokenizer_config.json
[INFO|configuration_utils.py:658] 2023-01-26 17:01:31,855 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:31,856 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @ChemicalEntitySrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @ChemicalEntityTgt$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @DiseaseOrPhenotypicFeatureSrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @DiseaseOrPhenotypicFeatureTgt$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @GeneOrGeneProductSrc$ to the vocabulary
[INFO|tokenization_utils.py:426] 2023-01-26 17:01:31,870 >> Adding @GeneOrGeneProductTgt$ to the vocabulary
[WARNING|logging.py:281] 2023-01-26 17:01:31,870 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|configuration_utils.py:658] 2023-01-26 17:01:31,871 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:31,871 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "initializer_range": 0.02, "intermediate_size": 3072, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[WARNING|logging.py:281] 2023-01-26 17:01:31,878 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
=======================>label2id {'None': 0, 'No': 1, 'Novel': 2}
=======================>positive_label
=======================>use_balanced_neg False
=======================>max_neg_scale 2
[INFO|configuration_utils.py:658] 2023-01-26 17:01:31,880 >> loading configuration file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/config.json
[INFO|configuration_utils.py:712] 2023-01-26 17:01:31,880 >> Model config BertConfig { "_name_or_path": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract", "architectures": [ "BertForMaskedLM" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "No", "2": "Novel" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "No": 1, "None": 0, "Novel": 2 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.26.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 30522 }
[INFO|modeling_tf_utils.py:2694] 2023-01-26 17:01:31,895 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:168] 2023-01-26 17:01:31,963 >> Loading PyTorch weights from /Users/berkekavak/boun/biored/biored_re/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
Traceback (most recent call last):
File "/Users/berkekavak/boun/biored/biored_re/src/run_biored_exp.py", line 795, in <module>
Hi,
I am able to run the Hugging Face example code. I hope you returned safely.
Could you please send me the Python 3.9 version of biored?
Best, Berke.
Hi @berkekavak ,
Thank you and sorry for the late reply. Please find the attached file, and let me know if you still have the same problem. Thanks! biored_re_py39.zip https://github.com/ncbi/BioRED/files/10542976/biored_re_py39.zip
Best, Po-Ting
Hi Po-Ting,
I am very happy that you sent the new version. Thank you for your great help. Yesterday I also tried this on an Intel-based Mac and got exactly the same error message as attached (the TFAutoModelForSequenceClassification error is identical to the one from the previous version of biored). The PyTorch weights somehow cannot be loaded.
I downloaded the model by creating a microsoft directory and running the following inside it: git clone https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract
I am not sure where the problem is but can you please send me your whole directory (with the PubMedBERT model)? Maybe then it might work.
Sincerely, Berke.
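One possible cause worth ruling out (an assumption, not something confirmed in this thread): when a Hugging Face model repository is cloned without Git LFS installed, large files such as pytorch_model.bin are checked out as small text pointer files that begin with "version https://git-lfs.github.com/spec/v1". Trying to unpickle such a file would be consistent with the "invalid load key, 'v'" error reported earlier. The sketch below checks for that case; `is_lfs_pointer` is a hypothetical helper name, and the path is the relative model path that appears in the logs.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer, not real weights."""
    with Path(path).open("rb") as fh:
        return fh.read(len(LFS_PREFIX)) == LFS_PREFIX


if __name__ == "__main__":
    weights = Path(
        "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin"
    )
    if not weights.exists():
        print(f"{weights} not found")
    elif is_lfs_pointer(weights):
        print(f"{weights} is a Git LFS pointer ({weights.stat().st_size} bytes); "
              "the real weights were not downloaded")
    else:
        print(f"{weights} is {weights.stat().st_size} bytes and is not an LFS pointer")
```

If the file turns out to be a pointer, running `git lfs install` followed by `git lfs pull` inside the cloned model directory should fetch the actual weights.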
[INFO|modeling_tf_utils.py:1776] 2023-02-03 12:35:49,344 >> loading weights file microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
[INFO|modeling_tf_pytorch_utils.py:119] 2023-02-03 12:35:49,429 >> Loading PyTorch weights from /mnt/c/Users/berke/Documents/boun/biored/biored_re_py39/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract/pytorch_model.bin
Traceback (most recent call last):
File "/mnt/c/Users/berke/Documents/boun/biored/biored_re_py39/src/run_biored_exp.py", line 795, in <module>
Hi @berkekavak ,
Thank you! Could you please send me your email address? I will send you the link. BTW, I recommend you try GCloud or a regular Linux server if you are able, as the code has not been tested on Mac.
Best, Po-Ting
@.***
Thanks, Berke.
--
Berke Kavak
Industrial Engineering & Economics
Hello @berkekavak , Your email address is not visible to me; could you please send it to laip2@nih.gov again? Thank you!
I recently updated our script so that users can apply our pre-trained model to new data in PubTator format. Instructions can be found in the README file at https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/biored_re_source_code.tar.
Regarding a previous email: the question raised by @berkekavak was changed to one about NER in BioRED. As there have been no further questions, I understand that Ling has resolved the issue regarding the use of AIONER in BioRED.
Hi, I'm trying to run
bash scripts/run_test_pred.sh 0
and I am getting the error below. I used:
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.19.5
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
spacy == 3.2.4
scispacy == 0.2.4
tensorflow-gpu == 2.6.2
ERROR
Converting the dataset into BioRED-RE input format
Traceback (most recent call last):
File "/home/microcrispr9/Downloads/biored_re_source_code/src/dataset_format_converter/convert_pubtator_2_bert.py", line 14, in <module>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/microcrispr9/Downloads/biored_re_source_code/src/run_biored_exp.py", line 35, in <module>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/microcrispr9/Downloads/biored_re_source_code/src/run_biored_exp.py", line 35, in <module>
KINDLY HELP RUN THESE FILES.
I have run into some problems when reproducing your paper. I am not sure whether they are caused by the versions of the installed packages, so could you provide an updated requirements.txt with the version number of each package?