Closed: Khyati-Microcrispr closed this issue 3 months ago.
Hi @Khyati-Microcrispr,
Thank you for your interest in BioRED.
Input and output files use the same format as the BioRED PubTator files available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/BC8_BioRED_Subtask1_PubTator.zip.
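For readers unfamiliar with the format, here is a minimal made-up PubTator-style record (the PMID, offsets, identifiers, and relation below are invented for illustration; see the zip above for real examples):

```python
# A made-up BioRED-style PubTator record, for illustration only.
# Pipe-delimited title ("|t|") and abstract ("|a|") lines come first, followed by
# tab-delimited entity mentions (PMID, start, end, text, type, ID) and
# relations (PMID, relation type, ID1, ID2, novelty).
record = (
    "10000000|t|Example title mentioning geneA and diseaseB.\n"
    "10000000|a|Example abstract text.\n"
    "10000000\t25\t30\tgeneA\tGeneOrGeneProduct\t1234\n"
    "10000000\t35\t43\tdiseaseB\tDiseaseOrPhenotypicFeature\tD000001\n"
    "10000000\tAssociation\t1234\tD000001\tNovel\n"
)
print(record)
```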
I have tried the configuration you mentioned in https://github.com/ncbi/BioRED/issues/1#issuecomment-2090074264, but I am unable to reproduce the same error, and I cannot tell why the code does not run in your configuration. I just tested this environment setting again and it worked fine: Windows 11 + WSL + CUDA 11.2 + cuDNN 8. requirements.txt:
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
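A quick sketch for verifying that the installed packages actually match these pins (assuming a standard pip environment on Python 3.8+):

```python
# Sketch: compare installed package versions against the pins in requirements.txt.
from importlib.metadata import version, PackageNotFoundError

pins = {
    "transformers": "4.18.0",
    "accelerate": "0.9.0",
    "pandas": "1.1.5",
    "numpy": "1.20.0",
    "datasets": "2.3.2",
    "protobuf": "3.19.4",
    "scispacy": "0.2.4",
    "tensorflow": "2.9.3",
}

for name, want in pins.items():
    try:
        have = version(name)
    except PackageNotFoundError:
        have = "not installed"
    status = "ok" if have == want else "MISMATCH"
    print(f"{name:<12} expected {want:<8} found {have:<14} {status}")
```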
Please let me know if you still have any questions. Thanks.
I also tried BioREx and I am getting this error:
bash scripts/run_test_pred.sh
Converting the dataset into BioREx input format
2024-05-04 16:58:56.713845: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:58:56.783843: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:58:57.995882: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
number_unique_YES_instances 0
Generating RE predictions
2024-05-04 16:59:01.536550: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:59:01.606442: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:59:02.202204: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-04 16:59:04,468 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-04 16:59:04,468 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-04 16:59:04,498 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-04 16:59:04,499 >> Tensorflow: setting up strategy
2024-05-04 16:59:04.650622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20070 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:31:00.0, compute capability: 8.9
05/04/2024 16:59:04 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
05/04/2024 16:59:04 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|modeling_tf_utils.py:1776] 2024-05-04 16:59:04,731 >> loading weights file pretrained_model/tf_model.h5
2024-05-04 16:59:08.957082: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:510] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice. Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-12.3
  /usr/local/cuda
  /home/microcrispr9/anaconda3/envs/biorex/lib/python3.9/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/microcrispr9/anaconda3/envs/biorex/lib/python3.9/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2024-05-04 16:59:09.273975: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:548] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc
2024-05-04 16:59:09.274245: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:207] INTERNAL: Generating device code failed.
2024-05-04 16:59:09.275276: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: JIT compilation failed.
2024-05-04 16:59:09.275300: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
Traceback (most recent call last):
  File "/home/microcrispr9/Downloads/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
{{function_node __wrapped__Rsqrt_device_/job:localhost/replica:0/task:0/device:GPU:0}} JIT compilation failed. [Op:Rsqrt] name:
Arguments received by LayerNormalization.call():
  • inputs=tf.Tensor(shape=(3, 5, 768), dtype=float32)
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-05-04 16:59:12.166891: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:59:12.207675: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:59:12.829608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(biorex) @.***:~/Downloads/BioREx$
On Sat, 4 May 2024 at 16:28, Khyati Patni wrote:
Hi, thank you for your reply. I used the PubTator file attached to the email as the input file and followed the README instructions for predicting on new data without training, but I am still getting the error described in the attached error.docx. The GPU is available, and TensorFlow 2.9.3 detects it. Please help me resolve this issue. Thanks and regards, Khyati
Hello @Khyati-Microcrispr,
There is an error, "error: libdevice not found at ./libdevice.10.bc", and the program fails. This looks like a CUDA setup issue. Can you confirm that you have installed the CUDA toolkit files and that they can be accessed from Python?
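For example, a minimal check along these lines confirms both points, together with the XLA_FLAGS workaround that the log above itself suggests (a sketch; the CUDA path is illustrative, adjust it to the actual install):

```python
# Sketch: verify that TensorFlow can see the GPU and the CUDA toolkit.
import os

# Workaround suggested by the log for "libdevice not found": point XLA at the
# CUDA toolkit directory *before* TensorFlow is imported.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"  # illustrative path

import tensorflow as tf

print("TF version     :", tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs           :", tf.config.list_physical_devices("GPU"))
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"), "cuDNN:", build.get("cudnn_version"))
```

If libdevice.10.bc is not present under the toolkit's nvvm/libdevice directory, the CUDA toolkit itself (not just the driver) is missing or incomplete.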
Hi, thank you for debugging this. I am using an NVIDIA GeForce RTX 4090 GPU on Ubuntu 22.04, while your team used an NVIDIA Tesla V100 SXM2. Could you please tell me which versions of CUDA, cuDNN, the driver, TensorFlow, Python, and the other requirements are compatible with this code? I have tried almost all possible configurations, but it is not working. Thank you for the help.
Hi @Khyati-Microcrispr, the environment.txt (https://github.com/ncbi/BioRED/files/15242005/environment.txt) that I am using on the NVIDIA Tesla V100 SXM2 is attached. Please let me know if you need any further information. Thanks.
Hi again, I am now using GCloud; CUDA and the rest of the setup are working fine. However, I am still getting a "File not found" error from the exp command and a "Logits error" from the pred command. I have attached the error files for your reference. Thank you for the help.
(biored_re) @.***:~/workspace/novoai/ground0/biored$ bash scripts/run_biored_exp.sh 0
2024-05-13 12:12:44.660373: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:12:45.254949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-13 12:12:47,239 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:12:47,239 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:12:47,797 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:12:47,798 >> Tensorflow: setting up strategy
2024-05-13 12:12:48.491250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.492825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.494416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.496281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:12:48 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:12:49 - INFO - __main__ - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:12:49 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/vocab.txt [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/tokenizer.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/added_tokens.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/special_tokens_map.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,494 >> loading file biored_all_mul_model/tokenizer_config.json [INFO|configuration_utils.py:652] 2024-05-13 12:12:49,494 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:12:49,494 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8} =======================>positive_label =======================>use_balanced_neg False =======================>max_neg_scale 2 [INFO|configuration_utils.py:652] 2024-05-13 12:12:49,507 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:12:49,507 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
[INFO|modeling_tf_utils.py:1776] 2024-05-13 12:12:49,525 >> loading weights file biored_all_mul_model/tf_model.h5 [WARNING|modeling_tf_utils.py:1843] 2024-05-13 12:12:54,655 >> Some layers from the model checkpoint at biored_all_mulmodel were not used when initializing TFBertForSequenceClassification: ['bert/encoder/layer.1/intermediate/dense/bias:0', 'bert/encoder/layer.10/output/LayerNorm/beta:0', 'bert/encoder/layer.1/intermediate/dense/kernel:0', 'bert/encoder/layer.9/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/self/query/kernel:0', 'bert/encoder/layer.5/intermediate/dense/kernel:0', 'bert/encoder/layer.8/intermediate/dense/bias:0', 'bert/encoder/layer.11/output/dense/kernel:0', 'bert/encoder/layer.10/attention/output/dense/bias:0', 'bert/encoder/layer.0/attention/output/dense/bias:0', 'bert/encoder/layer.1/output/dense/bias:0', 'bert/encoder/layer.2/attention/self/key/bias:0', 'bert/embeddings/LayerNorm/beta:0', 'bert/encoder/layer.10/intermediate/dense/kernel:0', 'bert/encoder/layer.9/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/beta:0', 'bert/pooler/dense/bias:0', 'bert/encoder/layer.10/attention/self/query/bias:0', 'bert/encoder/layer.10/output/dense/bias:0', 'bert/encoder/layer.2/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/key/bias:0', 'bert/encoder/layer.5/attention/self/key/kernel:0', 'bert/encoder/layer.11/attention/self/key/bias:0', 'bert/encoder/layer.2/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/self/query/kernel:0', 'bert/encoder/layer.11/intermediate/dense/bias:0', 'bert/encoder/layer.2/attention/self/query/kernel:0', 'bert/encoder/layer.1/output/dense/kernel:0', 'bert/encoder/layer.5/output/dense/bias:0', 'bert/encoder/layer.6/attention/self/value/kernel:0', 'bert/encoder/layer.8/output/dense/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/self/key/bias:0', 'bert/encoder/layer.5/output/LayerNorm/beta:0', 'bert/encoder/layer.11/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/key/bias:0', 'bert/encoder/layer.2/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/self/key/bias:0', 'bert/encoder/layer.0/attention/self/query/kernel:0', 'bert/encoder/layer.4/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.5/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.3/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/self/query/kernel:0', 'bert/encoder/layer.3/output/dense/bias:0', 'bert/encoder/layer.1/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/output/dense/bias:0', 'bert/encoder/layer.7/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/self/key/bias:0', 'bert/encoder/layer.3/attention/self/value/bias:0', 'bert/encoder/layer.10/attention/self/value/kernel:0', 'bert/encoder/layer.2/attention/output/dense/kernel:0', 'bert/encoder/layer.9/intermediate/dense/bias:0', 'bert/encoder/layer.5/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/attention/self/value/bias:0', 'bert/encoder/layer.2/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer._3/intermediate/dense/kernel:0', 'bert/embeddings/wordembeddings/weight:0', 'bert/encoder/layer.2/attention/output/LayerNorm/beta:0', 'classifier/kernel:0', 'bert/encoder/layer.0/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/kernel:0', 'bert/encoder/layer.5/attention/self/value/kernel:0', 
'bert/encoder/layer.6/output/LayerNorm/beta:0', 'bert/encoder/layer.1/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/attention/self/value/bias:0', 'bert/encoder/layer.4/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/kernel:0', 'bert/encoder/layer.2/output/dense/bias:0', 'bert/encoder/layer.5/attention/output/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/self/value/kernel:0', 'bert/encoder/layer.10/attention/output/dense/kernel:0', 'bert/encoder/layer.7/attention/self/query/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/intermediate/dense/kernel:0', 'bert/encoder/layer.3/intermediate/dense/bias:0', 'bert/encoder/layer.0/output/dense/kernel:0', 'bert/encoder/layer.1/attention/self/key/bias:0', 'bert/encoder/layer.3/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/bias:0', 'bert/encoder/layer.11/attention/self/query/bias:0', 'bert/encoder/layer.3/attention/self/key/kernel:0', 'bert/encoder/layer.0/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/kernel:0', 'bert/encoder/layer.10/attention/self/query/kernel:0', 'bert/encoder/layer.9/intermediate/dense/kernel:0', 'bert/encoder/layer.9/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/dense/kernel:0', 'bert/pooler/dense/kernel:0', 'bert/encoder/layer.0/intermediate/dense/bias:0', 'bert/encoder/layer._1/attention/self/query/kernel:0', 'bert/embeddings/positionembeddings/embeddings:0', 'bert/encoder/layer._0/attention/output/dense/kernel:0', 'bert/embeddings/token_typeembeddings/embeddings:0', 'bert/encoder/layer.6/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/bias:0', 'bert/encoder/layer.9/attention/output/dense/kernel:0', 'bert/encoder/layer.11/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/intermediate/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/bias:0', 'bert/encoder/layer.3/attention/self/query/bias:0', 'bert/encoder/layer.11/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/self/query/bias:0', 'bert/encoder/layer.8/output/dense/bias:0', 'bert/encoder/layer.11/attention/self/key/kernel:0', 'bert/encoder/layer.0/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/self/query/kernel:0', 'bert/encoder/layer.4/attention/self/key/kernel:0', 'bert/encoder/layer.7/attention/output/dense/kernel:0', 'bert/encoder/layer.8/attention/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/key/kernel:0', 'bert/encoder/layer.6/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/self/key/bias:0', 'bert/encoder/layer.3/attention/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/key/kernel:0', 'bert/encoder/layer.5/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/beta:0', 'bert/encoder/layer.1/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/self/value/bias:0', 
'bert/encoder/layer.7/output/dense/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/kernel:0', 'bert/encoder/layer.10/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.7/output/dense/bias:0', 'bert/encoder/layer.10/output/dense/kernel:0', 'classifier/bias:0', 'bert/encoder/layer.2/intermediate/dense/bias:0', 'bert/encoder/layer.7/output/LayerNorm/beta:0', 'bert/encoder/layer.8/attention/self/key/kernel:0', 'bert/encoder/layer.7/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/intermediate/dense/bias:0', 'bert/encoder/layer.1/attention/self/query/bias:0', 'bert/encoder/layer.6/attention/self/key/kernel:0', 'bert/encoder/layer.5/output/dense/kernel:0', 'bert/encoder/layer.4/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/value/bias:0', 'bert/encoder/layer.1/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.11/output/dense/bias:0', 'bert/encoder/layer.4/attention/output/dense/bias:0', 'bert/encoder/layer.0/attention/self/value/bias:0', 'bert/encoder/layer.11/attention/output/dense/bias:0', 'bert/encoder/layer.11/attention/self/value/bias:0', 'bert/encoder/layer.0/attention/self/key/bias:0', 'bert/encoder/layer.3/output/dense/kernel:0', 'bert/encoder/layer.4/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.0/attention/self/query/bias:0', 'bert/encoder/layer.9/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/value/kernel:0', 'bert/encoder/layer.10/attention/self/key/bias:0', 'bert/embeddings/LayerNorm/gamma:0', 'bert/encoder/layer.8/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/intermediate/dense/bias:0', 'bert/encoder/layer.0/attention/self/value/kernel:0', 'bert/encoder/layer.4/attention/self/query/bias:0', 'bert/encoder/layer.0/output/dense/bias:0', 'bert/encoder/layer.0/intermediate/dense/kernel:0', 'bert/encoder/layer.7/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/output/dense/kernel:0', 'bert/encoder/layer.6/intermediate/dense/bias:0', 'bert/encoder/layer.7/attention/output/dense/bias:0', 'bert/encoder/layer.3/attention/self/query/kernel:0', 'bert/encoder/layer.7/intermediate/dense/bias:0', 'bert/encoder/layer.5/attention/output/dense/bias:0', 'bert/encoder/layer.10/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/self/query/bias:0', 'bert/encoder/layer.11/intermediate/dense/kernel:0', 'bert/encoder/layer.4/attention/self/query/kernel:0', 'bert/encoder/layer.8/intermediate/dense/kernel:0', 'bert/encoder/layer.2/attention/self/key/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/intermediate/dense/kernel:0', 'bert/encoder/layer.5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.7/intermediate/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/self/value/kernel:0', 'bert/encoder/layer.7/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/output/dense/bias:0', 'bert/encoder/layer.8/output/LayerNorm/beta:0', 'bert/encoder/layer._5/intermediate/dense/bias:0']
TFTrainer is deprecated and will be removed in version 5 of Transformers. We recommend using native Keras instead, by calling methods like fit() and predict() directly on the model object. Detailed examples of the Keras style can be found in our examples at https://github.com/huggingface/transformers/tree/main/examples/tensorflow
warnings.warn(
[INFO|trainer_tf.py:124] 2024-05-13 12:13:04,719 >> You are instantiating a Trainer but W&B is not installed. To use wandb logging, run pip install wandb && wandb login; see https://docs.wandb.com/huggingface.
[INFO|trainer_tf.py:132] 2024-05-13 12:13:04,719 >> To use comet_ml logging, run pip/conda install comet_ml; see https://www.comet.ml/docs/python-sdk/huggingface/
2024-05-13 12:13:05.630757: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
05/13/2024 12:13:05 - INFO - tf_wrapper - Running training
05/13/2024 12:13:05 - INFO - tf_wrapper - Num examples = 22896
05/13/2024 12:13:05 - INFO - tf_wrapper - Num Epochs = 10
05/13/2024 12:13:05 - INFO - tf_wrapper - Instantaneous batch size per device = 16
05/13/2024 12:13:05 - INFO - tf_wrapper - Total train batch size (w. parallel, distributed & accumulation) = 64
05/13/2024 12:13:05 - INFO - tf_wrapper - Gradient Accumulation steps = 1
05/13/2024 12:13:05 - INFO - tf_wrapper - Steps per epoch = 358
05/13/2024 12:13:05 - INFO - tf_wrapper - Total optimization steps = 3580
2024-05-13 12:13:15.981625: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:450] ShuffleDatasetV3:7: Filling up shuffle buffer (this may take a while): 14657 of 22896
2024-05-13 12:13:21.430681: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:480] Shuffle buffer filled.
[INFO|trainer_tf.py:411] 2024-05-13 12:15:54,153 >> {'loss': 0.29368764, 'learning_rate': 9.972066e-06, 'epoch': 0.027932960893854747, 'step': 10}
[INFO|trainer_tf.py:411] 2024-05-13 12:15:56,527 >> {'loss': 0.23094805, 'learning_rate': 9.944134e-06, 'epoch': 0.055865921787709494, 'step': 20}
[INFO|trainer_tf.py:411] 2024-05-13 12:15:58,919 >> {'loss': 0.2152964, 'learning_rate': 9.916201e-06, 'epoch': 0.08379888268156424, 'step': 30}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:01,325 >> {'loss': 0.2096351, 'learning_rate': 9.888267e-06, 'epoch': 0.11173184357541899, 'step': 40}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:03,721 >> {'loss': 0.20234583, 'learning_rate': 9.860335e-06, 'epoch': 0.13966480446927373, 'step': 50}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:06,123 >> {'loss': 0.20314902, 'learning_rate': 9.832403e-06, 'epoch': 0.16759776536312848, 'step': 60}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:08,561 >> {'loss': 0.19634084, 'learning_rate': 9.804469e-06, 'epoch': 0.19553072625698323, 'step': 70}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:10,986 >> {'loss': 0.19569454, 'learning_rate': 9.776536e-06, 'epoch': 0.22346368715083798, 'step': 80}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:13,378 >> {'loss': 0.1949387, 'learning_rate': 9.748603e-06, 'epoch': 0.25139664804469275, 'step': 90}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:15,798 >> {'loss': 0.19381893, 'learning_rate': 9.72067e-06, 'epoch': 0.27932960893854747, 'step': 100}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:18,209 >> {'loss': 0.19055915, 'learning_rate': 9.692737e-06, 'epoch': 0.30726256983240224, 'step': 110}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:20,608 >> {'loss': 0.18848367, 'learning_rate': 9.664804e-06, 'epoch': 0.33519553072625696, 'step': 120}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:22,998 >> {'loss': 0.18808162, 'learning_rate': 9.636871e-06, 'epoch': 0.36312849162011174, 'step': 130}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:25,412 >> {'loss': 0.18767141, 'learning_rate': 9.608939e-06, 'epoch': 0.39106145251396646, 'step': 140}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:27,845 >> {'loss': 0.18443914, 'learning_rate': 9.5810055e-06, 'epoch': 0.41899441340782123, 'step': 150}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:30,257 >> {'loss': 0.18058124, 'learning_rate': 9.553072e-06, 'epoch': 0.44692737430167595, 'step': 160}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:32,657 >> {'loss': 0.17980446, 'learning_rate': 9.52514e-06, 'epoch': 0.4748603351955307, 'step': 170}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:35,046 >> {'loss': 0.17969151, 'learning_rate': 9.4972065e-06, 'epoch': 0.5027932960893855, 'step': 180}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:37,444 >> {'loss': 0.17940535, 'learning_rate': 9.469273e-06, 'epoch': 0.5307262569832403, 'step': 190}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:39,871 >> {'loss': 0.17969364, 'learning_rate': 9.44134e-06, 'epoch': 0.5586592178770949, 'step': 200}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:42,285 >> {'loss': 0.17845039, 'learning_rate': 9.4134075e-06, 'epoch': 0.5865921787709497, 'step': 210}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:44,709 >> {'loss': 0.18089801, 'learning_rate': 9.385475e-06, 'epoch': 0.6145251396648045, 'step': 220}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:47,108 >> {'loss': 0.18049736, 'learning_rate': 9.357542e-06, 'epoch': 0.6424581005586593, 'step': 230}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:49,511 >> {'loss': 0.18020788, 'learning_rate': 9.3296085e-06, 'epoch': 0.6703910614525139, 'step': 240}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:51,916 >> {'loss': 0.18143189, 'learning_rate': 9.301676e-06, 'epoch': 0.6983240223463687, 'step': 250}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:54,317 >> {'loss': 0.18115883, 'learning_rate': 9.273743e-06, 'epoch': 0.7262569832402235, 'step': 260}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:56,727 >> {'loss': 0.18003432, 'learning_rate': 9.245809e-06, 'epoch': 0.7541899441340782, 'step': 270}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:59,130 >> {'loss': 0.17975433, 'learning_rate': 9.217877e-06, 'epoch': 0.7821229050279329, 'step': 280}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:01,550 >> {'loss': 0.17938477, 'learning_rate': 9.189944e-06, 'epoch': 0.8100558659217877, 'step': 290}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:03,957 >> {'loss': 0.17852996, 'learning_rate': 9.162011e-06, 'epoch': 0.8379888268156425, 'step': 300}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:06,377 >> {'loss': 0.17756976, 'learning_rate': 9.134078e-06, 'epoch': 0.8659217877094972, 'step': 310}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:08,795 >> {'loss': 0.1765033, 'learning_rate': 9.106146e-06, 'epoch': 0.8938547486033519, 'step': 320}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:11,203 >> {'loss': 0.17563178, 'learning_rate': 9.078212e-06, 'epoch': 0.9217877094972067, 'step': 330}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:13,612 >> {'loss': 0.17481384, 'learning_rate': 9.050279e-06, 'epoch': 0.9497206703910615, 'step': 340}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:16,013 >> {'loss': 0.17335112, 'learning_rate': 9.022346e-06, 'epoch': 0.9776536312849162, 'step': 350}
2024-05-13 12:17:18.027665: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
[INFO|trainer_tf.py:313] 2024-05-13 12:17:18,034 >> Running Evaluation
[INFO|trainer_tf.py:314] 2024-05-13 12:17:18,034 >> Num examples in dataset = 6659
[INFO|trainer_tf.py:316] 2024-05-13 12:17:18,034 >> Num examples in used in evaluation = 6784
[INFO|trainer_tf.py:317] 2024-05-13 12:17:18,034 >> Batch size = 128
Traceback (most recent call last):
  File "/home/isharma/workspace/novoai/ground0/biored/src/run_biored_exp.py", line 795, in <module>
using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:17:32,130 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:17:32,571 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:17:32,572 >> Tensorflow: setting up strategy
2024-05-13 12:17:33.220672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.222276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.223777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.225374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:17:33 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:17:34 - INFO - __main__ - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:17:34 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=[INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/vocab.txt [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/tokenizer.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/added_tokens.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/special_tokens_map.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/tokenizer_config.json [INFO|configuration_utils.py:652] 2024-05-13 12:17:34,232 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:17:34,233 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
=======================>label2id {'None': 0, 'No': 1, 'Novel': 2} =======================>positive_label =======================>use_balanced_neg False =======================>max_neg_scale 2 [INFO|configuration_utils.py:652] 2024-05-13 12:17:34,244 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:17:34,245 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "No", "2": "Novel" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "No": 1, "None": 0, "Novel": 2 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
[INFO|modeling_tf_utils.py:1776] 2024-05-13 12:17:34,263 >> loading weights file biored_all_mul_model/tf_model.h5 [WARNING|modeling_tf_utils.py:1843] 2024-05-13 12:17:39,414 >> Some layers from the model checkpoint at biored_all_mulmodel were not used when initializing TFBertForSequenceClassification: ['bert/encoder/layer.3/attention/self/value/kernel:0', 'bert/encoder/layer.2/output/dense/kernel:0', 'bert/encoder/layer.7/intermediate/dense/bias:0', 'bert/encoder/layer.2/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/kernel:0', 'bert/encoder/layer.1/output/dense/kernel:0', 'bert/encoder/layer.3/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/self/query/bias:0', 'bert/encoder/layer.9/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/key/bias:0', 'bert/encoder/layer.5/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/key/bias:0', 'bert/encoder/layer.10/output/dense/kernel:0', 'bert/encoder/layer.1/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/query/kernel:0', 'bert/encoder/layer.5/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/self/query/bias:0', 'bert/encoder/layer.5/output/dense/kernel:0', 'bert/encoder/layer.0/attention/output/dense/bias:0', 'bert/encoder/layer.1/intermediate/dense/kernel:0', 'bert/encoder/layer.6/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/output/dense/bias:0', 'bert/encoder/layer.6/intermediate/dense/bias:0', 'bert/encoder/layer.11/attention/self/key/kernel:0', 'bert/encoder/layer.7/output/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/output/dense/bias:0', 'bert/encoder/layer.4/intermediate/dense/bias:0', 'bert/encoder/layer.8/intermediate/dense/bias:0', 'bert/encoder/layer.2/attention/self/query/kernel:0', 'bert/encoder/layer.2/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/self/value/bias:0', 'bert/encoder/layer.2/intermediate/dense/kernel:0', 'bert/encoder/layer.10/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/dense/bias:0', 'bert/encoder/layer.2/attention/output/dense/bias:0', 'bert/encoder/layer.7/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/output/LayerNorm/beta:0', 'bert/encoder/layer._10/output/LayerNorm/gamma:0', 'bert/embeddings/token_typeembeddings/embeddings:0', 'bert/encoder/layer.1/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/self/key/kernel:0', 'bert/encoder/layer.11/attention/self/key/bias:0', 'bert/encoder/layer.4/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/self/value/kernel:0', 'bert/encoder/layer._7/attention/output/LayerNorm/gamma:0', 'bert/embeddings/positionembeddings/embeddings:0', 'bert/encoder/layer.0/output/dense/bias:0', 'bert/encoder/layer.4/output/LayerNorm/beta:0', 'bert/encoder/layer.5/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/self/key/kernel:0', 'bert/encoder/layer.10/attention/output/dense/kernel:0', 'bert/encoder/layer.1/output/LayerNorm/beta:0', 'bert/encoder/layer.11/intermediate/dense/bias:0', 'bert/encoder/layer.10/attention/self/value/kernel:0', 'bert/encoder/layer.4/attention/self/query/bias:0', 'bert/encoder/layer.4/output/dense/bias:0', 'bert/encoder/layer.7/attention/output/dense/kernel:0', 'bert/encoder/layer.7/attention/self/query/kernel:0', 
'bert/encoder/layer.5/attention/self/key/kernel:0', 'bert/encoder/layer.3/intermediate/dense/bias:0', 'bert/encoder/layer.9/attention/self/value/bias:0', 'bert/encoder/layer.6/attention/self/value/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/self/value/bias:0', 'bert/encoder/layer.3/attention/self/value/bias:0', 'bert/encoder/layer.2/attention/output/dense/kernel:0', 'bert/encoder/layer.2/attention/self/value/bias:0', 'bert/encoder/layer.7/intermediate/dense/kernel:0', 'bert/encoder/layer.5/attention/output/dense/bias:0', 'bert/encoder/layer.5/intermediate/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/kernel:0', 'bert/encoder/layer.5/intermediate/dense/kernel:0', 'bert/encoder/layer.2/intermediate/dense/bias:0', 'bert/encoder/layer.4/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/output/dense/bias:0', 'bert/encoder/layer.6/attention/self/query/kernel:0', 'bert/encoder/layer.8/output/dense/kernel:0', 'bert/encoder/layer.11/attention/self/value/kernel:0', 'bert/encoder/layer.5/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/self/query/bias:0', 'bert/encoder/layer.7/output/dense/bias:0', 'bert/encoder/layer.8/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/self/value/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.10/attention/self/query/kernel:0', 'bert/encoder/layer.6/intermediate/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.5/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/output/dense/kernel:0', 'bert/encoder/layer.0/output/dense/kernel:0', 'bert/encoder/layer.11/attention/self/query/kernel:0', 'bert/pooler/dense/bias:0', 'bert/encoder/layer.3/attention/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/query/bias:0', 'bert/encoder/layer.8/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/output/dense/kernel:0', 'bert/encoder/layer.1/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/key/bias:0', 'bert/encoder/layer.8/intermediate/dense/kernel:0', 'bert/encoder/layer.11/output/dense/bias:0', 'bert/encoder/layer.11/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/attention/self/value/bias:0', 'bert/encoder/layer.0/intermediate/dense/bias:0', 'bert/encoder/layer.1/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/self/key/kernel:0', 'bert/encoder/layer.5/attention/self/query/kernel:0', 'bert/encoder/layer.11/attention/output/dense/bias:0', 'classifier/bias:0', 'bert/encoder/layer.9/intermediate/dense/bias:0', 'bert/encoder/layer.4/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/key/bias:0', 'bert/encoder/layer.9/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.0/attention/output/dense/kernel:0', 'bert/encoder/layer.9/output/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/kernel:0', 'bert/encoder/layer.10/intermediate/dense/kernel:0', 
'bert/encoder/layer.4/attention/self/query/kernel:0', 'bert/encoder/layer.0/attention/self/query/kernel:0', 'bert/encoder/layer.0/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/key/bias:0', 'bert/encoder/layer.7/attention/output/LayerNorm/beta:0', 'classifier/kernel:0', 'bert/encoder/layer.2/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.3/attention/self/query/bias:0', 'bert/encoder/layer.1/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/self/query/bias:0', 'bert/encoder/layer.4/intermediate/dense/kernel:0', 'bert/encoder/layer.4/attention/self/key/bias:0', 'bert/encoder/layer.3/output/LayerNorm/beta:0', 'bert/embeddings/LayerNorm/gamma:0', 'bert/encoder/layer.10/intermediate/dense/bias:0', 'bert/encoder/layer.3/intermediate/dense/kernel:0', 'bert/encoder/layer.9/intermediate/dense/kernel:0', 'bert/encoder/layer.6/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/bias:0', 'bert/encoder/layer.1/intermediate/dense/bias:0', 'bert/encoder/layer.4/attention/self/value/bias:0', 'bert/encoder/layer.4/attention/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/query/kernel:0', 'bert/encoder/layer.9/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.10/attention/output/dense/bias:0', 'bert/encoder/layer.0/intermediate/dense/kernel:0', 'bert/encoder/layer.3/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/value/kernel:0', 'bert/encoder/layer.6/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.11/attention/output/dense/kernel:0', 'bert/encoder/layer.11/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.7/output/LayerNorm/beta:0', 'bert/encoder/layer.1/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/self/query/kernel:0', 'bert/encoder/layer.3/output/dense/bias:0', 'bert/encoder/layer.11/output/dense/kernel:0', 'bert/encoder/layer.3/attention/self/key/bias:0', 'bert/encoder/layer.8/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/query/kernel:0', 'bert/encoder/layer.2/output/dense/bias:0', 'bert/embeddings/LayerNorm/beta:0', 'bert/pooler/dense/kernel:0', 'bert/encoder/layer.5/output/dense/bias:0', 'bert/encoder/layer.9/attention/self/key/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/self/key/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.9/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/output/dense/bias:0', 'bert/encoder/layer.11/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/kernel:0', 'bert/encoder/layer.7/attention/self/value/bias:0', 'bert/encoder/layer._7/attention/self/key/bias:0', 'bert/embeddings/wordembeddings/weight:0', 'bert/encoder/layer.0/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/self/query/bias:0', 'bert/encoder/layer.0/attention/self/value/kernel:0', 'bert/encoder/layer.6/attention/output/dense/kernel:0', 'bert/encoder/layer.3/output/dense/kernel:0', 'bert/encoder/layer.11/intermediate/dense/kernel:0', 'bert/encoder/layer._7/attention/self/query/bias:0']
TFTrainer is deprecated and will be removed in version 5 of Transformers. We recommend using native Keras instead, by calling methods like fit() and predict() directly on the model object. Detailed examples of the Keras style can be found in our examples at https://github.com/huggingface/transformers/tree/main/examples/tensorflow
warnings.warn(
[INFO|trainer_tf.py:124] 2024-05-13 12:17:41,673 >> You are instantiating a Trainer but W&B is not installed. To use wandb logging, run pip install wandb && wandb login; see https://docs.wandb.com/huggingface.
[INFO|trainer_tf.py:132] 2024-05-13 12:17:41,673 >> To use comet_ml logging, run pip/conda install comet_ml; see https://www.comet.ml/docs/python-sdk/huggingface/
2024-05-13 12:17:42.515636: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
05/13/2024 12:17:42 - INFO - tf_wrapper - Running training
05/13/2024 12:17:42 - INFO - tf_wrapper - Num examples = 4175
05/13/2024 12:17:42 - INFO - tf_wrapper - Num Epochs = 10
05/13/2024 12:17:42 - INFO - tf_wrapper - Instantaneous batch size per device = 16
05/13/2024 12:17:42 - INFO - tf_wrapper - Total train batch size (w. parallel, distributed & accumulation) = 64
05/13/2024 12:17:42 - INFO - tf_wrapper - Gradient Accumulation steps = 1
05/13/2024 12:17:42 - INFO - tf_wrapper - Steps per epoch = 66
05/13/2024 12:17:42 - INFO - tf_wrapper - Total optimization steps = 660
[INFO|trainer_tf.py:411] 2024-05-13 12:20:19,894 >> {'loss': 0.18754996, 'learning_rate': 9.848484e-06, 'epoch': 0.15151515151515152, 'step': 10}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:22,274 >> {'loss': 0.17843515, 'learning_rate': 9.69697e-06, 'epoch': 0.30303030303030304, 'step': 20}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:24,692 >> {'loss': 0.17367719, 'learning_rate': 9.545454e-06, 'epoch': 0.45454545454545453, 'step': 30}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:27,083 >> {'loss': 0.16819657, 'learning_rate': 9.393939e-06, 'epoch': 0.6060606060606061, 'step': 40}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:29,482 >> {'loss': 0.16544081, 'learning_rate': 9.242424e-06, 'epoch': 0.7575757575757576, 'step': 50}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:31,891 >> {'loss': 0.16649602, 'learning_rate': 9.090909e-06, 'epoch': 0.9090909090909091, 'step': 60}
2024-05-13 12:20:33.373627: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
[INFO|trainer_tf.py:313] 2024-05-13 12:20:33,380 >> Running Evaluation
[INFO|trainer_tf.py:314] 2024-05-13 12:20:33,381 >> Num examples in dataset = 1161
[INFO|trainer_tf.py:316] 2024-05-13 12:20:33,381 >> Num examples in used in evaluation = 1280
[INFO|trainer_tf.py:317] 2024-05-13 12:20:33,381 >> Batch size = 128
Traceback (most recent call last):
  File "/home/workspace/novoai/ground0/biored/src/run_biored_exp.py", line 795, in <module>
bash scripts/run_test_pred.sh 0
Converting the dataset into BioRED-RE input format
2024-05-13 12:34:16.485165: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:34:17.174354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
=======>len(all_documents) 100
Generating RE and novelty predictions
2024-05-13 12:34:42.915764: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:34:43.535466: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-13 12:34:45,298 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:34:45,298 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:34:45,686 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:34:45,688 >> Tensorflow: setting up strategy
2024-05-13 12:34:46.375953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.377753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.379460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.380943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:34:46 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:34:47 - INFO - main - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:34:47 - INFO - main - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
Hi, the error is caused by "self.eval_loss.reset_states() AttributeError: 'Sum' object has no attribute 'reset_states'". It looks like a Transformers package version problem. Do you use the same environment settings as those I provided?
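For illustration, here is a minimal sketch of the mismatch (my assumption about the cause, which the error message suggests): newer Keras renamed Metric.reset_states() to reset_state() and eventually dropped the old name, so code written against the old API fails exactly this way. A version-tolerant call would look like:

import tensorflow as tf

# trainer_tf.py tracks the eval loss with a Keras Sum metric; mirror that here.
metric = tf.keras.metrics.Sum(name="eval_loss")
metric.update_state([1.0, 2.0])

# Older TF/Keras releases provide reset_states(); newer ones renamed it to
# reset_state(). Calling whichever exists avoids the AttributeError on both sides.
reset = getattr(metric, "reset_state", None) or getattr(metric, "reset_states")
reset()
print(float(metric.result()))  # 0.0 after the reset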
Hi, I downgraded the Transformers version to 4.18.0 because TFTrainer was not getting imported. I tried every way to import it, but it is only supported by older versions.
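A guarded import makes that version constraint explicit (a sketch under the assumption that TFTrainer only ships with older Transformers releases, as the deprecation warning above indicates):

# TFTrainer ships with older Transformers releases (e.g. 4.18.0) and was
# removed later, so the import itself acts as the version check.
try:
    from transformers import TFTrainer, TFTrainingArguments
except ImportError as err:
    raise ImportError(
        "TFTrainer is unavailable in this Transformers release; "
        "pin transformers==4.18.0 as in requirements.txt"
    ) from err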
Hi, could you please run the prediction script and check whether it's working? I have tried everything I can, but it's not working. Please provide an input file for the prediction task. Thanks.
Hi, I will try the setting again and provide you with an update afterward.
Ok, thanks for the update.
Hi @Khyati-Microcrispr ,
I have tried it again, and I found errors in the prediction part. However, the training stage runs well, as shown in this screenshot: https://github.com/ncbi/BioRED/assets/61985809/95cb486d-bb6d-4230-80b8-f5e684a4e7d2
Here are my steps for reproducing the results.
- Environment:
OS: Win11 + WSL2 (Ubuntu 22.04.2 LTS)
GPU: RTX 3080
- Setting up
conda create -n py39 python=3.9
conda activate py39
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python.exe -m pip install --upgrade pip
python -m pip install "tensorflow==2.10"
Then you can run the Python script below to check whether you can access the GPU.
import tensorflow as tf

# TensorFlow version and number of visible GPUs
print(tf.__version__)
print(len(tf.config.list_physical_devices('GPU')))

# Whether this build has CUDA support and whether a GPU is actually usable
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())

# CUDA/cuDNN versions this TensorFlow wheel was compiled against
build_info = tf.sysconfig.get_build_info()
cuda_version = build_info["cuda_version"]
cudnn_version = build_info["cudnn_version"]
print("CUDA version TensorFlow was built with:", cuda_version)
print("cuDNN version TensorFlow was built with:", cudnn_version)
Install requirements
pip install -r requirements.txt
Here is my requirements.txt
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
- Running the script
I found there are two missing parameters in scripts/run_biored_exp.sh, which should be modified as below.
python src/utils/run_biored_eval.py --exp_option 'to_pubtator' \
--in_pred_rel_tsv_file "out_biored_all_mul_test_results.tsv" \
--in_pred_novelty_tsv_file "out_biored_novelty_test_results.tsv" \
--in_test_tsv_file "datasets/biored/processed/test.tsv" \
--in_test_pubtator_file "datasets/biored/BioRED/Test.PubTator" \
--out_pred_pubtator_file "biored_pred_mul.txt"
In my original version, --in_test_tsv_file and --in_test_pubtator_file were missing. After fixing them, you can run the commands below and get the result.
bash scripts/build_biored_dataset.sh
bash scripts/run_biored_exp.sh
Hi, thank you so much for the update; both the training and prediction scripts finally worked.
Thank you for providing such a valuable package that addresses the limitations of other available tools. It would be immensely helpful if you could include tutorials or, at the very least, sample input-output files for users interested in predicting on new data with the pretrained weights. Additionally, running the bash scripts depends on system compatibility: could you provide a table listing compatible versions of CUDA, cuDNN, TensorFlow, and the other libraries, so that users on different systems can proceed smoothly? Thanks in advance.