Closed: Khyati-Microcrispr closed this issue 3 months ago.
Hi @Khyati-Microcrispr,
Thank you for your interest in BioRED.
Input and output files use the same format as the BioRED PubTator files available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC8-BioRED-track/BC8_BioRED_Subtask1_PubTator.zip.
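For readers unfamiliar with the format, here is a minimal made-up PubTator-style record (the PMID, offsets, identifiers, and relation below are invented for illustration; see the zip above for real examples):

```python
# A made-up BioRED-style PubTator record, for illustration only.
# Pipe-delimited title ("|t|") and abstract ("|a|") lines come first, followed by
# tab-delimited entity mentions (PMID, start, end, text, type, ID) and
# relations (PMID, relation type, ID1, ID2, novelty).
record = (
    "10000000|t|Example title mentioning geneA and diseaseB.\n"
    "10000000|a|Example abstract text.\n"
    "10000000\t25\t30\tgeneA\tGeneOrGeneProduct\t1234\n"
    "10000000\t35\t43\tdiseaseB\tDiseaseOrPhenotypicFeature\tD000001\n"
    "10000000\tAssociation\t1234\tD000001\tNovel\n"
)
print(record)
```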
I have tried the configuration you mentioned in https://github.com/ncbi/BioRED/issues/1#issuecomment-2090074264, but I am unable to reproduce the same error, and I cannot tell why the code does not run in your configuration. I just tested this environment setting again and it worked fine: Windows 11 + WSL + CUDA 11.2 + cuDNN 8. requirements.txt:
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
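A quick sketch for verifying that the installed packages actually match these pins (assuming a standard pip environment on Python 3.8+):

```python
# Sketch: compare installed package versions against the pins in requirements.txt.
from importlib.metadata import version, PackageNotFoundError

pins = {
    "transformers": "4.18.0",
    "accelerate": "0.9.0",
    "pandas": "1.1.5",
    "numpy": "1.20.0",
    "datasets": "2.3.2",
    "protobuf": "3.19.4",
    "scispacy": "0.2.4",
    "tensorflow": "2.9.3",
}

for name, want in pins.items():
    try:
        have = version(name)
    except PackageNotFoundError:
        have = "not installed"
    status = "ok" if have == want else "MISMATCH"
    print(f"{name:<12} expected {want:<8} found {have:<14} {status}")
```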
Please let me know if you still have any questions. Thanks.
I also tried BioREx and I am getting this error:
bash scripts/run_test_pred.sh
Converting the dataset into BioREx input format
2024-05-04 16:58:56.713845: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:58:56.783843: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:58:57.995882: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
number_unique_YES_instances 0
Generating RE predictions
2024-05-04 16:59:01.536550: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:59:01.606442: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:59:02.202204: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-04 16:59:04,468 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-04 16:59:04,468 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-04 16:59:04,498 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-04 16:59:04,499 >> Tensorflow: setting up strategy
2024-05-04 16:59:04.650622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20070 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:31:00.0, compute capability: 8.9
05/04/2024 16:59:04 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
05/04/2024 16:59:04 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=1,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|modeling_tf_utils.py:1776] 2024-05-04 16:59:04,731 >> loading weights file pretrained_model/tf_model.h5
2024-05-04 16:59:08.957082: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:510] Can't find libdevice directory ${CUDA_DIR}/nvvm/libdevice. This may result in compilation or runtime failures, if the program we try to run uses routines from libdevice. Searched for CUDA in the following directories:
  ./cuda_sdk_lib
  /usr/local/cuda-12.3
  /usr/local/cuda
  /home/microcrispr9/anaconda3/envs/biorex/lib/python3.9/site-packages/tensorflow/python/platform/../../../nvidia/cuda_nvcc
  /home/microcrispr9/anaconda3/envs/biorex/lib/python3.9/site-packages/tensorflow/python/platform/../../../../nvidia/cuda_nvcc
You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions. For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2024-05-04 16:59:09.273975: W external/local_xla/xla/service/gpu/llvm_gpu_backend/gpu_backend_lib.cc:548] libdevice is required by this HLO module but was not found at ./libdevice.10.bc
error: libdevice not found at ./libdevice.10.bc
2024-05-04 16:59:09.274245: E tensorflow/compiler/mlir/tools/kernel_gen/tf_framework_c_interface.cc:207] INTERNAL: Generating device code failed.
2024-05-04 16:59:09.275276: W tensorflow/core/framework/op_kernel.cc:1827] UNKNOWN: JIT compilation failed.
2024-05-04 16:59:09.275300: W tensorflow/core/framework/local_rendezvous.cc:404] Local rendezvous is aborting with status: UNKNOWN: JIT compilation failed.
Traceback (most recent call last):
  File "/home/microcrispr9/Downloads/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
{{function_node __wrapped__Rsqrt_device_/job:localhost/replica:0/task:0/device:GPU:0}} JIT compilation failed. [Op:Rsqrt] name:
Arguments received by LayerNormalization.call():
  • inputs=tf.Tensor(shape=(3, 5, 768), dtype=float32)
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-05-04 16:59:12.166891: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-04 16:59:12.207675: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-04 16:59:12.829608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
(biorex) @.***:~/Downloads/BioREx$
On Sat, 4 May 2024 at 16:28, Khyati Patni wrote:
Hi, thank you for your reply. I used the PubTator file attached to the email as the input file and followed the README instructions for predicting on new data without training, but I am still getting the error described in the attached error.docx. The GPU is available, and TensorFlow 2.9.3 detects it. Please help me resolve this issue. Thanks and regards, Khyati
Hello @Khyati-Microcrispr,
There is an error, "error: libdevice not found at ./libdevice.10.bc", and the program fails. This looks like a CUDA setup issue. Can you confirm that you have installed the CUDA toolkit files and that they can be accessed from Python?
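For example, a minimal check along these lines confirms both points, together with the XLA_FLAGS workaround that the log above itself suggests (a sketch; the CUDA path is illustrative, adjust it to the actual install):

```python
# Sketch: verify that TensorFlow can see the GPU and the CUDA toolkit.
import os

# Workaround suggested by the log for "libdevice not found": point XLA at the
# CUDA toolkit directory *before* TensorFlow is imported.
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/usr/local/cuda"  # illustrative path

import tensorflow as tf

print("TF version     :", tf.__version__)
print("built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs           :", tf.config.list_physical_devices("GPU"))
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"), "cuDNN:", build.get("cudnn_version"))
```

If libdevice.10.bc is not present under the toolkit's nvvm/libdevice directory, the CUDA toolkit itself (not just the driver) is missing or incomplete.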
Hi, thank you for debugging this. I am using an NVIDIA GeForce RTX 4090 GPU on Ubuntu 22.04, while your team used an NVIDIA Tesla V100 SXM2. Could you please tell me which versions of CUDA, cuDNN, the driver, TensorFlow, Python, and the other requirements are compatible with this code? I have tried almost all possible configurations, but it is not working. Thank you for the help.
Hi @Khyati-Microcrispr, the environment.txt (https://github.com/ncbi/BioRED/files/15242005/environment.txt) that I am using on the NVIDIA Tesla V100 SXM2 is attached. Please let me know if you need any further information. Thanks.
Hi again, I am now using GCloud; CUDA and the rest of the setup are working fine. However, I am still getting a "File not found" error from the exp command and a "Logits error" from the pred command. I have attached the error files for your reference. Thank you for the help.
(biored_re) @.***:~/workspace/novoai/ground0/biored$ bash scripts/run_biored_exp.sh 0
2024-05-13 12:12:44.660373: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:12:45.254949: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-13 12:12:47,239 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:12:47,239 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:12:47,797 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:12:47,798 >> Tensorflow: setting up strategy
2024-05-13 12:12:48.491250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.492825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.494416: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:12:48.496281: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:12:48 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:12:49 - INFO - __main__ - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:12:49 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
[INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/vocab.txt [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/tokenizer.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/added_tokens.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,493 >> loading file biored_all_mul_model/special_tokens_map.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:12:49,494 >> loading file biored_all_mul_model/tokenizer_config.json [INFO|configuration_utils.py:652] 2024-05-13 12:12:49,494 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:12:49,494 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
=======================>label2id {'None': 0, 'Association': 1, 'Bind': 2, 'Comparison': 3, 'Conversion': 4, 'Cotreatment': 5, 'Drug_Interaction': 6, 'Negative_Correlation': 7, 'Positive_Correlation': 8} =======================>positive_label =======================>use_balanced_neg False =======================>max_neg_scale 2 [INFO|configuration_utils.py:652] 2024-05-13 12:12:49,507 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:12:49,507 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
[INFO|modeling_tf_utils.py:1776] 2024-05-13 12:12:49,525 >> loading weights file biored_all_mul_model/tf_model.h5 [WARNING|modeling_tf_utils.py:1843] 2024-05-13 12:12:54,655 >> Some layers from the model checkpoint at biored_all_mulmodel were not used when initializing TFBertForSequenceClassification: ['bert/encoder/layer.1/intermediate/dense/bias:0', 'bert/encoder/layer.10/output/LayerNorm/beta:0', 'bert/encoder/layer.1/intermediate/dense/kernel:0', 'bert/encoder/layer.9/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/self/query/kernel:0', 'bert/encoder/layer.5/intermediate/dense/kernel:0', 'bert/encoder/layer.8/intermediate/dense/bias:0', 'bert/encoder/layer.11/output/dense/kernel:0', 'bert/encoder/layer.10/attention/output/dense/bias:0', 'bert/encoder/layer.0/attention/output/dense/bias:0', 'bert/encoder/layer.1/output/dense/bias:0', 'bert/encoder/layer.2/attention/self/key/bias:0', 'bert/embeddings/LayerNorm/beta:0', 'bert/encoder/layer.10/intermediate/dense/kernel:0', 'bert/encoder/layer.9/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/beta:0', 'bert/pooler/dense/bias:0', 'bert/encoder/layer.10/attention/self/query/bias:0', 'bert/encoder/layer.10/output/dense/bias:0', 'bert/encoder/layer.2/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/key/bias:0', 'bert/encoder/layer.5/attention/self/key/kernel:0', 'bert/encoder/layer.11/attention/self/key/bias:0', 'bert/encoder/layer.2/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/self/query/kernel:0', 'bert/encoder/layer.11/intermediate/dense/bias:0', 'bert/encoder/layer.2/attention/self/query/kernel:0', 'bert/encoder/layer.1/output/dense/kernel:0', 'bert/encoder/layer.5/output/dense/bias:0', 'bert/encoder/layer.6/attention/self/value/kernel:0', 'bert/encoder/layer.8/output/dense/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/self/key/bias:0', 'bert/encoder/layer.5/output/LayerNorm/beta:0', 'bert/encoder/layer.11/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/key/bias:0', 'bert/encoder/layer.2/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/self/key/bias:0', 'bert/encoder/layer.0/attention/self/query/kernel:0', 'bert/encoder/layer.4/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.5/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.3/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/self/query/kernel:0', 'bert/encoder/layer.3/output/dense/bias:0', 'bert/encoder/layer.1/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/output/dense/bias:0', 'bert/encoder/layer.7/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/self/key/bias:0', 'bert/encoder/layer.3/attention/self/value/bias:0', 'bert/encoder/layer.10/attention/self/value/kernel:0', 'bert/encoder/layer.2/attention/output/dense/kernel:0', 'bert/encoder/layer.9/intermediate/dense/bias:0', 'bert/encoder/layer.5/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/attention/self/value/bias:0', 'bert/encoder/layer.2/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer._3/intermediate/dense/kernel:0', 'bert/embeddings/wordembeddings/weight:0', 'bert/encoder/layer.2/attention/output/LayerNorm/beta:0', 'classifier/kernel:0', 'bert/encoder/layer.0/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/kernel:0', 'bert/encoder/layer.5/attention/self/value/kernel:0', 
'bert/encoder/layer.6/output/LayerNorm/beta:0', 'bert/encoder/layer.1/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/attention/self/value/bias:0', 'bert/encoder/layer.4/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/kernel:0', 'bert/encoder/layer.2/output/dense/bias:0', 'bert/encoder/layer.5/attention/output/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/self/value/kernel:0', 'bert/encoder/layer.10/attention/output/dense/kernel:0', 'bert/encoder/layer.7/attention/self/query/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/intermediate/dense/kernel:0', 'bert/encoder/layer.3/intermediate/dense/bias:0', 'bert/encoder/layer.0/output/dense/kernel:0', 'bert/encoder/layer.1/attention/self/key/bias:0', 'bert/encoder/layer.3/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/bias:0', 'bert/encoder/layer.11/attention/self/query/bias:0', 'bert/encoder/layer.3/attention/self/key/kernel:0', 'bert/encoder/layer.0/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/kernel:0', 'bert/encoder/layer.10/attention/self/query/kernel:0', 'bert/encoder/layer.9/intermediate/dense/kernel:0', 'bert/encoder/layer.9/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/dense/kernel:0', 'bert/pooler/dense/kernel:0', 'bert/encoder/layer.0/intermediate/dense/bias:0', 'bert/encoder/layer._1/attention/self/query/kernel:0', 'bert/embeddings/positionembeddings/embeddings:0', 'bert/encoder/layer._0/attention/output/dense/kernel:0', 'bert/embeddings/token_typeembeddings/embeddings:0', 'bert/encoder/layer.6/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/bias:0', 'bert/encoder/layer.9/attention/output/dense/kernel:0', 'bert/encoder/layer.11/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/intermediate/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/bias:0', 'bert/encoder/layer.3/attention/self/query/bias:0', 'bert/encoder/layer.11/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/self/query/bias:0', 'bert/encoder/layer.8/output/dense/bias:0', 'bert/encoder/layer.11/attention/self/key/kernel:0', 'bert/encoder/layer.0/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/self/query/kernel:0', 'bert/encoder/layer.4/attention/self/key/kernel:0', 'bert/encoder/layer.7/attention/output/dense/kernel:0', 'bert/encoder/layer.8/attention/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/key/kernel:0', 'bert/encoder/layer.6/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/output/dense/bias:0', 'bert/encoder/layer.7/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/self/key/bias:0', 'bert/encoder/layer.3/attention/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/key/kernel:0', 'bert/encoder/layer.5/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/beta:0', 'bert/encoder/layer.1/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/self/value/bias:0', 
'bert/encoder/layer.7/output/dense/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/kernel:0', 'bert/encoder/layer.10/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.7/output/dense/bias:0', 'bert/encoder/layer.10/output/dense/kernel:0', 'classifier/bias:0', 'bert/encoder/layer.2/intermediate/dense/bias:0', 'bert/encoder/layer.7/output/LayerNorm/beta:0', 'bert/encoder/layer.8/attention/self/key/kernel:0', 'bert/encoder/layer.7/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/intermediate/dense/bias:0', 'bert/encoder/layer.1/attention/self/query/bias:0', 'bert/encoder/layer.6/attention/self/key/kernel:0', 'bert/encoder/layer.5/output/dense/kernel:0', 'bert/encoder/layer.4/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/value/bias:0', 'bert/encoder/layer.1/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.11/output/dense/bias:0', 'bert/encoder/layer.4/attention/output/dense/bias:0', 'bert/encoder/layer.0/attention/self/value/bias:0', 'bert/encoder/layer.11/attention/output/dense/bias:0', 'bert/encoder/layer.11/attention/self/value/bias:0', 'bert/encoder/layer.0/attention/self/key/bias:0', 'bert/encoder/layer.3/output/dense/kernel:0', 'bert/encoder/layer.4/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.0/attention/self/query/bias:0', 'bert/encoder/layer.9/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/value/kernel:0', 'bert/encoder/layer.10/attention/self/key/bias:0', 'bert/embeddings/LayerNorm/gamma:0', 'bert/encoder/layer.8/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/intermediate/dense/bias:0', 'bert/encoder/layer.0/attention/self/value/kernel:0', 'bert/encoder/layer.4/attention/self/query/bias:0', 'bert/encoder/layer.0/output/dense/bias:0', 'bert/encoder/layer.0/intermediate/dense/kernel:0', 'bert/encoder/layer.7/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/output/dense/kernel:0', 'bert/encoder/layer.6/intermediate/dense/bias:0', 'bert/encoder/layer.7/attention/output/dense/bias:0', 'bert/encoder/layer.3/attention/self/query/kernel:0', 'bert/encoder/layer.7/intermediate/dense/bias:0', 'bert/encoder/layer.5/attention/output/dense/bias:0', 'bert/encoder/layer.10/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/self/query/bias:0', 'bert/encoder/layer.11/intermediate/dense/kernel:0', 'bert/encoder/layer.4/attention/self/query/kernel:0', 'bert/encoder/layer.8/intermediate/dense/kernel:0', 'bert/encoder/layer.2/attention/self/key/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/intermediate/dense/kernel:0', 'bert/encoder/layer.5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.7/intermediate/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/self/value/kernel:0', 'bert/encoder/layer.7/output/LayerNorm/gamma:0', 'bert/encoder/layer.9/attention/output/dense/bias:0', 'bert/encoder/layer.8/output/LayerNorm/beta:0', 'bert/encoder/layer._5/intermediate/dense/bias:0']
TFTrainer is deprecated and will be removed in version 5 of Transformers. We recommend using native Keras instead, by calling methods like fit() and predict() directly on the model object. Detailed examples of the Keras style can be found in our examples at https://github.com/huggingface/transformers/tree/main/examples/tensorflow
warnings.warn(
[INFO|trainer_tf.py:124] 2024-05-13 12:13:04,719 >> You are instantiating a Trainer but W&B is not installed. To use wandb logging, run pip install wandb && wandb login; see https://docs.wandb.com/huggingface.
[INFO|trainer_tf.py:132] 2024-05-13 12:13:04,719 >> To use comet_ml logging, run pip/conda install comet_ml; see https://www.comet.ml/docs/python-sdk/huggingface/
2024-05-13 12:13:05.630757: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
05/13/2024 12:13:05 - INFO - tf_wrapper - Running training
05/13/2024 12:13:05 - INFO - tf_wrapper - Num examples = 22896
05/13/2024 12:13:05 - INFO - tf_wrapper - Num Epochs = 10
05/13/2024 12:13:05 - INFO - tf_wrapper - Instantaneous batch size per device = 16
05/13/2024 12:13:05 - INFO - tf_wrapper - Total train batch size (w. parallel, distributed & accumulation) = 64
05/13/2024 12:13:05 - INFO - tf_wrapper - Gradient Accumulation steps = 1
05/13/2024 12:13:05 - INFO - tf_wrapper - Steps per epoch = 358
05/13/2024 12:13:05 - INFO - tf_wrapper - Total optimization steps = 3580
2024-05-13 12:13:15.981625: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:450] ShuffleDatasetV3:7: Filling up shuffle buffer (this may take a while): 14657 of 22896
2024-05-13 12:13:21.430681: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:480] Shuffle buffer filled.
[INFO|trainer_tf.py:411] 2024-05-13 12:15:54,153 >> {'loss': 0.29368764, 'learning_rate': 9.972066e-06, 'epoch': 0.027932960893854747, 'step': 10}
[INFO|trainer_tf.py:411] 2024-05-13 12:15:56,527 >> {'loss': 0.23094805, 'learning_rate': 9.944134e-06, 'epoch': 0.055865921787709494, 'step': 20}
[INFO|trainer_tf.py:411] 2024-05-13 12:15:58,919 >> {'loss': 0.2152964, 'learning_rate': 9.916201e-06, 'epoch': 0.08379888268156424, 'step': 30}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:01,325 >> {'loss': 0.2096351, 'learning_rate': 9.888267e-06, 'epoch': 0.11173184357541899, 'step': 40}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:03,721 >> {'loss': 0.20234583, 'learning_rate': 9.860335e-06, 'epoch': 0.13966480446927373, 'step': 50}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:06,123 >> {'loss': 0.20314902, 'learning_rate': 9.832403e-06, 'epoch': 0.16759776536312848, 'step': 60}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:08,561 >> {'loss': 0.19634084, 'learning_rate': 9.804469e-06, 'epoch': 0.19553072625698323, 'step': 70}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:10,986 >> {'loss': 0.19569454, 'learning_rate': 9.776536e-06, 'epoch': 0.22346368715083798, 'step': 80}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:13,378 >> {'loss': 0.1949387, 'learning_rate': 9.748603e-06, 'epoch': 0.25139664804469275, 'step': 90}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:15,798 >> {'loss': 0.19381893, 'learning_rate': 9.72067e-06, 'epoch': 0.27932960893854747, 'step': 100}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:18,209 >> {'loss': 0.19055915, 'learning_rate': 9.692737e-06, 'epoch': 0.30726256983240224, 'step': 110}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:20,608 >> {'loss': 0.18848367, 'learning_rate': 9.664804e-06, 'epoch': 0.33519553072625696, 'step': 120}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:22,998 >> {'loss': 0.18808162, 'learning_rate': 9.636871e-06, 'epoch': 0.36312849162011174, 'step': 130}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:25,412 >> {'loss': 0.18767141, 'learning_rate': 9.608939e-06, 'epoch': 0.39106145251396646, 'step': 140}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:27,845 >> {'loss': 0.18443914, 'learning_rate': 9.5810055e-06, 'epoch': 0.41899441340782123, 'step': 150}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:30,257 >> {'loss': 0.18058124, 'learning_rate': 9.553072e-06, 'epoch': 0.44692737430167595, 'step': 160}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:32,657 >> {'loss': 0.17980446, 'learning_rate': 9.52514e-06, 'epoch': 0.4748603351955307, 'step': 170}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:35,046 >> {'loss': 0.17969151, 'learning_rate': 9.4972065e-06, 'epoch': 0.5027932960893855, 'step': 180}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:37,444 >> {'loss': 0.17940535, 'learning_rate': 9.469273e-06, 'epoch': 0.5307262569832403, 'step': 190}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:39,871 >> {'loss': 0.17969364, 'learning_rate': 9.44134e-06, 'epoch': 0.5586592178770949, 'step': 200}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:42,285 >> {'loss': 0.17845039, 'learning_rate': 9.4134075e-06, 'epoch': 0.5865921787709497, 'step': 210}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:44,709 >> {'loss': 0.18089801, 'learning_rate': 9.385475e-06, 'epoch': 0.6145251396648045, 'step': 220}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:47,108 >> {'loss': 0.18049736, 'learning_rate': 9.357542e-06, 'epoch': 0.6424581005586593, 'step': 230}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:49,511 >> {'loss': 0.18020788, 'learning_rate': 9.3296085e-06, 'epoch': 0.6703910614525139, 'step': 240}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:51,916 >> {'loss': 0.18143189, 'learning_rate': 9.301676e-06, 'epoch': 0.6983240223463687, 'step': 250}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:54,317 >> {'loss': 0.18115883, 'learning_rate': 9.273743e-06, 'epoch': 0.7262569832402235, 'step': 260}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:56,727 >> {'loss': 0.18003432, 'learning_rate': 9.245809e-06, 'epoch': 0.7541899441340782, 'step': 270}
[INFO|trainer_tf.py:411] 2024-05-13 12:16:59,130 >> {'loss': 0.17975433, 'learning_rate': 9.217877e-06, 'epoch': 0.7821229050279329, 'step': 280}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:01,550 >> {'loss': 0.17938477, 'learning_rate': 9.189944e-06, 'epoch': 0.8100558659217877, 'step': 290}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:03,957 >> {'loss': 0.17852996, 'learning_rate': 9.162011e-06, 'epoch': 0.8379888268156425, 'step': 300}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:06,377 >> {'loss': 0.17756976, 'learning_rate': 9.134078e-06, 'epoch': 0.8659217877094972, 'step': 310}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:08,795 >> {'loss': 0.1765033, 'learning_rate': 9.106146e-06, 'epoch': 0.8938547486033519, 'step': 320}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:11,203 >> {'loss': 0.17563178, 'learning_rate': 9.078212e-06, 'epoch': 0.9217877094972067, 'step': 330}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:13,612 >> {'loss': 0.17481384, 'learning_rate': 9.050279e-06, 'epoch': 0.9497206703910615, 'step': 340}
[INFO|trainer_tf.py:411] 2024-05-13 12:17:16,013 >> {'loss': 0.17335112, 'learning_rate': 9.022346e-06, 'epoch': 0.9776536312849162, 'step': 350}
2024-05-13 12:17:18.027665: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
[INFO|trainer_tf.py:313] 2024-05-13 12:17:18,034 >> Running Evaluation
[INFO|trainer_tf.py:314] 2024-05-13 12:17:18,034 >> Num examples in dataset = 6659
[INFO|trainer_tf.py:316] 2024-05-13 12:17:18,034 >> Num examples in used in evaluation = 6784
[INFO|trainer_tf.py:317] 2024-05-13 12:17:18,034 >> Batch size = 128
Traceback (most recent call last):
  File "/home/isharma/workspace/novoai/ground0/biored/src/run_biored_exp.py", line 795, in <module>
using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:17:32,130 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:17:32,571 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:17:32,572 >> Tensorflow: setting up strategy
2024-05-13 12:17:33.220672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.222276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.223777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:17:33.225374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:17:33 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:17:34 - INFO - __main__ - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:17:34 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=[INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/vocab.txt [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/tokenizer.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/added_tokens.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/special_tokens_map.json [INFO|tokenization_utils_base.py:1776] 2024-05-13 12:17:34,232 >> loading file biored_all_mul_model/tokenizer_config.json [INFO|configuration_utils.py:652] 2024-05-13 12:17:34,232 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:17:34,233 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "Association", "2": "Bind", "3": "Comparison", "4": "Conversion", "5": "Cotreatment", "6": "Drug_Interaction", "7": "Negative_Correlation", "8": "Positive_Correlation" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "Association": 1, "Bind": 2, "Comparison": 3, "Conversion": 4, "Cotreatment": 5, "Drug_Interaction": 6, "Negative_Correlation": 7, "None": 0, "Positive_Correlation": 8 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
=======================>label2id {'None': 0, 'No': 1, 'Novel': 2} =======================>positive_label =======================>use_balanced_neg False =======================>max_neg_scale 2 [INFO|configuration_utils.py:652] 2024-05-13 12:17:34,244 >> loading configuration file biored_all_mul_model/config.json [INFO|configuration_utils.py:690] 2024-05-13 12:17:34,245 >> Model config BertConfig { "_name_or_path": "biored_all_mul_model", "architectures": [ "BertForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "classifier_dropout": null, "finetuning_task": "text-classification", "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 768, "id2label": { "0": "None", "1": "No", "2": "Novel" }, "initializer_range": 0.02, "intermediate_size": 3072, "label2id": { "No": 1, "None": 0, "Novel": 2 }, "layer_norm_eps": 1e-12, "max_position_embeddings": 512, "model_type": "bert", "num_attention_heads": 12, "num_hidden_layers": 12, "pad_token_id": 0, "position_embedding_type": "absolute", "transformers_version": "4.18.0", "type_vocab_size": 2, "use_cache": true, "vocab_size": 28901 }
[INFO|modeling_tf_utils.py:1776] 2024-05-13 12:17:34,263 >> loading weights file biored_all_mul_model/tf_model.h5 [WARNING|modeling_tf_utils.py:1843] 2024-05-13 12:17:39,414 >> Some layers from the model checkpoint at biored_all_mulmodel were not used when initializing TFBertForSequenceClassification: ['bert/encoder/layer.3/attention/self/value/kernel:0', 'bert/encoder/layer.2/output/dense/kernel:0', 'bert/encoder/layer.7/intermediate/dense/bias:0', 'bert/encoder/layer.2/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/kernel:0', 'bert/encoder/layer.1/output/dense/kernel:0', 'bert/encoder/layer.3/output/LayerNorm/gamma:0', 'bert/encoder/layer.2/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/self/query/bias:0', 'bert/encoder/layer.9/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/key/bias:0', 'bert/encoder/layer.5/attention/output/dense/kernel:0', 'bert/encoder/layer.6/attention/self/key/bias:0', 'bert/encoder/layer.10/output/dense/kernel:0', 'bert/encoder/layer.1/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/query/kernel:0', 'bert/encoder/layer.5/attention/self/value/kernel:0', 'bert/encoder/layer.8/attention/self/query/bias:0', 'bert/encoder/layer.5/output/dense/kernel:0', 'bert/encoder/layer.0/attention/output/dense/bias:0', 'bert/encoder/layer.1/intermediate/dense/kernel:0', 'bert/encoder/layer.6/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/output/dense/bias:0', 'bert/encoder/layer.6/intermediate/dense/bias:0', 'bert/encoder/layer.11/attention/self/key/kernel:0', 'bert/encoder/layer.7/output/dense/kernel:0', 'bert/encoder/layer.6/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/output/dense/bias:0', 'bert/encoder/layer.4/intermediate/dense/bias:0', 'bert/encoder/layer.8/intermediate/dense/bias:0', 'bert/encoder/layer.2/attention/self/query/kernel:0', 'bert/encoder/layer.2/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/self/value/bias:0', 'bert/encoder/layer.2/intermediate/dense/kernel:0', 'bert/encoder/layer.10/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/dense/bias:0', 'bert/encoder/layer.2/attention/output/dense/bias:0', 'bert/encoder/layer.7/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/attention/output/LayerNorm/beta:0', 'bert/encoder/layer._10/output/LayerNorm/gamma:0', 'bert/embeddings/token_typeembeddings/embeddings:0', 'bert/encoder/layer.1/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/self/key/kernel:0', 'bert/encoder/layer.11/attention/self/key/bias:0', 'bert/encoder/layer.4/attention/self/key/kernel:0', 'bert/encoder/layer.8/attention/self/value/kernel:0', 'bert/encoder/layer._7/attention/output/LayerNorm/gamma:0', 'bert/embeddings/positionembeddings/embeddings:0', 'bert/encoder/layer.0/output/dense/bias:0', 'bert/encoder/layer.4/output/LayerNorm/beta:0', 'bert/encoder/layer.5/output/LayerNorm/gamma:0', 'bert/encoder/layer.1/attention/self/key/kernel:0', 'bert/encoder/layer.10/attention/output/dense/kernel:0', 'bert/encoder/layer.1/output/LayerNorm/beta:0', 'bert/encoder/layer.11/intermediate/dense/bias:0', 'bert/encoder/layer.10/attention/self/value/kernel:0', 'bert/encoder/layer.4/attention/self/query/bias:0', 'bert/encoder/layer.4/output/dense/bias:0', 'bert/encoder/layer.7/attention/output/dense/kernel:0', 'bert/encoder/layer.7/attention/self/query/kernel:0', 
'bert/encoder/layer.5/attention/self/key/kernel:0', 'bert/encoder/layer.3/intermediate/dense/bias:0', 'bert/encoder/layer.9/attention/self/value/bias:0', 'bert/encoder/layer.6/attention/self/value/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/output/LayerNorm/beta:0', 'bert/encoder/layer.6/attention/self/value/bias:0', 'bert/encoder/layer.3/attention/self/value/bias:0', 'bert/encoder/layer.2/attention/output/dense/kernel:0', 'bert/encoder/layer.2/attention/self/value/bias:0', 'bert/encoder/layer.7/intermediate/dense/kernel:0', 'bert/encoder/layer.5/attention/output/dense/bias:0', 'bert/encoder/layer.5/intermediate/dense/bias:0', 'bert/encoder/layer.7/attention/self/value/kernel:0', 'bert/encoder/layer.5/intermediate/dense/kernel:0', 'bert/encoder/layer.2/intermediate/dense/bias:0', 'bert/encoder/layer.4/output/dense/kernel:0', 'bert/encoder/layer.9/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/output/dense/bias:0', 'bert/encoder/layer.6/attention/self/query/kernel:0', 'bert/encoder/layer.8/output/dense/kernel:0', 'bert/encoder/layer.11/attention/self/value/kernel:0', 'bert/encoder/layer.5/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/query/bias:0', 'bert/encoder/layer.11/attention/self/query/bias:0', 'bert/encoder/layer.7/output/dense/bias:0', 'bert/encoder/layer.8/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/self/value/bias:0', 'bert/encoder/layer.10/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.10/attention/self/query/kernel:0', 'bert/encoder/layer.6/intermediate/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.5/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/output/dense/kernel:0', 'bert/encoder/layer.0/output/dense/kernel:0', 'bert/encoder/layer.11/attention/self/query/kernel:0', 'bert/pooler/dense/bias:0', 'bert/encoder/layer.3/attention/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/query/bias:0', 'bert/encoder/layer.8/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/output/dense/kernel:0', 'bert/encoder/layer.1/output/dense/bias:0', 'bert/encoder/layer.10/attention/self/key/bias:0', 'bert/encoder/layer.8/intermediate/dense/kernel:0', 'bert/encoder/layer.11/output/dense/bias:0', 'bert/encoder/layer.11/output/LayerNorm/gamma:0', 'bert/encoder/layer.10/attention/self/value/bias:0', 'bert/encoder/layer.0/intermediate/dense/bias:0', 'bert/encoder/layer.1/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.11/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/self/key/kernel:0', 'bert/encoder/layer.5/attention/self/query/kernel:0', 'bert/encoder/layer.11/attention/output/dense/bias:0', 'classifier/bias:0', 'bert/encoder/layer.9/intermediate/dense/bias:0', 'bert/encoder/layer.4/attention/output/dense/kernel:0', 'bert/encoder/layer.5/attention/self/key/bias:0', 'bert/encoder/layer.9/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.4/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.0/attention/output/dense/kernel:0', 'bert/encoder/layer.9/output/dense/kernel:0', 'bert/encoder/layer.1/attention/output/dense/kernel:0', 'bert/encoder/layer.10/intermediate/dense/kernel:0', 
'bert/encoder/layer.4/attention/self/query/kernel:0', 'bert/encoder/layer.0/attention/self/query/kernel:0', 'bert/encoder/layer.0/attention/self/key/bias:0', 'bert/encoder/layer.10/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/key/bias:0', 'bert/encoder/layer.7/attention/output/LayerNorm/beta:0', 'classifier/kernel:0', 'bert/encoder/layer.2/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer.3/attention/self/query/bias:0', 'bert/encoder/layer.1/attention/self/value/kernel:0', 'bert/encoder/layer.9/attention/self/query/bias:0', 'bert/encoder/layer.4/intermediate/dense/kernel:0', 'bert/encoder/layer.4/attention/self/key/bias:0', 'bert/encoder/layer.3/output/LayerNorm/beta:0', 'bert/embeddings/LayerNorm/gamma:0', 'bert/encoder/layer.10/intermediate/dense/bias:0', 'bert/encoder/layer.3/intermediate/dense/kernel:0', 'bert/encoder/layer.9/intermediate/dense/kernel:0', 'bert/encoder/layer.6/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.4/output/LayerNorm/gamma:0', 'bert/encoder/layer.6/output/dense/bias:0', 'bert/encoder/layer.1/intermediate/dense/bias:0', 'bert/encoder/layer.4/attention/self/value/bias:0', 'bert/encoder/layer.4/attention/output/dense/bias:0', 'bert/encoder/layer.1/attention/self/query/kernel:0', 'bert/encoder/layer.9/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.10/attention/output/dense/bias:0', 'bert/encoder/layer.0/intermediate/dense/kernel:0', 'bert/encoder/layer.3/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/value/kernel:0', 'bert/encoder/layer.6/attention/self/key/kernel:0', 'bert/encoder/layer.2/attention/self/key/kernel:0', 'bert/encoder/layer.3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.11/attention/output/dense/kernel:0', 'bert/encoder/layer.11/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.7/output/LayerNorm/beta:0', 'bert/encoder/layer.1/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.3/attention/self/query/kernel:0', 'bert/encoder/layer.3/output/dense/bias:0', 'bert/encoder/layer.11/output/dense/kernel:0', 'bert/encoder/layer.3/attention/self/key/bias:0', 'bert/encoder/layer.8/output/LayerNorm/beta:0', 'bert/encoder/layer.9/attention/self/query/kernel:0', 'bert/encoder/layer.2/output/dense/bias:0', 'bert/embeddings/LayerNorm/beta:0', 'bert/pooler/dense/kernel:0', 'bert/encoder/layer.5/output/dense/bias:0', 'bert/encoder/layer.9/attention/self/key/kernel:0', 'bert/encoder/layer.0/output/LayerNorm/gamma:0', 'bert/encoder/layer.8/attention/self/key/bias:0', 'bert/encoder/layer.8/attention/output/LayerNorm/beta:0', 'bert/encoder/layer.9/output/LayerNorm/gamma:0', 'bert/encoder/layer.0/attention/self/query/bias:0', 'bert/encoder/layer.7/attention/output/dense/bias:0', 'bert/encoder/layer.11/output/LayerNorm/beta:0', 'bert/encoder/layer.4/attention/self/value/kernel:0', 'bert/encoder/layer.7/attention/self/value/bias:0', 'bert/encoder/layer._7/attention/self/key/bias:0', 'bert/embeddings/wordembeddings/weight:0', 'bert/encoder/layer.0/attention/self/value/bias:0', 'bert/encoder/layer.8/attention/self/value/bias:0', 'bert/encoder/layer.9/output/LayerNorm/beta:0', 'bert/encoder/layer.2/attention/self/query/bias:0', 'bert/encoder/layer.0/attention/self/value/kernel:0', 'bert/encoder/layer.6/attention/output/dense/kernel:0', 'bert/encoder/layer.3/output/dense/kernel:0', 'bert/encoder/layer.11/intermediate/dense/kernel:0', 'bert/encoder/layer._7/attention/self/query/bias:0']
TFTrainer is deprecated and will be removed in version 5 of Transformers. We recommend using native Keras instead, by calling methods like fit() and predict() directly on the model object. Detailed examples of the Keras style can be found in our examples at https://github.com/huggingface/transformers/tree/main/examples/tensorflow
warnings.warn(
[INFO|trainer_tf.py:124] 2024-05-13 12:17:41,673 >> You are instantiating a Trainer but W&B is not installed. To use wandb logging, run pip install wandb && wandb login; see https://docs.wandb.com/huggingface.
[INFO|trainer_tf.py:132] 2024-05-13 12:17:41,673 >> To use comet_ml logging, run pip/conda install comet_ml; see https://www.comet.ml/docs/python-sdk/huggingface/
2024-05-13 12:17:42.515636: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
05/13/2024 12:17:42 - INFO - tf_wrapper - Running training
05/13/2024 12:17:42 - INFO - tf_wrapper - Num examples = 4175
05/13/2024 12:17:42 - INFO - tf_wrapper - Num Epochs = 10
05/13/2024 12:17:42 - INFO - tf_wrapper - Instantaneous batch size per device = 16
05/13/2024 12:17:42 - INFO - tf_wrapper - Total train batch size (w. parallel, distributed & accumulation) = 64
05/13/2024 12:17:42 - INFO - tf_wrapper - Gradient Accumulation steps = 1
05/13/2024 12:17:42 - INFO - tf_wrapper - Steps per epoch = 66
05/13/2024 12:17:42 - INFO - tf_wrapper - Total optimization steps = 660
[INFO|trainer_tf.py:411] 2024-05-13 12:20:19,894 >> {'loss': 0.18754996, 'learning_rate': 9.848484e-06, 'epoch': 0.15151515151515152, 'step': 10}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:22,274 >> {'loss': 0.17843515, 'learning_rate': 9.69697e-06, 'epoch': 0.30303030303030304, 'step': 20}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:24,692 >> {'loss': 0.17367719, 'learning_rate': 9.545454e-06, 'epoch': 0.45454545454545453, 'step': 30}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:27,083 >> {'loss': 0.16819657, 'learning_rate': 9.393939e-06, 'epoch': 0.6060606060606061, 'step': 40}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:29,482 >> {'loss': 0.16544081, 'learning_rate': 9.242424e-06, 'epoch': 0.7575757575757576, 'step': 50}
[INFO|trainer_tf.py:411] 2024-05-13 12:20:31,891 >> {'loss': 0.16649602, 'learning_rate': 9.090909e-06, 'epoch': 0.9090909090909091, 'step': 60}
2024-05-13 12:20:33.373627: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:553] The assert_cardinality transformation is currently not handled by the auto-shard rewrite and will be removed.
[INFO|trainer_tf.py:313] 2024-05-13 12:20:33,380 >> Running Evaluation
[INFO|trainer_tf.py:314] 2024-05-13 12:20:33,381 >> Num examples in dataset = 1161
[INFO|trainer_tf.py:316] 2024-05-13 12:20:33,381 >> Num examples in used in evaluation = 1280
[INFO|trainer_tf.py:317] 2024-05-13 12:20:33,381 >> Batch size = 128
Traceback (most recent call last):
  File "/home/workspace/novoai/ground0/biored/src/run_biored_exp.py", line 795, in <module>
bash scripts/run_test_pred.sh 0
Converting the dataset into BioRED-RE input format
2024-05-13 12:34:16.485165: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:34:17.174354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
=======>len(all_documents) 100
Generating RE and novelty predictions
2024-05-13 12:34:42.915764: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-13 12:34:43.535466: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[INFO|training_args.py:804] 2024-05-13 12:34:45,298 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-13 12:34:45,298 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-13 12:34:45,686 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-13 12:34:45,688 >> Tensorflow: setting up strategy
2024-05-13 12:34:46.375953: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 79078 MB memory: -> device: 0, name: NVIDIA A100 80GB PCIe, pci bus id: 0001:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.377753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 79078 MB memory: -> device: 1, name: NVIDIA A100 80GB PCIe, pci bus id: 0002:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.379460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 79078 MB memory: -> device: 2, name: NVIDIA A100 80GB PCIe, pci bus id: 0003:00:00.0, compute capability: 8.0
2024-05-13 12:34:46.380943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1928] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 79078 MB memory: -> device: 3, name: NVIDIA A100 80GB PCIe, pci bus id: 0004:00:00.0, compute capability: 8.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:34:46 - INFO - tensorflow - Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
05/13/2024 12:34:47 - INFO - main - n_replicas: 4, distributed training: True, 16-bits training: False
05/13/2024 12:34:47 - INFO - main - Training/evaluation parameters TFTrainingArguments(
_n_gpu=4,
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_pin_memory=True,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
debug=[],
deepspeed=None,
disable_tqdm=False,
do_eval=True,
do_predict=True,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_steps=10,
evaluation_strategy=IntervalStrategy.STEPS,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
gcp_project=None,
gradient_accumulation_steps=1,
gradient_checkpointing=False,
greater_is_better=None,
group_by_length=False,
half_precision_backend=auto,
hub_model_id=None,
hub_strategy=HubStrategy.EVERY_SAVE,
hub_token=
Hi, the error is caused by "self.eval_loss.reset_states() AttributeError: 'Sum' object has no attribute 'reset_states'". It looks like a Transformers package version problem. Do you use the same environment settings as those I provided?
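For illustration, here is a minimal sketch of the mismatch (my assumption about the cause, which the error message suggests): newer Keras renamed Metric.reset_states() to reset_state() and eventually dropped the old name, so code written against the old API fails exactly this way. A version-tolerant call would look like:

import tensorflow as tf

# trainer_tf.py tracks the eval loss with a Keras Sum metric; mirror that here.
metric = tf.keras.metrics.Sum(name="eval_loss")
metric.update_state([1.0, 2.0])

# Older TF/Keras releases provide reset_states(); newer ones renamed it to
# reset_state(). Calling whichever exists avoids the AttributeError on both sides.
reset = getattr(metric, "reset_state", None) or getattr(metric, "reset_states")
reset()
print(float(metric.result()))  # 0.0 after the reset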
Hi, I downgraded the Transformers version to 4.18.0 because TFTrainer was not getting imported. I tried every way to import it, but it is only supported by older versions.
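A guarded import makes that version constraint explicit (a sketch under the assumption that TFTrainer only ships with older Transformers releases, as the deprecation warning above indicates):

# TFTrainer ships with older Transformers releases (e.g. 4.18.0) and was
# removed later, so the import itself acts as the version check.
try:
    from transformers import TFTrainer, TFTrainingArguments
except ImportError as err:
    raise ImportError(
        "TFTrainer is unavailable in this Transformers release; "
        "pin transformers==4.18.0 as in requirements.txt"
    ) from err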
Hi, could you please run the prediction script and check whether it's working? I have tried everything I can, but it's not working. Please provide an input file for the prediction task. Thanks.
Hi, I will try the setting again and provide you with an update afterward.
Ok, thanks for the update.
Hi @Khyati-Microcrispr ,
I have tried it again, and I found errors in the prediction part. However, the training stage runs well, as shown in this screenshot: https://github.com/ncbi/BioRED/assets/61985809/95cb486d-bb6d-4230-80b8-f5e684a4e7d2
Here are my steps for reproducing the results.
- Environment:
OS: Win11 + WSL2 (Ubuntu 22.04.2 LTS)
GPU: RTX 3080
- Setting up
conda create -n py39 python=3.9
conda activate py39
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python.exe -m pip install --upgrade pip
python -m pip install "tensorflow==2.10"
Then you can run the Python script below to check whether you can access the GPU.
import tensorflow as tf

# TensorFlow version and number of visible GPUs
print(tf.__version__)
print(len(tf.config.list_physical_devices('GPU')))

# Whether this build has CUDA support and whether a GPU is actually usable
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())

# CUDA/cuDNN versions this TensorFlow wheel was compiled against
build_info = tf.sysconfig.get_build_info()
cuda_version = build_info["cuda_version"]
cudnn_version = build_info["cudnn_version"]
print("CUDA version TensorFlow was built with:", cuda_version)
print("cuDNN version TensorFlow was built with:", cudnn_version)
Install requirements
pip install -r requirements.txt
Here is my requirements.txt
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
- Running the script
I found there are two missing parameters in scripts/run_biored_exp.sh, which should be modified as below.
python src/utils/run_biored_eval.py --exp_option 'to_pubtator' \
--in_pred_rel_tsv_file "out_biored_all_mul_test_results.tsv" \
--in_pred_novelty_tsv_file "out_biored_novelty_test_results.tsv" \
--in_test_tsv_file "datasets/biored/processed/test.tsv" \
--in_test_pubtator_file "datasets/biored/BioRED/Test.PubTator" \
--out_pred_pubtator_file "biored_pred_mul.txt"
In my original version, --in_test_tsv_file and --in_test_pubtator_file were missing. After fixing them, you can run the commands below and get the result.
bash scripts/build_biored_dataset.sh
bash scripts/run_biored_exp.sh
Hi, thank you so much for the update; both the training and prediction scripts finally worked.
Thank you for providing such a valuable package that addresses the limitations of other available tools. It would be immensely helpful if you could include tutorials or, at the very least, sample input-output files for users interested in predicting on new data with the pretrained weights. Additionally, running the bash scripts depends on system compatibility: could you provide a table listing compatible versions of CUDA, cuDNN, TensorFlow, and the other libraries, so that users on different systems can proceed smoothly? Thanks in advance.