ncbi / BioREx


ImportError: cannot import name 'TFTrainer' from 'transformers' and cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory #7

Closed Khyati-Microcrispr closed 3 months ago

Khyati-Microcrispr commented 3 months ago

Hi, I am getting this error. Please check the code on your side to see whether some files are missing during prediction and training, and please help me solve this issue. Thank you.

I have used the environment setup that was given to me for BioRED (Ubuntu 22.04.2 LTS; GPU: RTX 4090).

1. Setting up:

```bash
conda create -n py39 python=3.9
conda activate py39
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/:$CUDNN_PATH/lib
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
python -m pip install --upgrade pip
python -m pip install "tensorflow==2.10"
```

Then you can run the Python script below to check whether you can access the GPU.

```python
import tensorflow as tf

print(tf.__version__)
print(len(tf.config.list_physical_devices('GPU')))
print(tf.test.is_built_with_cuda())
print(tf.test.is_gpu_available())
```

```python
build_info = tf.sysconfig.get_build_info()
cuda_version = build_info["cuda_version"]
cudnn_version = build_info["cudnn_version"]
print("CUDA version TensorFlow was built with:", cuda_version)
print("cuDNN version TensorFlow was built with:", cudnn_version)
```

2. Install requirements:

```bash
pip install -r requirements.txt
```

Here is my requirements.txt:

```
transformers == 4.18.0
accelerate == 0.9.0
pandas == 1.1.5
numpy == 1.20.0
datasets == 2.3.2
sentencepiece != 0.1.92
protobuf == 3.19.4
scispacy == 0.2.4
tensorflow == 2.9.3
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_md-0.5.1.tar.gz
```
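Side note on the ImportError in the title: TFTrainer ships with the pinned transformers == 4.18.0 but was removed from later releases, so the import should only fail if a newer transformers is actually active in the environment. A quick, minimal check of what the environment resolves, using nothing beyond the pinned package above:

```python
# Minimal check: confirm the active transformers matches the pin and still provides TFTrainer.
import transformers

print("transformers version:", transformers.__version__)

try:
    from transformers import TFTrainer, TFTrainingArguments  # noqa: F401
    print("TFTrainer / TFTrainingArguments import OK")
except ImportError as err:
    # Triggers when a newer transformers release (without TFTrainer) is installed instead of 4.18.0.
    print("TFTrainer import failed:", err)
```

If the import fails here, pip has likely pulled in a newer transformers (for example as a dependency of another package), and re-pinning to 4.18.0 should restore TFTrainer.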

Error:

```
bash scripts/run_test_pred.sh 0
bash: /home/khyati/anaconda3/envs/biorex/lib/libtinfo.so.6: no version information available (required by bash)
Converting the dataset into BioREx input format
2024-05-27 12:33:24.619028: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:24.800147: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:24.865322: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:25.520537: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520634: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:25.520643: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
number_unique_YES_instances 0
Generating RE predictions
2024-05-27 12:33:28.647477: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:28.822023: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:28.870588: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:29.678845: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.678988: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:29.679003: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[INFO|training_args.py:804] 2024-05-27 12:33:32,063 >> using logging_steps to initialize eval_steps to 10
[INFO|training_args.py:1023] 2024-05-27 12:33:32,063 >> PyTorch: setting up devices
[INFO|training_args.py:885] 2024-05-27 12:33:32,092 >> The default value for the training argument --report_to will change in v5 (from all installed integrations to none). In v5, you will need to use --report_to all to get the same behavior as now. You should start updating your code and make this info disappear :-).
[INFO|training_args_tf.py:189] 2024-05-27 12:33:32,093 >> Tensorflow: setting up strategy
2024-05-27 12:33:32.101113: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:32.743930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1616] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21219 MB memory: -> device: 0, name: NVIDIA GeForce RTX 4090, pci bus id: 0000:31:00.0, compute capability: 8.9
05/27/2024 12:33:32 - INFO - __main__ - n_replicas: 1, distributed training: False, 16-bits training: False
05/27/2024 12:33:32 - INFO - __main__ - Training/evaluation parameters TFTrainingArguments(_n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_pin_memory=True, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, debug=[], deepspeed=None, disable_tqdm=False, do_eval=True, do_predict=True, do_train=False, eval_accumulation_steps=None, eval_delay=0, eval_steps=10, evaluation_strategy=IntervalStrategy.STEPS, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, gcp_project=None, gradient_accumulation_steps=1, gradient_checkpointing=False, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_model_id=None, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=, ignore_data_skip=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=-1, log_level=-1, log_level_replica=-1, log_on_each_node=True, logging_dir=biorex_model/runs/May27_12-33-32_microcrispr7, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_type=SchedulerType.LINEAR, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, no_cuda=False, num_train_epochs=10.0, optim=OptimizerNames.ADAMW_HF, output_dir=biorex_model, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=32, per_device_train_batch_size=16, poly_power=1.0, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=biorex_model, save_on_each_node=False, save_steps=10, save_strategy=IntervalStrategy.STEPS, save_total_limit=None, seed=42, sharded_ddp=[], skip_memory_metrics=True, tf32=None, tpu_metrics_debug=False, tpu_name=None, tpu_num_cores=None, tpu_zone=None, use_legacy_prediction_loop=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, xla=False, xpu_backend=None,)
Traceback (most recent call last):
  File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 884, in <module>
    main()
  File "/Khyati/BioREx-main/src/run_ncbi_rel_exp.py", line 606, in main
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 471, in from_pretrained
    tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
  File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 332, in get_tokenizer_config
    resolved_config_file = get_file_from_repo(
  File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 678, in get_file_from_repo
    resolved_file = cached_path(
  File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 282, in cached_path
    output_path = get_from_cache(
  File "/home/khyati/anaconda3/envs/py39/lib/python3.9/site-packages/transformers/utils/hub.py", line 545, in get_from_cache
    raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
2024-05-27 12:33:35.580800: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-05-27 12:33:35.691247: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2024-05-27 12:33:35.720913: E tensorflow/stream_executor/cuda/cuda_blas.cc:2981] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-27 12:33:36.168418: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168508: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/home/khyati/anaconda3/envs/biorex/lib/:/lib
2024-05-27 12:33:36.168517: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
```

Khyati-Microcrispr commented 3 months ago

It got solved