ncbi / BioREx

cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory #1

Closed stardust-xc closed 9 months ago

stardust-xc commented 1 year ago

cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory

How do I solve this problem? I get this error when I run the model. I need help, thanks!

ptlai commented 1 year ago

Hi @stardust-xc ,

It seems "src/run_ncbi_rel_exp.py" failed when executed in run_biorex_exp.sh, resulting in no biorex_model/test_results.tsv. Could you share more details on the error? Thanks!

Po-Ting

stardust-xc commented 1 year ago

I reran the model and found that the top of the log shows an import error, which is probably the cause of the error reported below it. How should this problem be solved?

ImportError: /root/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: cudaGraphDebugDotPrint, version libcudart.so.11.0

ptlai commented 1 year ago

Hi @stardust-xc ,

Po-Ting

stardust-xc commented 1 year ago

Thank you for your suggestion. I have now successfully run the model with the provided data. Now I just want to train it with my own data, but I don't quite understand how to prepare my data and train on it. Could you provide more detailed steps? I have the data, but I don't know how to convert it.

ptlai commented 1 year ago

Hi @stardust-xc,

To convert your data, please refer to our dataset format converter available at: convert_pubtator_2_tsv.py.

Ensure your data is in the pubtator file format. For guidance on using the converter, check the main() function in the aforementioned script; it showcases how various datasets can be transformed into the BioREx format.

Once converted, you can merge your dataset with the one you've downloaded from this link.

Upon preparing your merged dataset, modifications in run_ncbi_rel_exp.py will be required. Specifically, update the get_labels(self), get_entity_type_dict(cls), and get_special_tags(cls) methods to incorporate your new relation type labels, entity type labels, and the dataset name tag (which you'll define in convert_pubtator_2_tsv.py).
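To illustrate the kind of edits involved, here is a minimal sketch (the class name, relation labels, entity types, and dataset tag below are hypothetical placeholders, not the actual BioREx code):

    # Hypothetical sketch only; the real processor class in run_ncbi_rel_exp.py
    # has its own name and label sets.
    class MyCorpusProcessor:

        def get_labels(self):
            # existing BioREx relation labels plus any new relation types you add
            return ["None", "Association", "Positive_Correlation", "My_New_Relation"]

        @classmethod
        def get_entity_type_dict(cls):
            # entity type strings that appear in your converted TSV
            return {"GeneOrGeneProduct": 0, "ChemicalEntity": 1, "MyEntityType": 2}

        @classmethod
        def get_special_tags(cls):
            # dataset name tag you define in convert_pubtator_2_tsv.py
            return ["my_dataset"]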

If you need any assistance, please let me know. Thanks.

Po-Ting

stardust-xc commented 1 year ago

Thank you for your reply, but my current plan is just to use this model's output for inference, not to prepare new data and retrain it. That is, I want to use the model directly to annotate my text (the data I have now) and then build something like a medical search application on top of the results. Can you give me some suggestions on how to use it this way?

ptlai commented 1 year ago

Hi @stardust-xc ,

If you wish to use our model to predict new data, please refer to https://github.com/ncbi/BioREx#predicting-new-data. A script has been incorporated to facilitate the prediction of relations. Please feel free to reach out if you have any further questions.

Po-Ting

stardust-xc commented 1 year ago

Hello, I am now getting this error when doing data prediction:

Converting the dataset into BioREx input format
Generating RE predictions
Traceback (most recent call last):
  File "/root/BioREx_old/BioREx/src/run_ncbi_rel_exp.py", line 25, in <module>
    import datasets
ModuleNotFoundError: No module named 'datasets'
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
Usage: run_pubtator_eval.py [options]

run_pubtator_eval.py: error: no such option: --in_test_pubtator_file

But before doing this I had actually already completed the previous steps: the datasets are in the folder I am running from, and the input file is in the model's folder. Yet I still get this error as soon as I run the prediction program, so I hope you can give me some suggestions so that the prediction script can run properly.

ptlai commented 1 year ago

Hi @stardust-xc ,

In the error message, 'datasets' is not a folder in the BioREx project. It is a Python package: https://pypi.org/project/datasets/. Could you please verify if the datasets package is present in your library directory? For example: /root/miniconda3/envs//lib. If it's not available, you can install it using the following command:

pip install datasets==2.3.2
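A quick way to confirm the package is importable afterwards (plain Python, nothing BioREx-specific):

    python -c "import datasets; print(datasets.__version__)"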

stardust-xc commented 1 year ago

Sad to see a new error reported. What could be the cause of this one? Is there any solution?

... Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new tf.data.Options() object then setting options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA before applying the options object to the dataset via dataset.with_options(options).
Traceback (most recent call last):
  File "/root/BioREx_old/BioREx/src/run_ncbi_rel_exp.py", line 884, in <module>
    main()
  File "/root/BioREx_old/BioREx/src/run_ncbi_rel_exp.py", line 871, in main
    predictions = model.predict(batch_test_dataset)["logits"]
  File "/root/miniconda3/envs/biorex/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/root/miniconda3/envs/biorex/lib/python3.9/site-packages/keras/engine/training.py", line 2048, in predict
    raise ValueError('Unexpected result of predict_function '
ValueError: Unexpected result of predict_function (Empty batch_outputs). Please use Model.compile(..., run_eagerly=True), or tf.config.run_functions_eagerly(True) for more information of where went wrong, or file a issue/bug to tf.keras.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
Usage: run_pubtator_eval.py [options]

run_pubtator_eval.py: error: no such option: --in_test_pubtator_file
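(For reference, the auto-sharding notice at the top of this log can be silenced exactly as the message itself suggests; a sketch, assuming batch_test_dataset is a tf.data.Dataset, and not a fix for the ValueError:)

    import tensorflow as tf

    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
    batch_test_dataset = batch_test_dataset.with_options(options)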

stardust-xc commented 1 year ago

Hello, I would like to know what biorex_model/test_results.tsv is and where it comes from.

ptlai commented 1 year ago

Hi @stardust-xc ,

I tested the code on another computer using the Windows Subsystem for Linux and was unable to reproduce the error you mentioned.

I'm wondering if the issue could be related to specific package dependencies on your end. Could you provide details on your installed Python packages and their versions, as well as your OS version? I've updated the requirements.txt to address potential issues with the scispacy installation.

The error message "run_pubtator_eval.py: error: no such option: --in_test_pubtator_file" suggests that you might not be using the latest version of run_pubtator_eval.py. Please check and compare with this version.

Regarding biorex_model/test_results.tsv: It's an output file produced by run_ncbi_rel_exp.py. This file contains prediction scores for each relation label. These scores represent values prior to softmax computation. The purpose of this intermediate file is to generate the final output in pubtator format.
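As a rough illustration of how those pre-softmax scores relate to the final prediction (the column layout below is an assumption, not the actual test_results.tsv schema):

    # Hypothetical sketch: convert one row of pre-softmax scores to probabilities.
    # Label order is assumed to match the processor's get_labels(); values are made up.
    import numpy as np

    labels = ["None", "Association", "Positive_Correlation"]
    logits = np.array([2.1, -0.3, 0.7])   # one row of scores from the TSV

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    print(labels[int(np.argmax(probs))], probs)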

stardust-xc commented 1 year ago

Hi, here's the info on my OS:

Linux autodl-container-98b3119d3c-8e6e5881 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

The python package and version information is as follows: absl-py==1.4.0 accelerate==0.23.0 aiohttp==3.8.5 aiosignal==1.3.1 annotated-types==0.5.0 astunparse==1.6.3 async-timeout==4.0.3 attrs==23.1.0 blis==0.7.10 brotlipy==0.7.0 cachetools==5.3.1 catalogue==2.0.9 cchardet @ file:///home/conda/feedstock_root/build_artifacts/cchardet_1636139719885/work certifi==2023.7.22 cffi @ file:///croot/cffi_1670423208954/work chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1692221558316/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work click==8.1.7 cmake==3.27.4.1 confection==0.1.3 conllu==4.5.3 cryptography @ file:///croot/cryptography_1694444244250/work cymem==2.0.7 datasets==2.3.2 dill==0.3.5.1 en-core-sci-md @ file:///root/BioREx_old/BioREx/en_core_sci_md-0.5.1.tar.gz#sha256=4485c7c3f2522eccf53e44069223d1162258de48fbad457419b38808b3a0ed20 filelock==3.12.4 flatbuffers==23.5.26 frozenlist==1.4.0 fsspec==2023.9.0 gast==0.4.0 gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645438755360/work google-auth==2.23.0 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 grpcio==1.58.0 h5py==3.9.0 huggingface-hub==0.17.1 idna @ file:///croot/idna_1666125576474/work importlib-metadata==6.8.0 Jinja2 @ file:///croot/jinja2_1666908132255/work joblib==1.3.2 keras==2.13.1 Keras-Preprocessing==1.1.2 langcodes==3.3.0 libclang==16.0.6 lit==16.0.6 Markdown==3.4.4 MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work mkl-fft==1.3.6 mkl-random @ file:///work/mkl/mkl_random_1682950433854/work mkl-service==2.4.0 mpmath @ file:///croot/mpmath_1690848262763/work multidict==6.0.4 multiprocess==0.70.13 murmurhash==1.0.9 networkx @ file:///croot/networkx_1690561992265/work nmslib==2.1.1 numpy==1.24.3 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oauthlib==3.2.2 opt-einsum==3.3.0 packaging==23.1 pandas==2.1.0 pathy==0.10.2 Pillow==9.4.0 preshed==3.0.8 protobuf==4.24.3 psutil==5.9.5 pyarrow==13.0.0 pyasn1==0.5.0 pyasn1-modules==0.3.0 pybind11==2.6.1 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pydantic==1.10.12 pydantic_core==2.6.3 pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work pysbd==0.3.4 PySocks @ file:///tmp/build/80754af9/pysocks_1605305812635/work python-dateutil==2.8.2 pytz==2023.3.post1 PyYAML==6.0.1 regex==2023.8.8 requests @ file:///croot/requests_1690400202158/work requests-oauthlib==1.3.1 responses==0.18.0 rsa==4.9 sacremoses==0.0.53 scikit-learn==1.3.0 scipy==1.11.2 scispacy==0.5.2 sentencepiece==0.1.99 six==1.16.0 smart-open==6.4.0 spacy==3.4.4 spacy-legacy==3.0.12 spacy-loggers==1.0.5 srsly==2.4.7 sympy @ file:///croot/sympy_1668202399572/work tensorboard==2.13.0 tensorboard-data-server==0.7.1 tensorboard-plugin-wit==1.8.1 tensorflow==2.13.0 tensorflow-estimator==2.13.0 tensorflow-gpu==2.9.3 tensorflow-io-gcs-filesystem==0.34.0 termcolor==2.3.0 thinc==8.1.12 threadpoolctl==3.2.0 tokenizers==0.12.1 torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2 tqdm==4.66.1 transformers==4.18.0 triton==2.0.0 typer==0.7.0 typing_extensions==4.5.0 tzdata==2023.3 urllib3 @ file:///croot/urllib3_1686163155763/work wasabi==0.10.1 Werkzeug==2.3.7 wrapt==1.15.0 xxhash==3.3.0 yarl==1.9.2 zipp==3.16.2

I tried continuing the run today, but it still stopped where it did yesterday, i.e. there was still no file biorex_model/test_results.tsv. So I chose to run run_ncbi_rel_exp.py directly and found that this is indeed the file that had never run; when I run it now it reports errors, all of which look like indentation or syntax problems (I have not changed this file), e.g.:

src/run_ncbi_rel_exp.py: line 43: syntax error near unexpected token `('
src/run_ncbi_rel_exp.py: line 43: `def set_seeds(seed):'

Maybe the problem is that this .py file does not run, which prevents the generation of biorex_model/test_results.tsv. I hope I can get a solution.

ptlai commented 1 year ago

Hello @stardust-xc,

Thank you for sharing your configuration details. I've noticed many differences between the packages you have and those listed at https://github.com/ncbi/BioREx/blob/main/requirements.txt.

It appears that some pre-existing packages on your server may have prevented other packages from downgrading. Version mismatches can sometimes lead to unexpected errors. Here are potential solutions:

  1. Setting Up a Virtual Environment: Create a new virtual environment and install the necessary packages there. This can isolate any potential conflicts. You can achieve this with the following commands:
conda create -n biorex python=3.9
conda activate biorex
pip install -r requirements.txt
  2. TensorFlow Version Mismatch: I observed that you have two different TensorFlow versions:
    tensorflow==2.13.0
    tensorflow-gpu==2.9.3

    Previously, my requirements suggested tensorflow-gpu==2.9.3, but I've since updated it to recommend tensorflow>=2.9.3. Having multiple versions can confuse the system. I'd recommend uninstalling tensorflow-gpu and downgrading tensorflow to 2.9.3.
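    For example, in the same pip environment (adjust the version if requirements.txt changes):

    pip uninstall tensorflow-gpu
    pip install tensorflow==2.9.3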

Lastly, would you be able to provide a complete log or screenshot of the messages displayed when running our code? This will help me identify any other potential issues.

Thank you!

pyramid20002000 commented 1 year ago

Hello, all

I just met exactly the same problem, and I tracked it down but reached no conclusion. What I can tell so far is that the problem is indeed in convert_pubtator_2_tsv.py.

What happens is: we need to convert the pubtator file into a TSV file before using any pretrained models (BioRED models or BioREx models). But the generated TSV file can easily end up empty, which results in a failure to generate 'biorex_model/test_results.tsv'; and of course, if you look at the last line of the error message, you find "cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory".

But the real source of the problem is "out_processed.tsv"; you can check it out. In my case, it is empty.

I cannot track the problem any further, as convert_pubtator_2_tsv.py is a very complicated and poorly commented script, so I do not know the purpose of each step. But there is one indication: "number_unique_YES_instances 0". It seems that the program found 0 "Yes" instances in the pubtator file.

Fortunately, convert_pubtator_2_tsv.py did work with the "Test.pubtator" file in https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/BioRED.zip, so I was able to compare a pubtator file that produces a correct out_processed.tsv with one that produces an empty out_processed.tsv, only to find that the prediction worked properly only when the pubtator file contained "relations".
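For anyone comparing files, a minimal PubTator-style snippet with a relation line looks roughly like this (all values are made up, the annotation and relation fields are tab-separated, and the exact relation-line columns may differ from the BioRED release):

    10000000|t|Example title mentioning gene A and disease B.
    10000000|a|Example abstract text about gene A and disease B.
    10000000    25    31    gene A    GeneOrGeneProduct    1234
    10000000    36    45    disease B    DiseaseOrPhenotypicFeature    D000001
    10000000    Association    1234    D000001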

And you know how weird that is: why should I extract relations if I already have them in the file?

I also reported this problem in the BioRED project, https://github.com/ncbi/BioRED/issues/5, as the behavior is quite the same.

@ptlai

ptlai commented 1 year ago

We have resolved the problem through email communication. @pyramid20002000

pyramid20002000 commented 1 year ago

Yes, my problem is resolved and the solution is posted at the link below: https://github.com/ncbi/BioRED/issues/5