Closed stardust-xc closed 9 months ago
Hi @stardust-xc ,
It seems "src/run_ncbi_rel_exp.py" failed when executed in run_biorex_exp.sh, resulting in no biorex_model/test_results.tsv. Could you share more details on the error? Thanks!
Po-Ting
I reran the model and found that the first error in the log is an import error, which is probably the cause of the error reported below. How should this problem be solved?
"ImportError: /root/miniconda3/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so: undefined symbol: cudaGraphDebugDotPrint, version libcudart.so.11.0"
Hi @stardust-xc ,
Could you kindly verify whether the 'libcudart' library is present in your environment at the following path:
-- /root/miniconda3/envs/
If it's not present, please install it using either of the following commands:
-- conda install cudatoolkit=11.0
-- conda install -c conda-forge cudatoolkit=11.0
Once the installation is complete, please double-check the presence of 'libcudart' at the path mentioned above. If it's still not there, you can also look for it in:
-- /root/miniconda3/lib
If necessary, make sure to set up the PATH and LD_LIBRARY_PATH variables appropriately.
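The check described above can be sketched in Python. This is only an illustration: the helper name find_libcudart is hypothetical, and the directories you pass in should be the ones from your own installation.

```python
from pathlib import Path


def find_libcudart(roots):
    """Return the sorted paths of libcudart shared libraries found under `roots`."""
    hits = []
    for root in roots:
        root = Path(root)
        if not root.is_dir():
            continue  # skip directories that do not exist on this machine
        # Match libcudart.so as well as versioned names such as libcudart.so.11.0
        hits.extend(str(path) for path in root.rglob("libcudart.so*"))
    return sorted(hits)
```

For example, find_libcudart(["/root/miniconda3/envs", "/root/miniconda3/lib"]) lists any copies under the two locations mentioned above. An empty list suggests the library is not installed there.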
Po-Ting
Thank you for your suggestion. I have now successfully run the model with the provided data. Next, I want to train it on my own data, but I don't quite understand how to prepare the data and run training. Could you provide more detailed steps? I have the data, but I don't know how to convert it.
Hi @stardust-xc,
To convert your data, please refer to our dataset format converter available at: convert_pubtator_2_tsv.py.
Ensure your data is in the pubtator file format. For guidance on using the converter, check the main() function in the aforementioned script; it showcases how various datasets can be transformed into the BioREx format.
Once converted, you can merge your dataset with the one you've downloaded from this link.
Upon preparing your merged dataset, modifications in run_ncbi_rel_exp.py will be required. Specifically, update the get_labels(self), get_entity_type_dict(cls), and get_special_tags(cls) methods to incorporate your new relation type labels, entity type labels, and the dataset name tag (which you'll define in convert_pubtator_2_tsv.py).
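As a rough illustration of the pubtator layout, here is a minimal sketch that counts what a file contains. It assumes the common layout: "PMID|t|title" and "PMID|a|abstract" lines, six tab-separated columns for an entity annotation, and four or five tab-separated columns for a relation. The function name, the sample PMID, and the sample annotations are made up for this example, and the sketch does not reproduce what convert_pubtator_2_tsv.py actually does.

```python
def summarize_pubtator(text):
    """Count passage, entity, and relation lines per document ID."""
    summary = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate documents
        head = line.split("|", 2)
        if len(head) == 3 and head[1] in ("t", "a") and "\t" not in head[0]:
            # Title ("t") or abstract ("a") line
            doc = summary.setdefault(head[0], {"passages": 0, "entities": 0, "relations": 0})
            doc["passages"] += 1
            continue
        cols = line.split("\t")
        doc = summary.setdefault(cols[0], {"passages": 0, "entities": 0, "relations": 0})
        if len(cols) >= 6 and cols[1].isdigit() and cols[2].isdigit():
            # Assumed entity layout: PMID, start, end, mention, type, identifier
            doc["entities"] += 1
        elif len(cols) >= 4:
            # Assumed relation layout: PMID, relation type, id1, id2 [, novelty]
            doc["relations"] += 1
    return summary


# Made-up sample document for illustration only
SAMPLE = (
    "12345|t|Aspirin and headache\n"
    "12345|a|Aspirin relieves headache in adults.\n"
    "12345\t0\t7\tAspirin\tChemicalEntity\tD001241\n"
    "12345\t12\t20\theadache\tDiseaseOrPhenotypicFeature\tD006261\n"
    "12345\tNegative_Correlation\tD001241\tD006261\tNo\n"
)

if __name__ == "__main__":
    print(summarize_pubtator(SAMPLE))
```

Running a summary like this on your own file before conversion is a quick way to see whether it contains the kinds of lines you expect.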
If you need any assistance, please let me know. Thanks.
Po-Ting
Thank you for your reply. My current plan, however, is only to run inference with this model, not to convert my data and retrain it. That is, I want to use the model directly to annotate my own text (the data I have now) and then build something like a medical search application on top of the results. Can you give me some suggestions on how to use it?
Hi @stardust-xc ,
If you wish to use our model to predict new data, please refer to https://github.com/ncbi/BioREx#predicting-new-data. A script has been incorporated to facilitate the prediction of relations. Please feel free to reach out if you have any further questions.
Po-Ting
Hello, I am now getting this error when running prediction on my data:
Converting the dataset into BioREx input format
Generating RE predictions
Traceback (most recent call last):
File "/root/BioREx_old/BioREx/src/run_ncbi_rel_exp.py", line 25, in
run_pubtator_eval.py: error: no such option: --in_test_pubtator_file
Before running this, I had already completed the previous steps: the datasets are in the folder I am running from, and the input file is in the model's folder. Still, I get this error as soon as I run the prediction script. I would appreciate any suggestions for getting the prediction script to run properly.
Hi @stardust-xc ,
In the error message, 'datasets' is not a folder in the BioREx project. It is a Python package https://pypi.org/project/datasets/.
Could you please verify if the datasets package is present in your library directory? For example: /root/miniconda3/envs/
If it is missing, you can install it with: pip install datasets==2.3.2
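One way to confirm which version of a package the active interpreter sees is the standard-library importlib.metadata module (Python 3.8+). The helper name installed_version is only an illustration.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the package is missing
    print("datasets:", installed_version("datasets"))
```

Run it with the same Python interpreter that runs the BioREx scripts, since each environment has its own set of installed packages.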
Unfortunately, a new error is now reported. What could be the cause of this one? Is there any solution?
Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new tf.data.Options() object then setting options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA before applying the options object to the dataset via dataset.with_options(options).
Traceback (most recent call last):
File "/root/BioREx_old/BioREx/src/run_ncbi_rel_exp.py", line 884, in predict_function
ValueError: Unexpected result of predict_function (Empty batch_outputs). Please use Model.compile(..., run_eagerly=True), or tf.config.run_functions_eagerly(True) for more information of where went wrong, or file a issue/bug to tf.keras.
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
Usage: run_pubtator_eval.py [options]
run_pubtator_eval.py: error: no such option: --in_test_pubtator_file
Hello, I would like to know what biorex_model/test_results.tsv is and where it comes from.
Hi @stardust-xc ,
I tested the code on another computer using the Windows Subsystem for Linux and was unable to reproduce the error you mentioned.
I'm wondering if the issue could be related to specific package dependencies on your end. Could you provide details on your installed Python packages and their versions, as well as your OS version? I've updated the requirements.txt to address potential issues with the scispacy installation.
The error message "run_pubtator_eval.py: error: no such option: --in_test_pubtator_file" suggests that you might not be using the latest version of run_pubtator_eval.py. Please check and compare with this version.
Regarding biorex_model/test_results.tsv: It's an output file produced by run_ncbi_rel_exp.py. This file contains prediction scores for each relation label. These scores represent values prior to softmax computation. The purpose of this intermediate file is to generate the final output in pubtator format.
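To make "values prior to softmax computation" concrete, here is a minimal sketch that turns a list of raw scores into probabilities. The column layout of test_results.tsv is not described in this thread, so the example works on a plain list of made-up scores rather than on the file itself.

```python
import math


def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


if __name__ == "__main__":
    # Made-up scores for three hypothetical relation labels
    print(softmax([2.0, 0.5, -1.0]))
```

The label with the highest raw score also has the highest probability, so softmax changes the scale of the scores but not their ranking.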
Hi, here's the info on my OS:
Linux autodl-container-98b3119d3c-8e6e5881 5.4.0-91-generic #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.4 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
The python package and version information is as follows: absl-py==1.4.0 accelerate==0.23.0 aiohttp==3.8.5 aiosignal==1.3.1 annotated-types==0.5.0 astunparse==1.6.3 async-timeout==4.0.3 attrs==23.1.0 blis==0.7.10 brotlipy==0.7.0 cachetools==5.3.1 catalogue==2.0.9 cchardet @ file:///home/conda/feedstock_root/build_artifacts/cchardet_1636139719885/work certifi==2023.7.22 cffi @ file:///croot/cffi_1670423208954/work chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1692221558316/work charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work click==8.1.7 cmake==3.27.4.1 confection==0.1.3 conllu==4.5.3 cryptography @ file:///croot/cryptography_1694444244250/work cymem==2.0.7 datasets==2.3.2 dill==0.3.5.1 en-core-sci-md @ file:///root/BioREx_old/BioREx/en_core_sci_md-0.5.1.tar.gz#sha256=4485c7c3f2522eccf53e44069223d1162258de48fbad457419b38808b3a0ed20 filelock==3.12.4 flatbuffers==23.5.26 frozenlist==1.4.0 fsspec==2023.9.0 gast==0.4.0 gmpy2 @ file:///tmp/build/80754af9/gmpy2_1645438755360/work google-auth==2.23.0 google-auth-oauthlib==1.0.0 google-pasta==0.2.0 grpcio==1.58.0 h5py==3.9.0 huggingface-hub==0.17.1 idna @ file:///croot/idna_1666125576474/work importlib-metadata==6.8.0 Jinja2 @ file:///croot/jinja2_1666908132255/work joblib==1.3.2 keras==2.13.1 Keras-Preprocessing==1.1.2 langcodes==3.3.0 libclang==16.0.6 lit==16.0.6 Markdown==3.4.4 MarkupSafe @ file:///opt/conda/conda-bld/markupsafe_1654597864307/work mkl-fft==1.3.6 mkl-random @ file:///work/mkl/mkl_random_1682950433854/work mkl-service==2.4.0 mpmath @ file:///croot/mpmath_1690848262763/work multidict==6.0.4 multiprocess==0.70.13 murmurhash==1.0.9 networkx @ file:///croot/networkx_1690561992265/work nmslib==2.1.1 numpy==1.24.3 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 
nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oauthlib==3.2.2 opt-einsum==3.3.0 packaging==23.1 pandas==2.1.0 pathy==0.10.2 Pillow==9.4.0 preshed==3.0.8 protobuf==4.24.3 psutil==5.9.5 pyarrow==13.0.0 pyasn1==0.5.0 pyasn1-modules==0.3.0 pybind11==2.6.1 pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work pydantic==1.10.12 pydantic_core==2.6.3 pyOpenSSL @ file:///croot/pyopenssl_1690223430423/work pysbd==0.3.4 PySocks @ file:///tmp/build/80754af9/pysocks_1605305812635/work python-dateutil==2.8.2 pytz==2023.3.post1 PyYAML==6.0.1 regex==2023.8.8 requests @ file:///croot/requests_1690400202158/work requests-oauthlib==1.3.1 responses==0.18.0 rsa==4.9 sacremoses==0.0.53 scikit-learn==1.3.0 scipy==1.11.2 scispacy==0.5.2 sentencepiece==0.1.99 six==1.16.0 smart-open==6.4.0 spacy==3.4.4 spacy-legacy==3.0.12 spacy-loggers==1.0.5 srsly==2.4.7 sympy @ file:///croot/sympy_1668202399572/work tensorboard==2.13.0 tensorboard-data-server==0.7.1 tensorboard-plugin-wit==1.8.1 tensorflow==2.13.0 tensorflow-estimator==2.13.0 tensorflow-gpu==2.9.3 tensorflow-io-gcs-filesystem==0.34.0 termcolor==2.3.0 thinc==8.1.12 threadpoolctl==3.2.0 tokenizers==0.12.1 torch==2.0.1 torchaudio==2.0.2 torchvision==0.15.2 tqdm==4.66.1 transformers==4.18.0 triton==2.0.0 typer==0.7.0 typing_extensions==4.5.0 tzdata==2023.3 urllib3 @ file:///croot/urllib3_1686163155763/work wasabi==0.10.1 Werkzeug==2.3.7 wrapt==1.15.0 xxhash==3.3.0 yarl==1.9.2 zipp==3.16.2
I tried continuing the run today, but it still stopped at the same point as yesterday, i.e. there was no biorex_model/test_results.tsv. So I ran run_ncbi_rel_exp.py directly and found that this was indeed the file that had never run successfully. It now reports errors, all of which look like indentation or syntax problems (I have not changed this file), e.g.:
src/run_ncbi_rel_exp.py: line 43: syntax error near unexpected token `('
src/run_ncbi_rel_exp.py: line 43: `def set_seeds(seed):'
Maybe the problem is that this .py file does not run, which prevents biorex_model/test_results.tsv from being generated. I hope you can suggest a solution.
Hello @stardust-xc,
Thank you for sharing your configuration details. I've noticed many differences between the packages you have and those listed at https://github.com/ncbi/BioREx/blob/main/requirements.txt.
It appears that some pre-existing packages on your server may have prevented other packages from being downgraded. Version mismatches can sometimes lead to unexpected errors. Here are potential solutions:
conda create -n biorex python=3.9
conda activate biorex
pip install -r requirements.txt
Your package list includes both tensorflow==2.13.0 and tensorflow-gpu==2.9.3.
Previously, my requirements suggested tensorflow-gpu==2.9.3, but I've since updated them to recommend tensorflow>=2.9.3. Having multiple versions can confuse the system. I'd recommend uninstalling tensorflow-gpu and downgrading tensorflow to 2.9.3.
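A version comparison like the one above can be sketched in a few lines of Python. This is a simplified illustration: it only understands exact "name==version" pins, the function names are hypothetical, and the pin used in the example is the tensorflow version recommended above rather than the actual contents of requirements.txt.

```python
def parse_pins(text):
    """Map package name to version for every 'name==version' token in `text`."""
    pins = {}
    for token in text.split():
        if "==" in token:
            name, version = token.split("==", 1)
            pins[name.lower()] = version
    return pins


def find_mismatches(installed_text, required_text):
    """Return {name: (installed, required)} for pins that are not satisfied."""
    installed = parse_pins(installed_text)
    required = parse_pins(required_text)
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    installed = "tensorflow==2.13.0 tensorflow-gpu==2.9.3 torch==2.0.1"
    required = "tensorflow==2.9.3"
    print(find_mismatches(installed, required))
```

Entries installed from local files (the "name @ file:///..." lines in a pip freeze listing) carry no "==" pin and are skipped by this sketch.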
Lastly, would you be able to provide a complete log or screenshot of the messages displayed when running our code? This will help me identify any other potential issues.
Thank you!
Hello, all
I just ran into exactly the same problem. I tracked it down as far as I could but have not reached a conclusion. What I can tell so far is that the problem is indeed in convert_pubtator_2_tsv.py.
What happens is this: we need to convert the pubtator file into a tsv file before using any pretrained model (BioRED or BioREx models). But the generated tsv file can easily end up empty, which causes the failure to generate 'biorex_model/test_results.tsv'. And of course, if you look at the last line of the error message, you find "cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory".
But the real source of the problem is "out_processed.tsv"; you can check it yourself. In my case, it is empty.
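A quick way to check whether an intermediate file such as out_processed.tsv is empty is to count its non-blank lines. The helper name below is hypothetical, and the path should point to wherever the file is written in your run.

```python
from pathlib import Path


def count_data_rows(path):
    """Return the number of non-blank lines in `path`, or None if the file is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        return None
    with file_path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    rows = count_data_rows("out_processed.tsv")
    if rows is None:
        print("file not found")
    elif rows == 0:
        print("file is empty: the conversion step produced no instances")
    else:
        print(f"{rows} rows")
```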
I cannot track the problem further, as convert_pubtator_2_tsv.py is a very complicated script with few comments, so I do not know the purpose of each step. But there is one indication: "number_unique_YES_instances 0". It seems that the program found 0 "Yes" instances in the pubtator file.
Fortunately, convert_pubtator_2_tsv.py did work with the "Test.pubtator" file in https://ftp.ncbi.nlm.nih.gov/pub/lu/BioRED/BioRED.zip
So I was able to compare a pubtator file that produces a correct out_processed.tsv with one that produces an empty out_processed.tsv, only to find that prediction worked properly only if the pubtator file already contained "relations".
And you know how weird that is: why should I extract relations if I already have them in the file?
I also reported this problem in the BioRED project, as the behavior is much the same: https://github.com/ncbi/BioRED/issues/5
@ptlai
We have resolved the problem through email communication. @pyramid20002000
Yes, my problem is resolved, and the solution is posted at the link below: https://github.com/ncbi/BioRED/issues/5
cp: cannot stat 'biorex_model/test_results.tsv': No such file or directory
How can I solve this problem? I get this error when I run the model. I need help! Thanks!