ncbi / BioRED


How to convert text to the Input.pubtator (NER) format required by BioRED #8

Open Khyati-Microcrispr opened 1 month ago

Khyati-Microcrispr commented 1 month ago

Hi,

BioRED ran successfully, thank you for your help. I have one more favor to ask. How can I perform Named Entity Recognition (NER) and entity linking in the format BioRED requires for relation prediction? My input data consists of text, titles, and PubMed IDs. I tried AIONER, but I could not get it to work, and the issue I raised on AIONER's GitHub has had no reply. Could you please share working AIONER code and environment setup, including the CUDA and cuDNN versions? I am on Ubuntu 22.04 with an RTX 4090 GPU. Alternatively, if there is another way to accomplish this task, please let me know.

ptlai commented 1 month ago

Hi @Khyati-Microcrispr,

AIONER does not link entities to their corresponding concept identifiers (e.g., NCBI gene IDs). However, BioREx relies on these concept identifiers. Within PubTator3, we have integrated several normalization tools, including GNorm2, TaggerOne, the NLM-Chem model, and tmVar3, to support the normalization process (https://www.ncbi.nlm.nih.gov/research/pubtator3/api). If you just want to process PubMed abstracts, we have processed them, and the results can be accessed at https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3. For questions regarding the AIONER tool, you may contact Dr. Luo (lingluo@dlut.edu.cn).
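For readers who already have entity mentions with concept identifiers (from whatever NER and normalization tools they use), here is a minimal sketch of writing them out in the PubTator text layout: a `PMID|t|title` line, a `PMID|a|abstract` line, then one tab-separated line per annotation. The `Annotation` class and `to_pubtator` function are hypothetical names made up for illustration and are not part of BioRED or AIONER. The sketch assumes character offsets are counted over the title and abstract joined by a single character, and the entity type label and the PMID in the usage example are placeholders; check them against the sample input shipped with the repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """One entity mention (hypothetical helper type, not from BioRED)."""
    start: int        # character offset where the mention begins
    end: int          # character offset just past the mention
    mention: str      # surface text of the mention
    entity_type: str  # entity type label expected by the downstream tool
    identifier: str   # concept identifier, e.g. an NCBI Gene ID


def to_pubtator(pmid: str, title: str, abstract: str,
                annotations: List[Annotation]) -> str:
    """Serialize one document in PubTator text layout.

    Assumption: offsets index into title + one separator character + abstract.
    """
    text = title + " " + abstract
    lines = [f"{pmid}|t|{title}", f"{pmid}|a|{abstract}"]
    for ann in sorted(annotations, key=lambda a: a.start):
        # Guard against offsets that do not match the mention text.
        if text[ann.start:ann.end] != ann.mention:
            raise ValueError(
                f"offsets {ann.start}-{ann.end} do not match {ann.mention!r}"
            )
        lines.append("\t".join([
            pmid, str(ann.start), str(ann.end),
            ann.mention, ann.entity_type, ann.identifier,
        ]))
    # A blank line separates consecutive documents in the file.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Made-up PMID and text; 672 is the NCBI Gene ID for human BRCA1.
    doc = to_pubtator(
        "12345",
        "BRCA1 and breast cancer.",
        "Mutations in BRCA1 increase risk.",
        [
            Annotation(0, 5, "BRCA1", "GeneOrGeneProduct", "672"),
            Annotation(38, 43, "BRCA1", "GeneOrGeneProduct", "672"),
        ],
    )
    print(doc, end="")
```

The offset check is there because misaligned offsets are an easy mistake when title and abstract are processed separately and then concatenated.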

Khyati-Microcrispr commented 2 weeks ago

Hi, may I ask how many papers you have processed? Via FTP I was only able to get relations for 9 million papers.

ptlai commented 1 week ago

Hi @Khyati-Microcrispr ,

We processed all PubMed abstracts, totaling around 37 million, but only a quarter of the abstracts contained relations.