ncbi / BioREx


Request for sample input and output data for prediction only #9

Open Darrshan-Sankar opened 1 month ago

Darrshan-Sankar commented 1 month ago

I used AIONER output to extract relations, but it didn't work. Going through the issues, I found that the example data is in the BioRED repo. I would like to know how to create such data, and to see a sample output showing what predict.pubtator will look like.

ptlai commented 1 month ago

Hi @Darrshan-Sankar,

The results of AIONER cannot be fed directly to BioREx. BioREx requires that the entities' IDs be normalized, so you have to use our normalization components, such as GNORM2. If you just want to process PubMed abstracts, you can find them at https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3/, where we provide relation results for processed PubMed abstracts. Please let me know if you need further help.
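To see why raw AIONER output is rejected, it can help to check which annotations lack a normalized ID before handing a file to BioREx. The sketch below is a hypothetical helper (`find_unnormalized` is not part of BioREx, AIONER, or GNORM2). It assumes PubTator-style annotation lines of the form `doc_id, start, end, mention, type, identifier` separated by tabs, and it treats an empty or `-` identifier as "not normalized"; both are assumptions about the file layout, not documented BioREx behavior.

```python
def find_unnormalized(pubtator_text):
    """Return (doc_id, start, end, mention) for annotations lacking an ID.

    Assumed (hypothetical) layout: annotation lines are tab-separated as
    doc_id, start, end, mention, type, identifier. Title/abstract lines
    ('id|t|...', 'id|a|...') and relation lines are skipped.
    """
    missing = []
    for line in pubtator_text.splitlines():
        cols = line.split("\t")
        # Annotation lines carry integer start/end offsets in columns 2 and 3;
        # anything else (title, abstract, relation lines) is ignored.
        if len(cols) < 5 or not (cols[1].isdigit() and cols[2].isdigit()):
            continue
        # A missing sixth column, an empty one, or '-' counts as unnormalized.
        identifier = cols[5].strip() if len(cols) > 5 else ""
        if identifier in ("", "-"):
            missing.append((cols[0], int(cols[1]), int(cols[2]), cols[3]))
    return missing
```

For example, a file whose `Disease` line ends after the type column (as NER-only output might) would be reported, while a `Gene` line carrying an ID would not. An empty result does not guarantee BioREx will accept the file; it only rules out the missing-ID problem described above.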

Darrshan-Sankar commented 1 month ago

@ptlai Thanks for your support. I actually need to process full texts, so could you please explain how to normalize AIONER results into input for BioREx? A script would be especially helpful.

ptlai commented 1 month ago

Hi @Darrshan-Sankar,

The simplest way is to also use the NE/ID annotations in https://ftp.ncbi.nlm.nih.gov/pub/lu/PubTator3/ (BioCXML files). We have already processed NEs/IDs for full texts, but relations only for abstracts. You can treat each paragraph as an abstract and then feed it to BioREx. If you still need help with the normalization components, you may contact Dr. Wei (chih-hsuan.wei@nih.gov), who handles the entire backend process of PubTator.
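The "treat each paragraph as an abstract" suggestion can be sketched as a small converter from a BioCXML file to PubTator-style pseudo-abstracts. This is a minimal sketch under several assumptions, none of which come from the BioREx docs: annotations carry `<infon key="type">` and `<infon key="identifier">`; annotation `<location offset>` values use the same document-level coordinates as the passage `<offset>`; each pseudo-document gets a hypothetical ID `<docid>_<passage index>`; the title line is left empty and offsets follow the common PubTator convention of indexing into `title + " " + abstract`. Whether BioREx accepts non-numeric document IDs or empty titles has not been verified here.

```python
import xml.etree.ElementTree as ET


def _infons(elem):
    """Collect <infon key="...">value</infon> children into a dict."""
    return {i.get("key"): (i.text or "") for i in elem.findall("infon")}


def bioc_paragraphs_to_pubtator(xml_text, min_entities=2):
    """Turn each BioC passage with enough entities into a pseudo-abstract.

    Hypothetical helper; the infon keys, ID scheme, and offset convention
    are assumptions, not BioREx's documented input loader.
    """
    root = ET.fromstring(xml_text)
    blocks = []
    for doc in root.iter("document"):
        docid = doc.findtext("id", default="").strip()
        for idx, passage in enumerate(doc.findall("passage")):
            text = passage.findtext("text", default="")
            p_offset = int(passage.findtext("offset", default="0"))
            anns = passage.findall("annotation")
            # A relation needs at least two entities, so skip sparse passages.
            if not text or len(anns) < min_entities:
                continue
            pid = f"{docid}_{idx}"  # hypothetical pseudo-document ID
            title = ""  # empty title; the paragraph goes on the 'a' line
            shift = len(title) + 1  # offsets index into title + ' ' + abstract
            lines = [f"{pid}|t|{title}", f"{pid}|a|{text}"]
            for ann in anns:
                loc = ann.find("location")
                if loc is None:
                    continue
                info = _infons(ann)
                # Convert document-level offsets to pseudo-abstract offsets.
                start = int(loc.get("offset")) - p_offset + shift
                end = start + int(loc.get("length"))
                mention = ann.findtext("text", default="")
                lines.append("\t".join([pid, str(start), str(end), mention,
                                        info.get("type", ""),
                                        info.get("identifier", "")]))
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

A passage such as "BRCA1 mutations cause breast cancer." with two normalized annotations becomes one pseudo-abstract block, while a passage with a single entity is dropped. If BioREx turns out to need numeric IDs or a non-empty title, only the `pid` and `title` lines should need to change.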

Darrshan-Sankar commented 1 month ago

@ptlai Yes, I went through the FTP site. As you said, relations are only available for abstracts. Thank you for providing Dr. Wei's contact details.