ncbi / BioREx


Inconsistent Relationship Extraction Results Compared to PubTator3 Website #11

Open dillonl opened 1 week ago

dillonl commented 1 week ago

I've encountered an issue where I'm unable to reproduce the relationship extraction results from the PubTator3 website (example: 19394258) using BioREx. When I run the tool using the suggested model, it only outputs one relationship, whereas the PubTator3 site identifies twelve.

Additionally, when I run the code as-is (run_test_pred.sh), it crashes because an intermediate file (out_processed.tsv) is empty. If I hardcode the relationships in the src_tgt_pairs variable in src/convert_pubtator_2_tsv.py, the process gets past this point, but the output still doesn't match what the website shows.

I suspect this might be due to differences in the models used. The README lists several models, but none seem to produce output that matches what the website provides.

Could you clarify whether the model used on the website is available in the repository? Also, any guidance on how you run this tool on the PubTator3 website would be appreciated.

Thank you.

ptlai commented 5 days ago

Hi @dillonl,

Sorry for the late reply.

BioREx can only predict the BioRED relation types. Please let me know if you need any help reproducing the results.

For PubTator3, we use a mapping table to convert BioRED relation types to PubTator3 relation types, as follows:

chemical-chemical: positive_correlation => positive_correlate
chemical-chemical: negative_correlation => negative_correlate
chemical-chemical: association => associate
chemical-chemical: bind => interact
chemical-chemical: comparison => compare
chemical-chemical: conversion => convert
chemical-chemical: cotreatment => cotreat
chemical-chemical: drug_interaction => drug_interact
chemical-disease: positive_correlation => cause
chemical-disease: negative_correlation => treat
chemical-disease: association => associate
chemical-gene: positive_correlation => positive_correlate
chemical-gene: negative_correlation => negative_correlate
chemical-gene: association => associate
chemical-gene: bind => interact
chemical-variant: positive_correlation => stimulate
chemical-variant: negative_correlation => inhibit
chemical-variant: association => associate
chemical-variant: bind => interact
disease-gene: positive_correlation => stimulate
disease-gene: negative_correlation => inhibit
disease-gene: association => associate
disease-variant: positive_correlation => cause
disease-variant: negative_correlation => prevent
disease-variant: association => associate
gene-gene: positive_correlation => positive_correlate
gene-gene: negative_correlation => negative_correlate
gene-gene: association => associate
gene-gene: bind => interact
variant-variant: association => associate

Each combination of named-entity (NE) type pair and BioRED relation type is assigned the corresponding PubTator3 relation type.
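
For anyone post-processing BioREx output, here is a minimal sketch of how the table above could be applied as a lookup. It is not the code PubTator3 actually runs. The names `parse_mapping`, `RELATION_MAP`, and `to_pubtator3` are hypothetical, and it assumes entity types and relation labels are already normalized to the lowercase spellings used in the table (the labels BioREx emits may be spelled differently).

```python
# Mapping table copied from the comment above: "<pair>: <BioRED> => <PubTator3>".
MAPPING_TEXT = """\
chemical-chemical: positive_correlation => positive_correlate
chemical-chemical: negative_correlation => negative_correlate
chemical-chemical: association => associate
chemical-chemical: bind => interact
chemical-chemical: comparison => compare
chemical-chemical: conversion => convert
chemical-chemical: cotreatment => cotreat
chemical-chemical: drug_interaction => drug_interact
chemical-disease: positive_correlation => cause
chemical-disease: negative_correlation => treat
chemical-disease: association => associate
chemical-gene: positive_correlation => positive_correlate
chemical-gene: negative_correlation => negative_correlate
chemical-gene: association => associate
chemical-gene: bind => interact
chemical-variant: positive_correlation => stimulate
chemical-variant: negative_correlation => inhibit
chemical-variant: association => associate
chemical-variant: bind => interact
disease-gene: positive_correlation => stimulate
disease-gene: negative_correlation => inhibit
disease-gene: association => associate
disease-variant: positive_correlation => cause
disease-variant: negative_correlation => prevent
disease-variant: association => associate
gene-gene: positive_correlation => positive_correlate
gene-gene: negative_correlation => negative_correlate
gene-gene: association => associate
gene-gene: bind => interact
variant-variant: association => associate
"""


def parse_mapping(text):
    """Build {(entity_pair, biored_relation): pubtator3_relation} from the table."""
    table = {}
    for line in text.strip().splitlines():
        pair, rest = line.split(":", 1)
        biored, pubtator3 = (part.strip() for part in rest.split("=>"))
        table[(pair.strip(), biored)] = pubtator3
    return table


RELATION_MAP = parse_mapping(MAPPING_TEXT)


def to_pubtator3(type_a, type_b, biored_relation):
    """Return the PubTator3 relation type, or None if the table has no entry.

    The two entity types are sorted alphabetically so that argument order
    does not matter; every pair in the table is already in that order.
    """
    pair = "-".join(sorted((type_a.lower(), type_b.lower())))
    return RELATION_MAP.get((pair, biored_relation.lower()))


print(to_pubtator3("disease", "chemical", "negative_correlation"))
```

Combinations that do not appear in the table (for example a gene-variant pair) return `None` in this sketch. How PubTator3 handles such cases is not stated here.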