ncbi / BioREx


Can you provide a script or an explanation to reproduce the scores in your paper? #5

Open dongheechoi opened 5 months ago

dongheechoi commented 5 months ago

In the paper (https://arxiv.org/abs/2306.11189), you wrote the scores below. Can you kindly provide a way to reproduce this?

image image

For example, with the models you provide in the repo, BioREx PubMedBERT model (Original) and BioREx BioLinkBERT model (Preferred), what scores should I get, and how do I get them?

When I run the BioREx PubMedBERT model (Original) using the command you suggest, bash scripts/run_test_pred.sh, I get

Overall 966 652 263 314 0.7125683060109289 0.6749482401656315 0.6932482721956407 in the file specified by the "out_result_file" parameter. I think the last three values are precision, recall, and F1 score, but then I am not sure how to get 79.6 in this case (BioRED + 8 datasets in your paper).
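For reference, the three decimals on that line are consistent with reading the four integers as gold count, true positives, false positives, and false negatives. The sketch below only checks that arithmetic; the column interpretation is an assumption inferred from the numbers, not something confirmed by the repository, and the helper name prf is hypothetical.

```python
def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from raw counts, guarding against empty denominators."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Counts copied from the "Overall" line, read (by assumption) as:
# gold = 966, TP = 652, FP = 263, FN = 314.
gold, tp, fp, fn = 966, 652, 263, 314

precision, recall, f1 = prf(tp, fp, fn)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under this reading, TP + FN equals the gold count (652 + 314 = 966), and the computed values match the three decimals in the output line.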

If I misunderstood something, please let me know. Also, if you could provide the specific parameters needed to reproduce the scores in the paper (including the baseline approaches such as TL (Transfer Learning) and MTL (Multi-Task Learning)), it would be a great help.

dongheechoi commented 5 months ago

Also, if you could specify the parameters for the other dataset scores, it would be very helpful. For example, BC5CDR, DDI, DrugProt, and others.

ptlai commented 5 months ago

Hi @dongheechoi ,

The score you show is the typed score, located in 'out_biorex_results.txt'. The score in our paper is the binary score, located in 'out_biorex_bin_results.txt'.
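To illustrate why the two files can report different numbers, here is a minimal sketch. It assumes that the typed score requires the entity pair and the relation type to match, while the binary score only checks whether a relation is predicted for the entity pair. That definition is an assumption and is not spelled out in this thread; the entity IDs, relation labels, and the helper f1_score are hypothetical and are not taken from the BioREx evaluation code.

```python
def f1_score(gold_set: set, pred_set: set) -> float:
    """F1 over two sets of items, counting exact set members as true positives."""
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical relations as (entity_a, entity_b, relation_type).
gold = {("D1", "G1", "Association"), ("C1", "G2", "Positive_Correlation")}
pred = {("D1", "G1", "Association"), ("C1", "G2", "Negative_Correlation")}

# Typed: the full triple must match.
typed_f1 = f1_score(gold, pred)

# Binary (assumed definition): drop the type and compare entity pairs only.
binary_f1 = f1_score({(a, b) for a, b, _ in gold}, {(a, b) for a, b, _ in pred})

print(f"typed F1={typed_f1:.2f} binary F1={binary_f1:.2f}")
```

In this toy case the second prediction has the right pair but the wrong type, so it counts against the typed score only.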

To evaluate our models' predictions, you can use our latest leaderboard (https://codalab.lisn.upsaclay.fr/competitions/16381).

For the experiments that combine BioRED with the other datasets, I used the same parameters.

dongheechoi commented 3 months ago

I am sorry to follow up so late, but do you mean this file? image

And I cannot find 79.6 here, so you did not put that result on the leaderboard, right? image

Also, some of the data in the BioREx test set appears in the BioRED validation set, so I am not sure which dataset I should use.

For example, I can find this sentence in the BioRED validation file: image

but I can also find the same sentence in the test files in the ncbi_relation and biorex folders.

image image
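An overlap like this can be checked systematically rather than by eye. The sketch below compares two files line by line after normalizing case and whitespace. It assumes each line holds one text passage; the actual BioRED and BioREx file layouts may carry IDs or extra columns, in which case the text field would need to be extracted first. The function names are hypothetical.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not hide a match."""
    return " ".join(text.lower().split())


def shared_lines(lines_a: list[str], lines_b: list[str]) -> list[str]:
    """Return the normalized non-empty lines that occur in both inputs, sorted."""
    set_a = {normalize(line) for line in lines_a if line.strip()}
    set_b = {normalize(line) for line in lines_b if line.strip()}
    return sorted(set_a & set_b)


def shared_lines_in_files(path_a: str, path_b: str) -> list[str]:
    """Same check, reading both inputs from UTF-8 text files."""
    lines_a = Path(path_a).read_text(encoding="utf-8").splitlines()
    lines_b = Path(path_b).read_text(encoding="utf-8").splitlines()
    return shared_lines(lines_a, lines_b)


# Toy example with made-up sentences standing in for a validation and a test file.
validation = ["Gene A is associated with disease B.", "Drug C inhibits protein D."]
test = ["drug c   inhibits protein d.", "Variant E causes disease F."]
overlap = shared_lines(validation, test)
print(overlap)
```

Calling shared_lines_in_files with the paths of the validation and test files would list the passages that occur in both.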

dongheechoi commented 3 months ago

I have used https://ftp.ncbi.nlm.nih.gov/pub/lu/BioREx/datasets.zip for the BioREx dataset.

ptlai commented 3 months ago

Hi @dongheechoi ,

I wanted to clarify a few points regarding BioREx.