isegura / DDICorpus


Something wrong with the XML files and the number of labels #1

Open hialoha opened 2 years ago

hialoha commented 2 years ago

I found a problem in Dextroamphetamine_ddi.xml, line 447: the ddi attribute is "true", but the "type" attribute that should follow it is missing, even though the same interaction is labeled "INT" in Dextroamphetamine_ddi.ann.
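A check along the following lines could locate every such case rather than just this one. It is a minimal sketch that assumes interactions are stored as `<pair>` elements carrying `ddi` and `type` attributes; the element name, the helper name `find_untyped_pairs`, and the inline sample document are illustrative assumptions, not taken from the corpus.

```python
import xml.etree.ElementTree as ET


def find_untyped_pairs(xml_text):
    """Return ids of <pair> elements with ddi="true" but no type attribute.

    Assumes interactions are encoded as <pair ... ddi="..." type="..."/>.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for pair in root.iter("pair"):
        # A missing or empty type on a positive pair is what the issue describes.
        if pair.get("ddi") == "true" and not pair.get("type"):
            missing.append(pair.get("id"))
    return missing


# Hypothetical sample in the assumed layout, not copied from the corpus.
SAMPLE = """<document id="d0">
  <sentence id="d0.s0" text="A may interact with B and C.">
    <entity id="d0.s0.e0" charOffset="0-0" type="drug" text="A"/>
    <entity id="d0.s0.e1" charOffset="20-20" type="drug" text="B"/>
    <entity id="d0.s0.e2" charOffset="26-26" type="drug" text="C"/>
    <pair id="d0.s0.p0" e1="d0.s0.e0" e2="d0.s0.e1" ddi="true"/>
    <pair id="d0.s0.p1" e1="d0.s0.e0" e2="d0.s0.e2" ddi="true" type="int"/>
    <pair id="d0.s0.p2" e1="d0.s0.e1" e2="d0.s0.e2" ddi="false"/>
  </sentence>
</document>"""

print(find_untyped_pairs(SAMPLE))  # ['d0.s0.p0']
```

Running the same function over the text of each XML file in the corpus would list any other pairs with the same problem.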

hialoha commented 2 years ago

Also, the number of labeled items in the data does not match what the paper reports.

In particular, the paper "SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013)" gives the following label counts: (image: data)

However, when I counted them myself, I got the following numbers:


Train
DrugBank
Counter({'DRUG': 8197, 'GROUP': 3206, 'BRAND': 1423, 'DRUG_N': 103})
Counter({'EFFECT': 1535, 'MECHANISM': 1257, 'ADVISE': 818, 'INT': 179})
MedLine
Counter({'DRUG': 1228, 'DRUG_N': 401, 'GROUP': 193, 'BRAND': 14})
Counter({'EFFECT': 152, 'MECHANISM': 62, 'INT': 10, 'ADVISE': 8})

Test
DrugBank
Counter({'DRUG': 1698, 'GROUP': 691, 'BRAND': 400, 'DRUG_N': 27})
Counter({'EFFECT': 298, 'MECHANISM': 278, 'ADVISE': 214, 'INT': 94})
MedLine
Counter({'DRUG': 517, 'DRUG_N': 234, 'GROUP': 131, 'BRAND': 28})
Counter({'EFFECT': 62, 'MECHANISM': 24, 'ADVISE': 7, 'INT': 2})
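For anyone who wants to reproduce counts like these from the XML files, the sketch below shows one possible approach. It is not the script used above: it assumes `<entity type="...">` and `<pair ddi="..." type="...">` elements, upper-cases the type values to match the labels shown, and counts a positive pair without a `type` as `MISSING`. The names `count_labels` and `count_corpus` and the sample document are hypothetical.

```python
from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET


def count_labels(xml_text):
    """Count entity types and positive-interaction types in one XML document."""
    root = ET.fromstring(xml_text)
    entities = Counter(
        e.get("type", "MISSING").upper() for e in root.iter("entity")
    )
    interactions = Counter(
        p.get("type", "MISSING").upper()
        for p in root.iter("pair")
        if p.get("ddi") == "true"  # only positive pairs carry an interaction type
    )
    return entities, interactions


def count_corpus(directory):
    """Sum the per-file counts over every .xml file below `directory`."""
    entities, interactions = Counter(), Counter()
    for path in sorted(Path(directory).rglob("*.xml")):
        e, i = count_labels(path.read_text(encoding="utf-8"))
        entities.update(e)
        interactions.update(i)
    return entities, interactions


# Hypothetical sample in the assumed layout, not copied from the corpus.
SAMPLE = """<document id="d1">
  <sentence id="d1.s0" text="X, Y and Zs.">
    <entity id="d1.s0.e0" charOffset="0-0" type="drug" text="X"/>
    <entity id="d1.s0.e1" charOffset="3-3" type="drug" text="Y"/>
    <entity id="d1.s0.e2" charOffset="9-10" type="group" text="Zs"/>
    <pair id="d1.s0.p0" e1="d1.s0.e0" e2="d1.s0.e1" ddi="true" type="effect"/>
    <pair id="d1.s0.p1" e1="d1.s0.e0" e2="d1.s0.e2" ddi="true"/>
    <pair id="d1.s0.p2" e1="d1.s0.e1" e2="d1.s0.e2" ddi="false"/>
  </sentence>
</document>"""

ents, ints = count_labels(SAMPLE)
print(ents)
print(ints)
```

Counting untyped positive pairs under a separate `MISSING` key makes a case like the Dextroamphetamine one visible in the totals instead of silently lowering the `INT` count.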