lasigeBioTM / MER

Minimal Named-Entity Recognizer (MER)
http://labs.fc.ul.pt/mer/

Terms with punctuation #38

Closed AndreLamurias closed 5 years ago

AndreLamurias commented 7 years ago

Some HPO terms with punctuation are not recognized:

grep "3-4 toe syndactyly" data/hpo.txt
3-4 toe syndactyly
./get_entities.sh "3-4 toe syndactyly" hpo
8       18      syndactyly
4       18      toe syndactyly

grep "short stature, severe disproportionate" data/hpo.txt
short stature, severe disproportionate
./get_entities.sh "short stature, severe disproportionate" hpo
15      21      severe
0       13      short stature

grep "cataract, congenital" data/hpo.txt
cataract, congenital
./get_entities.sh "cataract, congenital" hpo
0       8       cataract

AndreLamurias commented 5 years ago

Line 47 of get_entities removes full stops. Commenting it out fixes the last two cases, so this step should be optional.
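
A minimal sketch of what an optional punctuation step could look like. The function name `normalize_text` and its `keep_punct` argument are hypothetical, and the `tr` calls are an assumption about the kind of cleanup involved, not the code at line 47:

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT taken from get_entities.sh.
# Replaces '.' and ',' with spaces and squeezes repeated spaces,
# unless the second argument is "yes".
normalize_text() {
  local text="$1" keep_punct="${2:-no}"
  if [ "$keep_punct" != "yes" ]; then
    text=$(printf '%s' "$text" | tr '.,' '  ' | tr -s ' ')
  fi
  printf '%s\n' "$text"
}

normalize_text "cataract, congenital"        # punctuation stripped
normalize_text "cataract, congenital" yes    # punctuation kept
```

With the stripping enabled, the input no longer contains the comma that the lexicon entry `cataract, congenital` has, so only the shorter term `cataract` can still match.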

AndreLamurias commented 5 years ago

Line 144 of get_entities causes the first error, because it only matches with at least 5 alpha chars.

Even after changing this, the term is not linked to its URI in the links file, because line 181 removes the first digit.
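
The two effects can be illustrated in isolation. Both patterns below are assumptions chosen to mimic the described behaviour (a 5-alphabetic-character requirement and removal of the first digit); they are not copied from lines 144 or 181, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Illustrative only: these patterns are assumptions, not get_entities.sh code.

# Assumed filter: accept a candidate only if it starts with
# 5 or more alphabetic characters.
starts_with_five_alpha() {
  printf '%s\n' "$1" | grep -Eq '^[[:alpha:]]{5,}'
}

# Assumed cleanup: delete the first digit found in the label.
drop_first_digit() {
  printf '%s\n' "$1" | sed 's/[0-9]//'
}

term="3-4 toe syndactyly"

if starts_with_five_alpha "$term"; then
  echo "accepted: $term"
else
  echo "rejected: $term"
fi

# The altered label differs from the original, so an exact lookup
# of the original term against it would fail.
drop_first_digit "$term"
```

Under these assumptions the term is rejected by the filter, and the cleanup turns it into `-4 toe syndactyly`, which no longer equals the label `3-4 toe syndactyly`.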