lubianat / ann

A repository for brainstorming and prototyping ideas related to the eLifeSprint project "Annotate them all" (https://sprint.elifesciences.org/annotate-them-all/).
Apache License 2.0

Is UMLS matched to Wikidata? #21

Closed: webersab closed this issue 4 years ago

webersab commented 4 years ago

What is your idea? Sci Spacy offers a good tool for detecting biomedical entities and linking them to the Unified Medical Language System (UMLS). We don't know whether these UMLS IDs can help us link the entities to Wikidata.

What can we do at the Sprint? Sci Spacy can be tried out here: https://scispacy.apps.allenai.org/. You can paste part of a scientific abstract into an input box and receive a list of annotations that contain UMLS IDs. You can then check whether those IDs are present in the Wikidata representation of each entity.
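One way to do that check in bulk, rather than looking up items one by one, is to query the Wikidata Query Service for items whose UMLS CUI property (P2892) matches the IDs returned by the demo. The sketch below is a hypothetical helper, not part of scispaCy or of this repository: the function names, the user-agent string and the seven-digit CUI format check are assumptions made for illustration.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumption: UMLS Concept Unique Identifiers look like "C" followed by seven digits.
CUI_PATTERN = re.compile(r"^C\d{7}$")

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_cui_query(cuis):
    """Build a SPARQL query for Wikidata items whose UMLS CUI (P2892)
    matches one of the given identifiers (hypothetical helper)."""
    invalid = [c for c in cuis if not CUI_PATTERN.match(c)]
    if invalid:
        raise ValueError(f"Not valid UMLS CUIs: {invalid}")
    # Deduplicate while keeping the original order.
    unique = list(dict.fromkeys(cuis))
    values = " ".join(f'"{c}"' for c in unique)
    return (
        "SELECT ?cui ?item ?itemLabel WHERE {\n"
        f"  VALUES ?cui {{ {values} }}\n"
        "  ?item wdt:P2892 ?cui .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_wikidata(sparql, user_agent="ann-umls-check/0.1 (example)"):
    """Send the query to the Wikidata Query Service and return the result bindings.
    Requires network access; the user agent is a placeholder."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": sparql, "format": "json"})
    request = urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]
```

Any CUI from the demo output that comes back with no binding is one that Wikidata does not (yet) record on any item.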

What skills does it require? Maybe some biomedical domain knowledge to judge whether an annotation is correct, but that is not a prerequisite!


jvfe commented 4 years ago

Huh, that's so cool! I'm not even in the sprint, but I was thinking about the same thing earlier today. I was talking to Tiago about it, and he showed me your issue.

Anyway, there are about 26K items in Wikidata with an associated UMLS ID (https://w.wiki/b6d). I was playing around while I was bored and made a quick prototype that simply associates the UMLS entities detected by scispacy with Wikidata items: https://github.com/jvfe/wdt_linking (click the badge to run it in Google Colab).
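The core of such an association step can be sketched as a join between the (mention, CUI) pairs that scispacy produces and the rows returned by a Wikidata Query Service SPARQL query selecting `?cui` and `?item` (for example, items with property P2892). This is a hypothetical sketch, not the code in the wdt_linking repo: the function names and the input shapes are assumptions. One CUI may map to several Wikidata items, so each mention keeps a list of QIDs.

```python
from collections import defaultdict


def bindings_to_mapping(bindings):
    """Turn WDQS JSON result bindings (with ?cui and ?item variables)
    into a dict mapping each UMLS CUI to the QIDs of matching items."""
    mapping = defaultdict(list)
    for row in bindings:
        cui = row["cui"]["value"]
        # ?item is returned as a full entity URI; keep only the trailing QID.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        if qid not in mapping[cui]:
            mapping[cui].append(qid)
    return dict(mapping)


def link_entities(annotations, cui_to_qids):
    """Attach Wikidata QIDs to (mention, CUI) pairs.
    CUIs with no Wikidata match get an empty list."""
    return [
        {"mention": mention, "cui": cui, "qids": cui_to_qids.get(cui, [])}
        for mention, cui in annotations
    ]
```

Keeping unmatched mentions (with an empty `qids` list) rather than dropping them makes it easy to see which detected concepts still lack a UMLS link on Wikidata.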

It's very rough, but I'm sure you can all adapt it into something better. Feel free to use my existing code.

lubianat commented 4 years ago

So, @webersab, if you feel this is resolved, feel free to close the issue! Maybe write this down somewhere with a link to @jvfe's repo?

lubianat commented 4 years ago

@jvfe, you can also open a PR with your code if you want.