Linking the Open literature to Wikidata

petermr commented 7 years ago

Wikidata (https://www.wikidata.org/wiki/Wikidata:Main_Page) is the largest collection of open public metadata in the world. It has a strong biomedical emphasis, with entries for genes, diseases, drugs, species, organizations, software, etc. It is becoming the primary resource for indexing and describing scientific data, especially where that data is standardized under authorities (e.g. GenBank, ICD-10).

This hack will explore what is in Wikidata and how it can help branches of science. One immediate use is to find and index terms in biomedical papers (e.g. in EuropePMC). This enhances the reading experience and also informs Wikidata editors of possible entries that aren't already included.
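As a rough illustration of the indexing idea, here is a minimal sketch of dictionary matching: scan a piece of text for known labels and attach an identifier to each hit. The function name `index_terms`, the tiny `LABEL_TO_QID` table and its IDs are placeholders invented for this example (they are not real Wikidata QIDs); a real run would load labels and IDs from Wikidata itself.

```python
import re

# Hypothetical label -> identifier table. The IDs are placeholders for
# illustration only, NOT real Wikidata QIDs.
LABEL_TO_QID = {
    "malaria": "Q-EXAMPLE-1",
    "plasmodium falciparum": "Q-EXAMPLE-2",
    "chloroquine": "Q-EXAMPLE-3",
}


def index_terms(text, label_to_qid):
    """Return (start, end, matched_text, qid) for each known label in text."""
    hits = []
    # Try longer labels first so multi-word labels win over shorter ones
    # that might overlap them.
    for label in sorted(label_to_qid, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            # Skip a match that overlaps a span we already accepted.
            if any(m.start() < end and start < m.end()
                   for start, end, _, _ in hits):
                continue
            hits.append((m.start(), m.end(), m.group(0), label_to_qid[label]))
    # Report hits in reading order.
    return sorted(hits)


sentence = "Chloroquine resistance in Plasmodium falciparum malaria."
for start, end, matched, qid in index_terms(sentence, LABEL_TO_QID):
    print(start, end, matched, qid)
```

Terms that a text-mining step flags but that are missing from the lookup table are the ones that could be passed to Wikidata editors as possible new entries.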

Our non-profit company contentmine.org is supported by the Wikimedia Foundation to extract facts and make them open. An example is https://tarrow.github.io/factvis/, where we have extracted a day's literature and indexed it against Wikidata. In the reverse direction, we hope to show how Apache Spark (word2vec) can be used to find "similar terms" which are in the literature but not in Wikidata.
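To make the "similar terms" idea concrete: word2vec assigns each term a vector, and terms used in similar contexts end up with similar vectors. The sketch below is plain Python rather than Spark, and it assumes the vectors have already been trained. The vectors in `TOY_VECTORS`, the `KNOWN` set and the function names are all made up for illustration and say nothing about what Wikidata actually contains.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_unknown_terms(seed, vectors, known_labels, top_n=3):
    """Rank terms closest to `seed` that are not in `known_labels`."""
    seed_vec = vectors[seed]
    scored = [
        (cosine(seed_vec, vec), term)
        for term, vec in vectors.items()
        if term != seed and term not in known_labels
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [term for _, term in scored[:top_n]]


# Toy 3-dimensional vectors, invented for this example; real word2vec
# vectors would be learned from the literature and have far more dimensions.
TOY_VECTORS = {
    "malaria": [0.9, 0.1, 0.0],
    "dengue": [0.8, 0.2, 0.1],
    "chikungunya": [0.85, 0.15, 0.05],
    "microscope": [0.0, 0.1, 0.9],
}
# Pretend these labels were already matched in the reference vocabulary.
KNOWN = {"malaria", "dengue"}

print(similar_unknown_terms("malaria", TOY_VECTORS, KNOWN, top_n=1))
```

Starting from a term that is already indexed, the nearest unmatched neighbours are candidates for review; at scale the same ranking would be done over vectors trained with Spark.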

Would be happy to combine this with other Wikidata-oriented hacks.

petermr commented 7 years ago

How do we add labels? Happy to give a lightning talk if people want.

Daniel-Mietchen commented 7 years ago

I've added the labels.

ekoner commented 7 years ago

+1

sparcopen / open-research-doathon #31