jacobwindsor / dar-tool

Placeholder repo for the DAR-tool

Use WikiData for ID gathering #3

Open · jacobwindsor opened this issue 7 years ago

jacobwindsor commented 7 years ago

Use the Wikidata SPARQL endpoint for the initial ID gathering (PubChem CID, ChEBI, KEGG and CAS) rather than relying on PubChem:

To search by InChIKey (property P235):

    SELECT ?compound ?compoundLabel ?pcid ?chebi ?kegg ?cas WHERE {
      ?compound wdt:P235 "ILRYLPWNYFXEMH-WHFBIAKZSA-N" .
      OPTIONAL { ?compound wdt:P662 ?pcid . }
      OPTIONAL { ?compound wdt:P683 ?chebi . }
      OPTIONAL { ?compound wdt:P665 ?kegg . }
      OPTIONAL { ?compound wdt:P231 ?cas . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }

To search by CAS number (property P231):

    SELECT ?compound ?compoundLabel ?pcid ?chebi ?kegg WHERE {
      ?compound wdt:P231 "50-00-0" .
      OPTIONAL { ?compound wdt:P662 ?pcid . }
      OPTIONAL { ?compound wdt:P683 ?chebi . }
      OPTIONAL { ?compound wdt:P665 ?kegg . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
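The two queries differ only in the property matched against the search literal (P235 for InChIKey, P231 for CAS). Below is a minimal Python sketch of building either one plus the request URL. The helper names and the use of the `format=json` parameter on `https://query.wikidata.org/sparql` are assumptions about how the tool might call the endpoint, not existing DAR-tool code. The HTTP request itself is left out; Wikimedia asks clients to send a descriptive User-Agent.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Property matched against the search literal, per identifier kind.
SEARCH_PROPERTIES = {
    "inchikey": "P235",  # InChIKey
    "cas": "P231",       # CAS Registry Number
}

# IDs returned for each matching compound (all OPTIONAL, as in the queries above).
ID_PROPERTIES = {
    "pcid": "P662",   # PubChem CID
    "chebi": "P683",  # ChEBI ID
    "kegg": "P665",   # KEGG ID
    "cas": "P231",    # CAS Registry Number
}


def build_query(identifier: str, kind: str) -> str:
    """Build a lookup query for a CAS number (kind="cas") or an InChIKey (kind="inchikey")."""
    if kind not in SEARCH_PROPERTIES:
        raise ValueError(f"unsupported identifier kind: {kind!r}")
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    literal = identifier.replace("\\", "\\\\").replace('"', '\\"')
    variables = " ".join(f"?{name}" for name in ID_PROPERTIES)
    optionals = "\n".join(
        f"  OPTIONAL {{ ?compound wdt:{prop} ?{name} . }}"
        for name, prop in ID_PROPERTIES.items()
    )
    return (
        f"SELECT ?compound ?compoundLabel {variables} WHERE {{\n"
        f'  ?compound wdt:{SEARCH_PROPERTIES[kind]} "{literal}" .\n'
        f"{optionals}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n'
        "}"
    )


def build_request_url(query: str) -> str:
    """GET URL asking the endpoint for SPARQL JSON results."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

For example, `build_request_url(build_query("50-00-0", "cas"))` gives a URL for the CAS query above, with `?cas` also selected so both kinds of lookup return the same columns.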
jacobwindsor commented 7 years ago

Oh, and stop using the IUPAC name field (this applies to the API as well). Just have a CSV file of CAS numbers/InChIKeys.
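A sketch of reading such a CSV and deciding which query each row needs, assuming one identifier per row in the first column and no header row. The format checks (the CAS check digit and the 14-10-1 letter layout of a standard InChIKey) are standard; the function names are hypothetical.

```python
import csv
import io
import re
from typing import List, Optional, Tuple

# CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")
# Standard InChIKey: 14 + 10 + 1 uppercase letters separated by hyphens.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def is_valid_cas(cas: str) -> bool:
    """Check the CAS layout and its check digit."""
    m = CAS_RE.match(cas)
    if not m:
        return False
    digits = m.group(1) + m.group(2)
    # Check digit = sum of each digit times its position counted from the right, mod 10.
    total = sum(int(d) * pos for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(m.group(3))


def classify(identifier: str) -> Optional[str]:
    """Return "cas", "inchikey", or None if the value looks like neither."""
    value = identifier.strip()
    if is_valid_cas(value):
        return "cas"
    if INCHIKEY_RE.match(value):
        return "inchikey"
    return None


def read_identifiers(csv_text: str) -> List[Tuple[str, Optional[str]]]:
    """Read the first column of each non-empty row and tag it with its identifier kind."""
    result = []
    for row in csv.reader(io.StringIO(csv_text)):
        if row and row[0].strip():
            result.append((row[0].strip(), classify(row[0])))
    return result
```

Rows tagged `None` (for example a CAS number with a wrong check digit) can be reported back to the user instead of being sent to Wikidata.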

egonw commented 7 years ago

@jacobwindsor, what do you need me to do for this issue?

jacobwindsor commented 7 years ago

@egonw, a while back you said that you wanted to help with this one. I'm not sure what you had in mind or how much time you have.

Basically, all that needs doing is to create a Wikidata service along the same lines as the PubChem service, and then change the logic of the Compounds service to use the new lookup procedure.
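One possible shape for that Wikidata service, sketched in Python. The HTTP call is injected as a function so the parsing of the endpoint's SPARQL JSON results can be tested without network access. `WikidataService`, `lookup`, `parse_results` and the `fetch` callable are hypothetical names; a real service would follow whatever conventions the existing PubChem service uses.

```python
import json
from typing import Callable, Dict, List, Optional
from urllib.parse import urlencode

# Hypothetical: any function that performs an HTTP GET and returns the response body.
Fetch = Callable[[str], str]

ENDPOINT = "https://query.wikidata.org/sparql"


class WikidataService:
    """Hypothetical Wikidata counterpart to the existing PubChem service."""

    def __init__(self, fetch: Fetch) -> None:
        # Injected so the parsing below can be tested without network access.
        self._fetch = fetch

    def lookup(self, query: str) -> List[Dict[str, Optional[str]]]:
        """Run a SELECT query and return one dict of plain values per result row."""
        url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
        return self.parse_results(self._fetch(url))

    @staticmethod
    def parse_results(body: str) -> List[Dict[str, Optional[str]]]:
        """Flatten a SPARQL 1.1 JSON results document."""
        data = json.loads(body)
        variables = data["head"]["vars"]
        rows = []
        for binding in data["results"]["bindings"]:
            # Variables left unbound (e.g. by OPTIONAL) are absent from the binding.
            rows.append({v: binding[v]["value"] if v in binding else None for v in variables})
        return rows
```

Unbound OPTIONAL variables come back as `None` here. A compound with several values for one property (for example two CAS numbers) produces several rows, so the Compounds service would need to merge them.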

Are you still interested in doing this? Otherwise, I can do it when I have time (a couple of weeks).