Closed by alrichardbollans 1 year ago
Hi @alrichardbollans,
You are perfectly right, what you were missing is the OPTIONAL keyword, which allows a property to be absent.
Here is probably what you were looking for:
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
wd:Q21754
}
?organism (wdt:P171*) ?taxon;
wdt:P225 ?organism_name.
?structure wdt:P233 ?structure_smiles;
(p:P703/ps:P703) ?organism.
OPTIONAL {
?structure wdt:P231 ?structure_cas;
wdt:P235 ?structure_inchikey.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
Hope this answers your question, happy to elaborate if not. 👍🏼
Aha this is great, thanks! Still getting my head around SPARQL so this is really handy. How would I also make the SMILES key optional?
The issue you might face when making it optional is that you could end up with items that are not necessarily small molecules. You would then have to restrict the results to given instances at the beginning of the query (like in https://www.wikidata.org/wiki/Wikidata:WikiProject_Chemistry_Natural_products#What_was_already_there?), and I am not sure you would get many more results. You could possibly swap the current InChIKey/SMILES if you want to try.
OK, thanks for this!
I've just noticed that the InChIKey isn't being returned for metabolites in some taxa, even though the InChIKey is given in LOTUS/Wikidata. For example, with the query:
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
wd:Q55925442
}
?organism (wdt:P171*) ?taxon; # Include children taxa
wdt:P225 ?organism_name. # Get organism name
?structure wdt:P233 ?structure_smiles; # Get the SMILES
(p:P703/ps:P703) ?organism. # Found in given taxon/taxa
OPTIONAL {
?structure wdt:P231 ?structure_cas;
wdt:P235 ?structure_inchikey.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
The structure wd:Q104888293 is returned but no value is provided for its structure_inchikey. Why is this?
Good catch!
Something like
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
wd:Q55925442
}
?organism (wdt:P171*) ?taxon;
wdt:P225 ?organism_name.
?structure wdt:P233 ?structure_smiles;
(p:P703/ps:P703) ?organism.
OPTIONAL { ?structure wdt:P235 ?structure_inchikey. }
OPTIONAL { ?structure wdt:P231 ?structure_cas. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
should solve this. Do not hesitate to reopen the issue if it does not.
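To illustrate why splitting the OPTIONAL block helps: a group inside OPTIONAL matches as a whole, so when two properties share one block, a structure missing either of them gets neither variable bound. Below is a minimal Python sketch of that behaviour. It is a toy model and not a SPARQL engine; the function names and the sample item are made up for illustration, and only the property IDs (P231 for CAS, P235 for InChIKey) come from the queries above.

```python
# Toy model of how OPTIONAL groups bind variables; not a real SPARQL engine.
# Property IDs follow the thread: P231 = CAS number, P235 = InChIKey.


def optional_group(item, patterns):
    """Bind every variable in the group, or none if any property is missing."""
    if all(prop in item for prop in patterns.values()):
        return {var: item[prop] for var, prop in patterns.items()}
    return {}


def grouped(item):
    # One OPTIONAL block holding both triple patterns.
    return optional_group(
        item, {"structure_cas": "P231", "structure_inchikey": "P235"}
    )


def separate(item):
    # One OPTIONAL block per property, each evaluated independently.
    bindings = {}
    bindings.update(optional_group(item, {"structure_inchikey": "P235"}))
    bindings.update(optional_group(item, {"structure_cas": "P231"}))
    return bindings


# Hypothetical structure that has an InChIKey but no CAS number.
item = {"P235": "HYPOTHETICAL-INCHIKEY"}

print(grouped(item))   # {}
print(separate(item))  # {'structure_inchikey': 'HYPOTHETICAL-INCHIKEY'}
```

With the grouped form the InChIKey is lost because the CAS number is absent, which would be consistent with the empty structure_inchikey reported above.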
This is great! Is it possible to also make the SMILES optional, or is this redundant? My attempt is:
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
wd:Q55925442
}
?organism (wdt:P171*) ?taxon;
wdt:P225 ?organism_name.
?structure (p:P703/ps:P703) ?organism.
OPTIONAL { ?structure wdt:P235 ?structure_inchikey. }
OPTIONAL { ?structure wdt:P233 ?structure_smiles. }
OPTIONAL { ?structure wdt:P231 ?structure_cas. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
I would not recommend it, but it is feasible. The problem is not redundancy but rather having something you can trust: I would hardly trust an (almost) empty entry with no SMILES, no CAS and no InChIKey.
Ok thanks, this is good to know. My intention is to incorporate this into my data by matching on CAS, SMILES or InChIKey, so instances with none of these would effectively be ignored. I guess ideally the query would return all metabolites with at least one of CAS, SMILES or InChIKey.
Something like
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structure_cas ?structure_inchikey ?organism ?organism_name WHERE {
VALUES ?taxon {
wd:Q55925442
}
?organism (wdt:P171*) ?taxon;
wdt:P225 ?organism_name.
?structure (p:P703/ps:P703) ?organism.
OPTIONAL { ?structure wdt:P235 ?structure_inchikey. }
OPTIONAL { ?structure wdt:P233 ?structure_smiles. }
OPTIONAL { ?structure wdt:P231 ?structure_cas. }
BIND (CONCAT(COALESCE(?structure_inchikey,""), COALESCE(?structure_smiles,""), COALESCE(?structure_cas,"")) AS ?key)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
FILTER (STRLEN(?key) > 0)
}
LIMIT 100000
should do the trick. I do not think there are any "fully empty" entries to test but anyway...
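For the matching step described earlier (joining the results to local data on CAS, SMILES or InChIKey), the same "at least one identifier" rule can also be applied on the client side. The following is only a sketch under assumptions: rows are taken to be plain dicts keyed by the query's variable names, with None for missing values, and the helper names and sample values are hypothetical.

```python
# Client-side sketch: keep rows with at least one identifier and match them
# against a local record. Field names mirror the query variables.
ID_FIELDS = ("structure_inchikey", "structure_smiles", "structure_cas")


def has_identifier(row):
    """True when at least one of the three identifiers is present and non-empty."""
    return any(row.get(field) for field in ID_FIELDS)


def match_rows(local_record, rows):
    """Return rows sharing at least one identifier value with local_record."""
    matches = []
    for row in rows:
        if not has_identifier(row):
            continue  # nothing to match on, so the row is ignored
        if any(
            row.get(field) and row.get(field) == local_record.get(field)
            for field in ID_FIELDS
        ):
            matches.append(row)
    return matches


# Hypothetical query rows; None marks a property that was not returned.
rows = [
    {"structure": "s1", "structure_inchikey": "KEY-A",
     "structure_smiles": None, "structure_cas": None},
    {"structure": "s2", "structure_inchikey": None,
     "structure_smiles": None, "structure_cas": "00-00-0"},
    {"structure": "s3", "structure_inchikey": None,
     "structure_smiles": None, "structure_cas": None},
]
local_record = {"name": "my compound", "structure_cas": "00-00-0"}

print([row["structure"] for row in match_rows(local_record, rows)])  # ['s2']
```

Note that this compares identifiers as exact strings. The same molecule can be written as more than one SMILES string, so a match on InChIKey is likely to be more reliable than a match on SMILES.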
I'm trying to extract all metabolites in a plant order and include the given CAS ID, InChIKey and SMILES information. When I run:
I get 20968 results, however when I try to include CAS ID and InChIKey information with the following:
I only get 7967 results. I imagine this might be because the latter query doesn't return instances without a CAS ID or InChIKey. Is it possible to return all metabolites found in the taxa and leave missing values for the properties as NaN?
Originally posted by @alrichardbollans in https://github.com/lotusnprod/lotus-web/issues/27#issuecomment-1619999166
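On the "leave missing values as NaN" part: in the standard SPARQL JSON results format, a variable left unbound by OPTIONAL is simply absent from that row's binding, so the placeholder has to be filled in by the client. A minimal sketch, assuming the response has already been parsed into a Python dict; the helper name and the sample data are hypothetical.

```python
# Sketch: turn SPARQL JSON results into flat rows, filling unbound variables.


def bindings_to_rows(results, missing=None):
    """Return one dict per result row, with `missing` for unbound variables."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        rows.append(
            {
                var: binding[var]["value"] if var in binding else missing
                for var in variables
            }
        )
    return rows


# Hypothetical response in which ?structure_cas is unbound for the only row.
sample = {
    "head": {"vars": ["structure", "structure_cas"]},
    "results": {
        "bindings": [
            {"structure": {"type": "uri", "value": "http://example.org/structure/1"}}
        ]
    },
}

print(bindings_to_rows(sample))
# [{'structure': 'http://example.org/structure/1', 'structure_cas': None}]
```

Passing missing=float("nan") gives NaN instead of None if that suits the downstream tooling better.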