ap1438 closed this issue 2 years ago.
Hi!
Thank you for your issue.
Actually, we support a lot of custom searches (see https://lotus.naturalproducts.net/documentation) but not the specific one you requested.
We might provide a SPARQL endpoint in the future to handle such requests, but in the meantime, querying Wikidata directly seems like a good option.
I prepared a query you can easily adapt: https://w.wiki/5GSw. You can download the results directly as a tabular file there.
Another option is to search directly at https://pubchem.ncbi.nlm.nih.gov/classification/#hid=115; they also offer CSV downloads.
More generally, the compounds' names are automatically generated, so we would advise being very cautious with them.
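If you would rather script the download than use the web interface, here is a minimal sketch. It assumes only the public Wikidata Query Service endpoint and the standard SPARQL 1.1 JSON results format. It sends a query string (for example, one copied from the w.wiki link above) and flattens the results into rows. The function names and the User-Agent string are our own placeholders, not part of LOTUS or Wikidata.

```python
import csv
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint (the service behind the w.wiki query links).
ENDPOINT = "https://query.wikidata.org/sparql"


def run_query(query: str, user_agent: str = "example-script/0.1 (you@example.org)") -> dict:
    """Send a SPARQL query and return the parsed SPARQL 1.1 JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            # Wikidata asks scripts to send a descriptive User-Agent (placeholder here).
            "User-Agent": user_agent,
        },
    )
    with urllib.request.urlopen(request, timeout=90) as response:
        return json.load(response)


def bindings_to_rows(results: dict) -> list[dict]:
    """Flatten SPARQL JSON results into one dict per row.

    Variables left unbound in a row (e.g. by an OPTIONAL clause) become None.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]


def write_csv(rows: list[dict], path: str) -> None:
    """Write rows to a CSV file; None values are written as empty cells."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


# Usage (needs network access; my_query is a hypothetical SPARQL string):
# rows = bindings_to_rows(run_query(my_query))
# write_csv(rows, "results.csv")
```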
Best
Thank you for your quick response and valuable suggestion. Looking at the code and the downloaded data, I noticed that the molecular formula field was missing. So I tried to modify the query to download the molecular formulae as well, but for some reason it reports that the query time limit was reached. This is the code I tried.
Can you check it and tell me where I went wrong?
You were almost there!
I think the query you want is: https://w.wiki/5Ggd
Your version was querying the whole of Wikidata for molecules again.
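Here is a sketch of the general shape such a query can take. It is not necessarily what the linked query contains. The chemical formula (Wikidata property P274) is attached to the same `?structure` variable that P703 ("found in taxon") already restricts, so only structures from the chosen taxa are looked up. If the formula is bound to a separate variable that is not joined to `?structure`, the service has to match it against every molecule in Wikidata. That can easily exceed the public endpoint's time limit (roughly a minute). The Python wrapper and its name are our own.

```python
def taxon_formula_query(taxon_qid: str, limit: int = 10000) -> str:
    """Build a SPARQL query for structures found in a taxon (and its child taxa),
    with their SMILES and chemical formula.

    The formula triple hangs off ?structure, which P703 already restricts,
    so the lookup stays limited to structures from the chosen taxa.
    """
    return f"""
SELECT DISTINCT ?structure ?structure_smiles ?structure_formula ?organism_name WHERE {{
  VALUES ?taxon {{ wd:{taxon_qid} }}
  ?organism wdt:P171* ?taxon;              # the taxon and all its child taxa
            wdt:P225 ?organism_name.       # taxon name
  ?structure p:P703/ps:P703 ?organism;     # structure found in one of those taxa
             wdt:P233 ?structure_smiles;   # canonical SMILES
             wdt:P274 ?structure_formula.  # chemical formula
}}
LIMIT {limit}
"""


print(taxon_formula_query("Q21754"))  # paste the output into query.wikidata.org
```

As written, structures without a P274 statement are left out. Wrapping that triple in `OPTIONAL { ... }` would keep them.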
Thanks for the correction and insights.
Searching for "Gentiana" on the LOTUS web page returned 483 natural products, but the Wikidata query returns 768. Can you please explain the reason for such a large difference?
Hi,
Not exactly: the query I wrote returns structure-organism pairs, so the same structure can appear multiple times. If you want to reduce it to distinct structures, use this one: https://w.wiki/5J73.
Hope this clarifies things.
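The difference between counting pairs and counting structures can be seen in a toy sketch. The rows are hypothetical and shaped like the query results, with one row per structure-organism pair. In SPARQL itself, the same distinct count can be written as an aggregate such as `SELECT (COUNT(DISTINCT ?structure) AS ?n)`.

```python
def count_pairs_and_structures(rows: list[dict]) -> tuple[int, int]:
    """Return (number of structure-organism pairs, number of distinct structures)."""
    pairs = {(row["structure"], row["organism"]) for row in rows}
    structures = {row["structure"] for row in rows}
    return len(pairs), len(structures)


# Hypothetical IDs: compound Q_A is reported in two species, so it yields
# two pairs but counts only once as a structure.
rows = [
    {"structure": "Q_A", "organism": "Q_sp1"},
    {"structure": "Q_A", "organism": "Q_sp2"},
    {"structure": "Q_B", "organism": "Q_sp1"},
]
print(count_pairs_and_structures(rows))  # (3, 2)
```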
Thank you
I'm trying to do something similar. Following your examples, when I run:
```sparql
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754 # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon; # Include children taxa
    wdt:P225 ?organism_name. # Get organism name
  ?structure wdt:P233 ?structure_smiles; # Get the SMILES
    (p:P703/ps:P703) ?organism. # Found in given taxon/taxa
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
```
I get 20968 results. However, when I try to include the CAS number and InChIKey with the following query:
```sparql
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {
  VALUES ?taxon {
    wd:Q21754 # You can remove the Qxxxxxx and hit Ctrl+space, type the first letters and it should autocomplete
  }
  ?organism (wdt:P171*) ?taxon; # Include children taxa
    wdt:P225 ?organism_name. # Get organism name
  ?structure wdt:P233 ?structure_smiles; # Get the SMILES
    (p:P703/ps:P703) ?organism; # Found in given taxon/taxa
    wdt:P231 ?structureCAS; # Get the CAS
    wdt:P235 ?structureINCHIKEY. # Get the INCHIKEY
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100000
```
I only get 7967 results. I imagine this might be because the second query doesn't return structures that lack a CAS number or InChIKey. Is it possible to return all metabolites found in the taxa and leave the missing property values as NaN?
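On the missing-values question, here is a sketch based on standard SPARQL semantics. It wraps the CAS (P231) and InChIKey (P235) triples in `OPTIONAL`, so structures that lack them are kept with those variables left unbound instead of being dropped. The rest of the pattern is the query from the post above. The Python wrapper and its name are our own.

```python
def taxon_query_with_optional_ids(taxon_qid: str, limit: int = 100000) -> str:
    """Structures found in a taxon, keeping rows that lack a CAS number or InChIKey.

    A plain triple pattern acts as a required match, so any structure missing
    it is dropped; OPTIONAL behaves like a left join and leaves the variable
    unbound instead.
    """
    return f"""
SELECT DISTINCT ?structure ?structureLabel ?structure_smiles ?structureCAS ?structureINCHIKEY ?organism ?organism_name WHERE {{
  VALUES ?taxon {{ wd:{taxon_qid} }}
  ?organism wdt:P171* ?taxon;
            wdt:P225 ?organism_name.
  ?structure wdt:P233 ?structure_smiles;
             p:P703/ps:P703 ?organism.
  OPTIONAL {{ ?structure wdt:P231 ?structureCAS. }}        # CAS Registry Number, if any
  OPTIONAL {{ ?structure wdt:P235 ?structureINCHIKEY. }}   # InChIKey, if any
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""


print(taxon_query_with_optional_ids("Q21754"))
```

In the CSV download from the query service, unbound values appear as empty cells, which `pandas.read_csv` reads as NaN by default. A structure with several CAS numbers or InChIKeys will appear on several rows.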
I have an organism and want to download all the chemical compounds related to it, together with their SMILES and the species that produce them.
I searched on the web page, which found all the entries for compounds related to that organism, and downloaded the SDF file, which was the only download option available. I later converted it to Excel format.
However, I realized that the file was missing the compound names.
What I want is: compound name, SMILES, and the species in which each compound is present.
Is it possible to get this from the LOTUS database by any means?
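To check what the SDF actually contains, here is a minimal standard-library sketch. It assumes a standard SD file, where each record ends with `$$$$` and its data items appear as `> <FIELD>` headers followed by value lines. It extracts those fields and writes only the columns you request to CSV, which Excel can open. The field names `smiles`, `organism` and `name` and the toy record are hypothetical. Run `print(records[0].keys())` on your own file to see which fields it really has. If no name field is present, this cannot recover one.

```python
import csv
import io


def read_sdf_records(text: str) -> list[dict]:
    """Parse the data items ("> <FIELD>" blocks) of every record in an SD file.

    Records are separated by "$$$$" lines; molblock lines before the first
    data header are skipped.
    """
    records = []
    for chunk in text.split("$$$$"):
        if not chunk.strip():
            continue  # trailing text after the last record
        fields: dict[str, list[str]] = {}
        current = None
        for line in chunk.splitlines():
            if line.startswith(">"):
                # Data header such as "> <smiles>": the field name sits between < and >.
                start = line.find("<")
                end = line.find(">", start + 1) if start != -1 else -1
                current = line[start + 1:end] if end != -1 else None
                if current is not None:
                    fields[current] = []
            elif current is not None:
                if line.strip() == "":
                    current = None  # a blank line ends the data item
                else:
                    fields[current].append(line)
        records.append({key: "\n".join(lines) for key, lines in fields.items()})
    return records


def records_to_csv(records: list[dict], columns: list[str]) -> str:
    """Return CSV text with the requested columns; missing fields become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow({col: record.get(col, "") for col in columns})
    return buffer.getvalue()


# Toy record with hypothetical field names (a real file has a full molblock).
sample = """compound-1
  example

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <smiles>
CCO

> <organism>
Genus species

$$$$
"""
records = read_sdf_records(sample)
print(records)  # [{'smiles': 'CCO', 'organism': 'Genus species'}]
print(records_to_csv(records, ["name", "smiles", "organism"]))
```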