Closed egonw closed 4 years ago
@petermr, where can I find the list of 200 CIDs?
It's at https://github.com/gilienv/EssOilDB/tree/master/tables/chemistry/
It's a bit messy as we have split/forked the chemistry, and the disambiguation and cleaning is going on there. Ambarish Kumar is doing a good job, but he's not working on CEV. See https://github.com/gilienv/EssOilDB/issues/76, which has 100 comments; look at the latest. It may be easiest just to download the tables from /tables/chemistry/. I have posted an HTML table today.
Do you need access? I'll post something.
-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK
I need to park this until at least next week, I'm afraid. I've got some urgent stuff to solve first :(
There's no rush
Sorry, there are too many files in that folder... I have no idea at this moment how to see which compounds have not been found in Wikidata yet (and that I should add).
Help/suggestions welcome.
We have copied the 2112 EssoilDB compounds to CEVOpen. Ambarish Kumar (ambari73_sit@jnu.ac.in) is working on them. We found far too many synonyms in PubChem and ChEBI, so we've reduced those to about 300, which were found in EssoilDB 1.0.
Ambarish, do we have a simple list of compounds in CEVOpen that do not have Wikidata entries?
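For anyone wanting to reproduce such a list: below is a minimal Python sketch, assuming the compounds are identified by PubChem CID and that Wikidata stores the CID under property P662. It is not the procedure used in this thread; the function names are hypothetical, and the network call is kept in a separate function so the rest can be tried offline.

```python
import json
import urllib.parse
import urllib.request

WDQS = "https://query.wikidata.org/sparql"  # public Wikidata query service


def build_cid_query(cids):
    """Build a SPARQL query returning the items that carry any of the given PubChem CIDs (P662)."""
    values = " ".join(f'"{int(cid)}"' for cid in cids)
    return (
        "SELECT ?cid ?item WHERE {\n"
        f"  VALUES ?cid {{ {values} }}\n"
        "  ?item wdt:P662 ?cid .\n"
        "}"
    )


def missing_cids(cids, sparql_json):
    """Return the CIDs (as strings) that do not appear in a SPARQL JSON result."""
    found = {row["cid"]["value"] for row in sparql_json["results"]["bindings"]}
    return [str(cid) for cid in cids if str(cid) not in found]


def run_query(query):
    """Send the query to the Wikidata endpoint (assumption: a descriptive User-Agent is expected)."""
    url = WDQS + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(url, headers={"User-Agent": "cid-check-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Usage would be along the lines of `missing_cids(cids, run_query(build_cid_query(cids)))`, ideally in modest batches so the query stays small.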
Yes sir.
Please check the list of compounds that do not have a Wikidata ID.
notFoundWikidata.csv - https://github.com/petermr/CEVOpen/blob/master/notFoundWikidata.csv
Total number of records: 583.
-- AMBARISH KUMAR
M.Tech - 2014-2016 SC&IS ; Jawaharlal Nehru University, New Delhi, INDIA
+91 - 8377964303 ambari73_sit@jnu.ac.in er.ambarish@gmail.com
Thanks Ambarish, Egon does this look tractable?
(I suspect that some of these, especially the esters which may have missing spaces, are not what the authors intended, but that doesn't alter the validity of linking Wikidata to PubChem; it just means they may not get used frequently.)
Yes, thanks!
Processing the file...
This looks promising :)
====================
C₁₇H₂₈O₂ is not yet in Wikidata
====================
====================
C₁₅H₂₀O is not yet in Wikidata
====================
====================
C₁₇H₂₄O₂ is not yet in Wikidata
====================
====================
C₁₆H₂₂O₂ is not yet in Wikidata
====================
====================
C₁₀H₁₆O is not yet in Wikidata
====================
====================
C₉H₁₆ is not yet in Wikidata
====================
====================
C₁₅H₂₆O is not yet in Wikidata
====================
====================
C₁₅H₂₂O is not yet in Wikidata
====================
====================
C₁₅H₂₄O is not yet in Wikidata
====================
====================
C₁₅H₂₆O is not yet in Wikidata
====================
And so does the next step! :) The first 10 missing entries are in (573 to go ;)
For the next batch, I do find hits in Wikidata, tho. But that's not a problem.
Okay, this is the workflow. On the CSV file linked above, I first run this script: https://github.com/egonw/ons-wikidata/blob/master/EssOil/prepareInput.groovy It prepares the input for https://github.com/egonw/ons-wikidata/blob/master/Wikidata/createWDitemsFromSMILES.groovy which I run after that. The first (new) script fetches the SMILES for the compounds from PubChem.
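To illustrate the SMILES lookup step only: here is a rough Python sketch, not the Groovy script linked above. It assumes the input is a plain list of PubChem CIDs and uses the PubChem PUG REST property endpoint with CSV output; the function names are hypothetical, and the parser reads columns by position rather than relying on the header name.

```python
import csv
import io
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid"


def smiles_url(cids, prop="IsomericSMILES"):
    """Build a PUG REST URL requesting one property for a comma-separated list of CIDs."""
    cid_list = ",".join(str(int(cid)) for cid in cids)
    return f"{PUG_REST}/{cid_list}/property/{prop}/CSV"


def parse_smiles_csv(text):
    """Map CID -> SMILES from the CSV body: a header row, then one row per CID."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header; the second column holds the requested property
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def fetch_smiles(cids):
    """Download and parse the SMILES for the given CIDs (network access required)."""
    with urllib.request.urlopen(smiles_url(cids), timeout=30) as response:
        return parse_smiles_csv(response.read().decode("utf-8"))
```

For a few hundred CIDs it is probably kinder to the service to call `fetch_smiles` in small batches with a pause in between.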
I'm now doing the remaining batch: https://tools.wmflabs.org/quickstatements/#/batch/18772
Egon, this is great. We are writing a paper for Mat Todd and it would be great to put all this in.
Hi all, so what is next for this issue?
By creating a Bacting script that takes PubChem CIDs and adds the corresponding compounds to Wikidata.
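As a rough idea of what such a script could emit (in Python here, not Bacting): the sketch below turns a CID and its SMILES into QuickStatements V1 commands that create a new item. The property choices are assumptions on my part: P31 = Q11173 (chemical compound), P662 (PubChem CID), and P233 (canonical SMILES); a compound with defined stereochemistry would likely want P2017 (isomeric SMILES) instead. The function name is hypothetical, and a real script should first check that the compound is not already in Wikidata.

```python
def quickstatements_for(cid, smiles, label=None):
    """Return tab-separated QuickStatements V1 commands creating one compound item."""
    lines = ["CREATE"]
    if label:
        lines.append(f'LAST\tLen\t"{label}"')      # English label
    lines.append("LAST\tP31\tQ11173")               # instance of: chemical compound
    lines.append(f'LAST\tP662\t"{int(cid)}"')       # PubChem CID
    lines.append(f'LAST\tP233\t"{smiles}"')         # canonical SMILES (assumed property)
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustration only: ethanol (CID 702) already exists in Wikidata.
    print(quickstatements_for(702, "CCO", label="ethanol"))
```

The resulting text can be pasted into QuickStatements as a batch, which keeps a human review step before anything is written to Wikidata.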