openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

Missing properties and Differences in Chemical Properties for Compound Info #331

Closed nicklynch closed 8 years ago

nicklynch commented 8 years ago

There are differences in the calculated properties available for compounds that are only now in ChEMBL 20.

Two examples: OPS2824291 only has MW; OPS2778764 has SMILES and InChI as well.

Why do these compounds differ in the amount of data, and why are no other properties available?

Both have ChemSpider records with properties.

http://ops.rsc.org/OPS2824291

https://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2824291&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1

http://ops.rsc.org/OPS2778764

https://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2778764&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1

{
  format: "linked-data-api",
  version: "2.0",
  result: {
    _about: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2824291&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1",
    definition: "http://ops2.few.vu.nl/api-config",
    extendedMetadataVersion: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2824291&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1&_metadata=all%2Cviews%2Cformats%2Cexecution%2Cbindings%2Csite",
    linkPredicate: "http://www.w3.org/2004/02/skos/core#exactMatch",
    activeLens: "Default",
    primaryTopic: {
      _about: "http://ops.rsc.org/OPS2824291",
      exactMatch: {
        _about: "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL2403108",
        mw_freebase: 558.14,
        inDataset: "http://www.ebi.ac.uk/chembl",
        type: "http://rdf.ebi.ac.uk/terms/chembl#SmallMolecule"
      },
      isPrimaryTopicOf: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2824291&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1"
    }
  }
}

{
  format: "linked-data-api",
  version: "2.0",
  result: {
    _about: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2778764&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1",
    definition: "http://ops2.few.vu.nl/api-config",
    extendedMetadataVersion: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2778764&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1&_metadata=all%2Cviews%2Cformats%2Cexecution%2Cbindings%2Csite",
    linkPredicate: "http://www.w3.org/2004/02/skos/core#exactMatch",
    activeLens: "Default",
    primaryTopic: {
      _about: "http://ops.rsc.org/OPS2778764",
      inDataset: "http://ops.rsc.org",
      inchi: "InChI=1S/C49H54F2N8O6/c1-24(2)39(56-46(62)64-5)44(60)58-23-48(15-16-48)21-38(58)42-52-22-37(55-42)28-9-13-32-31-12-8-26(18-33(31)49(50,51)34(32)19-28)27-10-14-35-36(20-27)54-43(53-35)41-29-7-11-30(17-29)59(41)45(61)40(25(3)4)57-47(63)65-6/h8-10,12-14,18-20,22,24-25,29-30,38-41H,7,11,15-17,21,23H2,1-6H3,(H,52,55)(H,53,54)(H,56,62)(H,57,63)/t29-,30+,38-,39-,40-,41-/m0/s1",
      inchikey: "VRTWBAAJJOHBQU-KMWAZVGDSA-N",
      molformula: "C49 H54 F2 N8 O6",
      smiles: "CC(C)C@HC(=O)N1CC2(C[C@H]1C1NC=C(N=1)C1=CC3=C(C=C1)C1C=CC(=CC=1C3(F)F)C1=CC3N=C(NC=3C=C1)[C@@H]1[C@@H]3CC@@HN1C(=O)C@@HC(C)C)CC2",
      exactMatch: [
        "http://ops.rsc.org/OPS2778764",
        {
          _about: "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL2374220",
          mw_freebase: 889,
          inDataset: "http://www.ebi.ac.uk/chembl",
          type: "http://rdf.ebi.ac.uk/terms/chembl#SmallMolecule"
        }
      ],
      isPrimaryTopicOf: "http://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS2778764&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1"
    }
  }
}
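To make the difference between the two responses easier to see, here is a minimal Python sketch that lists which data fields each compound response carries. The function name `available_fields` and the set of structural keys it skips are my own assumptions, not part of the Open PHACTS API, and the sample dictionaries are abbreviated from the two responses in this comment (long string values are shortened to "...").

```python
def available_fields(response):
    """Collect data field names from primaryTopic and its exactMatch entries."""
    topic = response["result"]["primaryTopic"]
    # Structural keys that do not count as compound properties (an assumption).
    skip = {"_about", "exactMatch", "isPrimaryTopicOf", "inDataset", "type"}
    fields = {key for key in topic if key not in skip}
    matches = topic.get("exactMatch", [])
    if isinstance(matches, dict):
        # A single match comes back as an object rather than a list.
        matches = [matches]
    for match in matches:
        if isinstance(match, dict):  # plain URI strings carry no properties
            fields |= {key for key in match if key not in skip}
    return fields


# Abbreviated from the response for OPS2824291.
only_mw = {"result": {"primaryTopic": {
    "_about": "http://ops.rsc.org/OPS2824291",
    "exactMatch": {
        "_about": "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL2403108",
        "mw_freebase": 558.14,
        "inDataset": "http://www.ebi.ac.uk/chembl",
    },
}}}

# Abbreviated from the response for OPS2778764.
with_structure = {"result": {"primaryTopic": {
    "_about": "http://ops.rsc.org/OPS2778764",
    "inDataset": "http://ops.rsc.org",
    "inchi": "...", "inchikey": "...", "molformula": "...", "smiles": "...",
    "exactMatch": [
        "http://ops.rsc.org/OPS2778764",
        {
            "_about": "http://rdf.ebi.ac.uk/resource/chembl/molecule/CHEMBL2374220",
            "mw_freebase": 889,
        },
    ],
}}}

print(sorted(available_fields(only_mw)))
print(sorted(available_fields(with_structure) - available_fields(only_mw)))
```

The second print shows the fields that only the second compound has, which are the ones coming from the ops.rsc.org dataset rather than from ChEMBL.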

nicklynch commented 8 years ago

Via @stain: PROPERTIES_CHEMBL20151104.ttl has lots of statements about identifiers of the ops:OPSOPS2984398 kind, for example:

:OPS519347ct obo:IAO_0000136 ops:OPSOPS519347 .

batchelorc commented 8 years ago

Update: I can confirm that the latest version of the code at our end generates consistent identifiers (no "OPSOPS"), so do we assign this task to someone to s/ops:OPSOPS/ops:OPS/g at the other end?
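For whoever picks this up, a rough Python equivalent of that substitution is sketched below. The function name `fix_doubled_ops` is hypothetical, and unlike the plain `s/ops:OPSOPS/ops:OPS/g` it only rewrites the doubled prefix when digits follow, which is an extra safeguard I am assuming is wanted rather than something the data requires.

```python
import re

# Match "ops:OPSOPS" only when it starts an identifier followed by digits.
DOUBLED_OPS = re.compile(r"\bops:OPSOPS(?=\d)")


def fix_doubled_ops(text):
    """Rewrite ops:OPSOPS<digits> identifiers as ops:OPS<digits>."""
    return DOUBLED_OPS.sub("ops:OPS", text)


line = ":OPS519347ct obo:IAO_0000136 ops:OPSOPS519347 ."
print(fix_doubled_ops(line))
```

Identifiers that are already correct, such as `ops:OPS1036706`, pass through unchanged, so the function is safe to run over a file more than once.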

nicklynch commented 8 years ago

Cheers, we are making the change in the current data file and reloading tonight.

nicklynch commented 8 years ago

@antonisloizou Thanks for reloading the properties file.

After the reload, chemical properties are still missing for some compounds. Another example is http://ops.rsc.org/OPS1036706 (queries for it in 1.5 and 2.0 are below).

I can see properties for this compound in the properties file:

:OPS1036706ct rdf:type cheminf:CHEMINF_000055 .
:OPS1036706ct obo:IAO_0000136 ops:OPSOPS1036706 .
:OPS1036706execution rdf:type cheminf:CHEMINF_000354 .
:OPS1036706execution obo:OBI_0000293 :OPS1036706ct .
:OPS1036706execution obo:OBI_0000299 :OPS1036706prop0 .
:OPS1036706prop0 rdfs:label "Compound OPS1036706 property Density in qudt:KilogramPerCubicMeter"@en .
:OPS1036706prop0 obo:IAO_0000136 ops:OPS1036706 .
:OPS1036706prop0 rdf:type cheminf:CHEMINF_000359 .
:OPS1036706prop0 qudt:numericValue "0.001323"^^xsd:double .
:OPS1036706prop0 qudt:unit qudt:KilogramPerCubicMeter .
:OPS1036706prop0 qudt:standardUncertainty "0.0001"^^xsd:double .
:OPS1036706execution obo:OBI_0000299 :OPS1036706prop1 .
:OPS1036706prop1 rdfs:label "Compound OPS1036706 property Ref

https://ops2.few.vu.nl/2.0/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS1036706&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1

https://beta.openphacts.org/1.5/compound?uri=http%3A%2F%2Fops.rsc.org%2FOPS1036706&app_id=08bbd062&app_key=a70ea371bda724999d20109d11752eb1&_format=json

antonisloizou commented 8 years ago

Arg. Now the problem is that the ops prefix is declared with an additional '/' character:

@prefix ops: <http://ops.rsc.org//> .

I'll replace these too and reload...
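To illustrate why the extra slash hides the properties, here is a small Python sketch that expands a prefixed name against the declared prefix and then applies the replacement. The helper names `expand` and `fix_ops_prefix` are hypothetical, and the regular expression only handles simple one-line `@prefix` declarations; it is not a Turtle parser.

```python
import re

# Matches simple one-line declarations such as: @prefix ops: <http://...> .
PREFIX_LINE = re.compile(r"@prefix\s+(\w*):\s*<([^>]*)>\s*\.")


def expand(prefixed_name, turtle_header):
    """Expand a prefixed name (e.g. ops:OPS1036706) to a full URI."""
    prefixes = dict(PREFIX_LINE.findall(turtle_header))
    prefix, local = prefixed_name.split(":", 1)
    return prefixes[prefix] + local


def fix_ops_prefix(turtle_header):
    """Drop the additional '/' from the ops prefix declaration."""
    return turtle_header.replace("<http://ops.rsc.org//>", "<http://ops.rsc.org/>")


broken = "@prefix ops: <http://ops.rsc.org//> ."
queried = "http://ops.rsc.org/OPS1036706"

print(expand("ops:OPS1036706", broken) == queried)                  # False
print(expand("ops:OPS1036706", fix_ops_prefix(broken)) == queried)  # True
```

With the broken declaration the property statements are about `http://ops.rsc.org//OPS1036706`, which is a different URI from the one the compound query asks for, so nothing is returned for it.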

stain commented 8 years ago

This seems to be the case in all of the files.

On 30 November 2015 at 12:59, Antonis Loizou notifications@github.com wrote:

Arg. Now the problem is that the ops prefix is declared with an additional '/' character. @prefix ops: <http://ops.rsc.org//> .

I'll replace these too and reload...


Stian Soiland-Reyes, eScience Lab School of Computer Science The University of Manchester http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718

stain commented 8 years ago

For reference, the archive ops-rsc-dataset-20151104-20151130.142922-3.data.zip from http://repository.mygrid.org.uk/artifactory/ops/org/openphacts/data/ops-rsc-dataset/20151104-SNAPSHOT/ also includes both the ops:OPSOPS and // patches.

antonisloizou commented 8 years ago

After this latest reload, the missing properties seem to be there now.

nicklynch commented 8 years ago

Yes, the properties are there now.

E.g.

CC1=CC(NC2N=C(NC3C=CC=CC=3S(=O)(=O)C(C)C)C(Cl)=CN=2)=C(C=C1C1CCNCC1)OC(C)C
9.0
2.0
113.62
558.135
C28 H36 Cl N5 O3 S
5.033
VERWOWGGCGHDQE-UHFFFAOYSA-N
InChI=1S/C28H36ClN5O3S/c1-17(2)37-25-15-21(20-10-12-30-13-11-20)19(5)14-24(25)33-28-31-16-22(29)27(34-28)32-23-8-6-7-9-26(23)38(35,36)18(3)4/h6-9,14-18,20,30H,10-13H2,1-5H3,(H2,31,32,33,34)
3.0
8.0
Default