petermr / CEVOpen

Contentmining of Open phytochemical literature for medicinal activities

Bundling of the library structures with Chem4Word #61

Closed deadlyvices closed 4 years ago

deadlyvices commented 4 years ago

When we release Chem4Word we bundle a standard library with it, for illustrative purposes. I was wondering if there would be any objections to our using the 2100 structure library we have generated as part of this project. The structures are very nice indeed and show off our custom display component brilliantly. If it isn't OK then please let me know.

petermr commented 4 years ago

have mailed you and Gitanjali

petermr commented 4 years ago

Searching the compound library

Do you have any structure, substructure, or name search? The first could be done by InChI. Since we have a relatively small number of entries, some of this could be brute-forced or done with simple data structures (maps, tries, etc.).
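To make the "simple data structures" idea concrete, here is a minimal sketch, assuming each library entry carries an id, a name, and an InChI string. A plain map gives exact structure lookup by InChI, and a small trie gives name-prefix search. The `NameTrie` class, the record layout, and the three sample records are illustrative assumptions, not taken from the project library.

```python
class NameTrie:
    """Prefix tree over lower-cased compound names."""

    def __init__(self):
        self.children = {}
        self.entry_ids = set()  # ids of entries whose name ends at this node

    def add(self, name, entry_id):
        node = self
        for ch in name.lower():
            node = node.children.setdefault(ch, NameTrie())
        node.entry_ids.add(entry_id)

    def with_prefix(self, prefix):
        """Return ids of all entries whose name starts with `prefix`."""
        node = self
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return set()
        # Collect every id stored at or below the node reached by the prefix.
        found, stack = set(), [node]
        while stack:
            current = stack.pop()
            found |= current.entry_ids
            stack.extend(current.children.values())
        return found


# Illustrative records only; these are not taken from the project library.
LIBRARY = [
    {"id": 1, "name": "methane", "inchi": "InChI=1S/CH4/h1H4"},
    {"id": 2, "name": "methanol", "inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"},
    {"id": 3, "name": "ethanol", "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"},
]

# Exact structure lookup: InChI string -> entry id.
by_inchi = {rec["inchi"]: rec["id"] for rec in LIBRARY}

# Name-prefix lookup.
names = NameTrie()
for rec in LIBRARY:
    names.add(rec["name"], rec["id"])
```

With a couple of thousand entries, both structures fit comfortably in memory, and `names.with_prefix("meth")` would return the ids of every name beginning with "meth". Substructure search is a different problem and is not covered by this sketch.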

deadlyvices commented 4 years ago

Substructure: no; name search, yes. What application exactly do you have in mind?

deadlyvices commented 4 years ago

If it's part of the data crunching then we could probably do either quite easily in KNIME.

petermr commented 4 years ago

Just a casual query. It's a useful facility which could attract people to using it. Are there public bitmaps that could be used?

On Tue, Nov 26, 2019 at 4:41 PM Clyde Davies notifications@github.com wrote:

Substructure: no; name search, yes. What application exactly do you have in mind?


-- Peter Murray-Rust Founder ContentMine.org and Reader Emeritus in Molecular Informatics Dept. Of Chemistry, University of Cambridge, CB2 1EW, UK

deadlyvices commented 4 years ago

Public bitmaps? By that, you mean images? Not yet.

petermr commented 4 years ago

No, I meant chemical fingerprints.


MikeWilliams-UK commented 4 years ago

Just to make things clear: the library in Chem4Word consists of a CML definition along with textual labels. No bitmap or other image is stored in the library. Any images shown are generated when required. A point-in-time copy of each structure is stored inside the Word document as a DrawingML image (generated from the CML). This is so that anyone can print a Chem4Word document without installing Chem4Word.

MikeWilliams-UK commented 4 years ago

In Chem4Word we also store a copy of the CML inside the document so that the structure is editable and this can easily be mined.
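As a rough illustration of how such embedded CML might be mined, here is a minimal sketch. A .docx file is a zip package, so the sketch scans every XML part and keeps those whose root element is in the CML namespace. Where exactly Chem4Word places the CML inside the package is not stated in this thread, so the part name `customXml/item1.xml` used in the demo is an assumption, as is the function name `find_cml_parts`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CML_NAMESPACE = "http://www.xml-cml.org/schema"


def find_cml_parts(docx_file):
    """Return {part name: XML text} for every part whose root is a CML element."""
    found = {}
    with zipfile.ZipFile(docx_file) as package:
        for part_name in package.namelist():
            if not part_name.lower().endswith(".xml"):
                continue
            data = package.read(part_name)
            try:
                root = ET.fromstring(data)
            except ET.ParseError:
                continue  # not well-formed XML, skip it
            if root.tag.startswith("{" + CML_NAMESPACE + "}"):
                found[part_name] = data.decode("utf-8", errors="replace")
    return found


# Demo on an in-memory package with one hypothetical CML part.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<document/>")
    package.writestr(
        "customXml/item1.xml",
        '<cml xmlns="http://www.xml-cml.org/schema"><molecule id="m1"/></cml>',
    )
buffer.seek(0)
demo_parts = find_cml_parts(buffer)
```

Scanning by namespace rather than by a fixed path keeps the sketch independent of the exact package layout. Pass a real file path instead of `buffer` to try it on a document.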

MikeWilliams-UK commented 4 years ago

When a structure is edited by a Chem4Word end user, the structure is converted to an MDL MOLfile and submitted to a web service I have written (as an Azure function), which calculates the InChIKey and then attempts to look up extra properties, such as the IUPAC name and SMILES, on ChemSpider. Any properties found are merged into the CML before it is written back to the Word document.

petermr commented 4 years ago

"Bitmap" used to be the term for a fingerprint; see: https://www.daylight.com/dayhtml/doc/theory/theory.finger.html
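For readers unfamiliar with the term, here is a toy sketch of the fingerprint idea: structural features are hashed onto a fixed-length bit string, which can then be used to screen substructure candidates and to compare molecules. This is not the Daylight algorithm. The feature strings are assumed to be supplied by some fragment enumerator that is not shown, and the 64-bit length and CRC32 hash are arbitrary choices for illustration.

```python
import zlib

FINGERPRINT_BITS = 64  # arbitrary for this sketch; real fingerprints are longer


def fingerprint(features):
    """Fold a collection of feature strings into an integer used as a bit set."""
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode("utf-8")) % FINGERPRINT_BITS)
    return bits


def may_contain(target_bits, query_bits):
    """Screen: every bit set in the query must also be set in the target."""
    return (target_bits & query_bits) == query_bits


def tanimoto(a, b):
    """Tanimoto similarity: shared bits divided by bits set in either."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 1.0


# Hypothetical hand-written features standing in for enumerated fragments.
target = fingerprint({"c:c", "c-O", "O-H", "c-C"})
query = fingerprint({"c-O", "O-H"})
passes_screen = may_contain(target, query)
```

A failed screen rules a candidate out. A passed screen only means the candidate is worth a full atom-by-atom match, because different features can hash to the same bit.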


deadlyvices commented 4 years ago

Yes, I know what you mean now. None as yet. We'd like to build in substructure searching at some point, but so far the library sizes just haven't merited it.

Mike has been doing some work on adding more tags and names. These should help with searching.


-- Clyde

petermr commented 4 years ago

I am now at the stage of running the literature names against the current 2100 dictionary. There are a lot of synonyms, e.g. "thymol" is not in our dictionary, but "p-cymen-3-ol" is. I don't want to add all the Wikidata synonyms as there are many obscure ones, but I want to add common ones.

I am now thinking hard about our poster. We should probably open an issue.
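One way to handle the synonym matching described in this comment is a small curated table that maps common names onto the preferred dictionary name, consulted only when a direct lookup fails. The sketch below assumes names are compared after lower-casing and whitespace normalisation. Only the thymol pair comes from this thread; the second dictionary entry and the function names are hypothetical.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


# Preferred dictionary names. "limonene" is a hypothetical second entry.
DICTIONARY = {"p-cymen-3-ol", "limonene"}

# Curated common synonyms -> preferred dictionary name.
COMMON_SYNONYMS = {"thymol": "p-cymen-3-ol"}


def resolve(literature_name):
    """Return the dictionary entry for a literature name, or None if unknown."""
    key = normalise(literature_name)
    if key in DICTIONARY:
        return key
    return COMMON_SYNONYMS.get(key)
```

Names that resolve to `None` can be collected and reviewed by hand, which keeps obscure synonyms out of the table while showing which common ones are still missing.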


MikeWilliams-UK commented 4 years ago

The Essential Oils database is available at https://www.chem4word.co.uk/extra-compound-libraries/