openphacts / GLOBAL

Global project issues [private for now. owner Lee Harland]

Add new mappings to CAS #353

Open danidi opened 8 years ago

danidi commented 8 years ago

Currently, CAS numbers are only available from the IMS for compounds that are in the HMDB dataset. Maybe we should add additional linksets (e.g. PubChem?) to provide CAS numbers for all our molecules.

egonw commented 7 years ago

PubChem does not annotate CAS numbers, and there are multiple databases that use the same format.

But keep in mind:

  1. Formally, CAS strongly discourages CAS number data sets with more than 10k numbers.
  2. We won't get CAS numbers for the few million compounds anyway.

AlasdairGray commented 7 years ago

"Formally, CAS strongly discourages CAS number data sets with more than 10k numbers."

How does that fit with the advice received from John Wilbanks about the reuse of identifiers?

egonw commented 7 years ago

To me, that implies we must not use CAS numbers. We are not using them at the moment, but this issue suggests we should support them. I would not give that priority unless someone pays us to work things out with Chemical Abstracts (as the Wikimedia Foundation did for Wikipedia; note that, IIRC, they were not allowed to disclose all the details of that deal).

AlasdairGray commented 7 years ago

John's advice, if I recall correctly, was that identifiers are essentially in the public domain and cannot be held under a license (we should dig out the document to check the exact wording). If CAS numbers fulfil this role, does the same apply to them?

egonw commented 7 years ago

Well, Chemical Abstracts claims otherwise. I'd like to think John is right on that, but the question is: are we willing to knowingly ignore conditions set by Chemical Abstracts because we feel they won't hold up in court? Not a high priority for me.

danidi commented 7 years ago

I would not give this issue priority, given the CAS restrictions. Is there an official statement from CAS that we could refer users to if someone asks to query by CAS number?

egonw commented 7 years ago

Some pointers (but CAS seems to have changed their wording recently!)

The current wording seems to disallow any database (https://www.cas.org/legal/infopolicy):

"CAS does not permit the building of Databases that have wide and general availability and no longer fulfill the purpose of individual or team research that CAS permits but instead serve as a substitute for the use of CAS Databases."

It seems to come down to whether they see Open PHACTS (or Wikipedia, or ...) as a substitute.