rformassspectrometry / CompoundDb

Creating and using (chemical) compound databases
https://rformassspectrometry.github.io/CompoundDb/index.html

License issues #4

stanstrup opened this issue 7 years ago (status: open)

stanstrup commented 7 years ago

From @stanstrup on October 19, 2017 9:1

  1. Which databases can I include data from?
  2. If there are ones I cannot include, the data will need to be downloaded and the table generated by the user. Is there such a thing as an "in-package cache"?
  3. Which license can the package have if it includes database data?
  4. Is the license a concern at all? As far as I know, data cannot be copyrighted, so is there any concern at all?

The info I extract is: id, name, InChI, formula, and mass.

For the moment, I have force-removed the files until this is settled.

Copied from original issue: stanstrup/PeakABro#1

stanstrup commented 7 years ago

From @egonw on October 19, 2017 13:31

I don't think LipidMaps is Open Data. Wikidata is, PubChem is.

stanstrup commented 7 years ago

From @chasemc on October 19, 2017 22:45

"LMSD lipid structures are deposited into PubChem database (http://pubchem.ncbi.nlm.nih.gov/) periodically and a link to PubChem substance ID (SID) is also maintained within LMSD. Access to complete set of LMSD lipid structures in PubChem is available at www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=pcsubstance&term=LipidMAPS[sourcename])."

stanstrup commented 7 years ago

@chasemc thanks! That is very useful info. So I might be able to get around that one by including only PubChem and keeping the indicator for LipidMaps, so that you can eventually filter for the LipidMaps compounds.

stanstrup commented 7 years ago

@chasemc It seems the source is only recorded in the SID entries, not the CIDs. However, the LipidMaps IDs have been added as names, so it is possible to filter by their prefixes.
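A minimal sketch of the prefix filter described above, assuming each compound record carries a list of name strings and that LipidMaps IDs look like `LM` + a two-letter category code + alphanumeric class/subclass characters (e.g. `LMFA01010001`). The regex, the record layout, and the helper names `lipidmaps_ids` and `filter_lipidmaps` are hypothetical and only for illustration; they are not part of PubChem, LipidMaps, or this package.

```python
import re

# Assumption: LipidMaps (LMSD) IDs are "LM", a two-letter category code
# (e.g. "FA", "GP", "SP"), then 8 to 10 alphanumeric class/subclass characters.
LIPIDMAPS_ID = re.compile(r"^LM[A-Z]{2}[0-9A-Z]{8,10}$")


def lipidmaps_ids(names):
    """Return the names that look like LipidMaps identifiers."""
    return [n for n in names if LIPIDMAPS_ID.match(n)]


def filter_lipidmaps(compounds):
    """Keep compound records (hypothetical dicts with a 'names' list)
    that carry at least one LipidMaps-style identifier among their names."""
    return [c for c in compounds if lipidmaps_ids(c.get("names", []))]


# Illustrative, made-up records; not real PubChem data.
compounds = [
    {"cid": 1, "names": ["compound A", "LMFA01010001"]},
    {"cid": 2, "names": ["compound B"]},
]
print([c["cid"] for c in filter_lipidmaps(compounds)])
```

Matching the whole ID with a pattern instead of a bare `startswith("LM")` avoids picking up ordinary names that happen to begin with "LM".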

stanstrup commented 7 years ago

From @egonw on October 22, 2017 9:20

@chasemc also note that PubChem is not formally Open Data: it mixes their own public domain data with copyrighted upstream material. Legally, this is quite hard to untangle.

Generally, just contact LipidMaps and ask if it is OK to index their structures in the table as you want to do, and if you are allowed to make that available under terms compatible with the license of the R package.

For LipidMaps, a subset of about 1400 lipids is available under CCZero from Wikidata: http://tinyurl.com/ycbm9gfq

stanstrup commented 7 years ago

Thanks! I already contacted LipidMaps. Waiting for an answer.

stanstrup commented 7 years ago

@egonw what do you mean by upstream material? All the calculated properties? So if I only use basic info such as name and InChI, it should be OK?