stanstrup / PeakABro

Peaklist Annotator and Browser

Add generate_hmdb_tbl #10

Open jorainer opened 6 years ago

jorainer commented 6 years ago
stanstrup commented 6 years ago

Is there an advantage to parsing the XML instead of the SDF? ChemmineR::datablock2ma is very convenient for getting all the info.

jorainer commented 6 years ago

The reason was that I had a script to retrieve all compounds individually from HMDB, because the release files were not really up to date. If you query a compound online (e.g. http://www.hmdb.ca/metabolites/HMDB0000001.xml) you get the XML, so that's basically why.
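For readers who want to see what parsing a single per-compound XML record might look like, here is a minimal Python sketch. It is not the package's R implementation. The element names (`accession`, `name`, `chemical_formula`), the trimmed sample record and the helper names are assumptions made for illustration. No network request is made; the record is an inline string.

```python
import xml.etree.ElementTree as ET

# Illustrative, heavily trimmed record (assumed layout). Real HMDB records
# contain many more fields and may declare an XML namespace.
SAMPLE_XML = """<metabolite>
  <accession>HMDB0000001</accession>
  <name>1-Methylhistidine</name>
  <chemical_formula>C7H11N3O2</chemical_formula>
</metabolite>"""

# Fields to extract; a hypothetical selection for this sketch.
FIELDS = ("accession", "name", "chemical_formula")


def local_name(tag):
    """Drop a leading '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def parse_metabolite(xml_text):
    """Return a dict with the wanted top-level fields of one metabolite record.

    Fields that are absent from the record are returned as None.
    """
    root = ET.fromstring(xml_text)
    record = dict.fromkeys(FIELDS)
    for child in root:
        key = local_name(child.tag)
        if key in record and record[key] is None:
            record[key] = (child.text or "").strip()
    return record
```

Calling `parse_metabolite(SAMPLE_XML)` gives one dict per compound; a list of such dicts is the row-wise equivalent of the tbl discussed in this thread.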

jorainer commented 6 years ago

After implementing the SDF parse function too, one advantage of the XML parsing is speed. But in the end it's good to have both in place. Thanks for the suggestion!

jorainer commented 6 years ago

OK, HMDB SDF support is in. generate_hmdb_tbl can now be used with file being the name of an HMDB file in either XML or SDF format.
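To illustrate the SDF side, the following Python sketch reads the data items of an SDF string and picks a format from the file extension. How generate_hmdb_tbl actually decides between XML and SDF is not stated in this thread, so the extension-based dispatch is an assumption. The same goes for the field names in the sample (`DATABASE_ID`, `GENERIC_NAME`) and the function names. The parser relies only on the general SDF layout: data headers of the form `> <NAME>`, a blank line ending each item, and `$$$$` ending each record.

```python
from pathlib import Path

# Illustrative record; the mol block is reduced to a stub and the data
# field names are assumptions about the HMDB export.
SAMPLE_SDF = """HMDB0000001
  (mol block omitted in this sketch)

M  END
> <DATABASE_ID>
HMDB0000001

> <GENERIC_NAME>
1-Methylhistidine

$$$$
"""


def detect_format(file_name):
    """Guess the input format from the file extension (assumed convention)."""
    suffix = Path(file_name).suffix.lower()
    if suffix in (".xml", ".sdf"):
        return suffix[1:]
    raise ValueError(f"unsupported file type: {file_name!r}")


def parse_sdf_data_items(sdf_text):
    """Return one dict of data items per record found in an SDF string."""
    records = []
    for chunk in sdf_text.split("$$$$"):
        items = {}
        key = None
        for line in chunk.splitlines():
            if line.startswith(">") and "<" in line and ">" in line[1:]:
                # Data header such as "> <DATABASE_ID>"
                key = line[line.index("<") + 1:line.rindex(">")]
                items[key] = []
            elif key is not None:
                if line.strip() == "":
                    key = None  # a blank line ends the current data item
                else:
                    items[key].append(line.strip())
        if items:
            records.append({k: "\n".join(v) for k, v in items.items()})
    return records
```

The mol block itself is skipped here because only the annotation fields are needed for a compound table.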

jorainer commented 6 years ago

I also added some first code to write the tbl into a SQLite database (including metadata). Next things will be:
1) Implement the CompoundDb object, which can be used to interface with the database.
2) Implement the code to create an R package containing the annotation.
3) Implement all required methods to use the CompoundDb. The simplest one will be to extract all data in the form of a tbl so it can be used directly with your code.
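The idea of storing the compound table together with its metadata, and reading everything back as a table, could be sketched as follows. This is Python with the standard `sqlite3` module, not the package's R code. The table names, column names, metadata keys and function names are all hypothetical and chosen only for illustration.

```python
import sqlite3


def write_compound_db(con, compounds, metadata):
    """Write compound rows plus key/value metadata into two tables.

    The schema (tables 'compound' and 'metadata') is an assumption of
    this sketch, not the layout used by the package.
    """
    con.execute(
        "CREATE TABLE compound ("
        "compound_id TEXT PRIMARY KEY, name TEXT, formula TEXT)"
    )
    con.execute("CREATE TABLE metadata (name TEXT PRIMARY KEY, value TEXT)")
    con.executemany(
        "INSERT INTO compound VALUES (:compound_id, :name, :formula)",
        compounds,
    )
    con.executemany(
        "INSERT INTO metadata VALUES (?, ?)", list(metadata.items())
    )
    con.commit()


def read_compounds(con):
    """Return all compound rows as a list of dicts (the 'extract as tbl' step)."""
    cur = con.execute(
        "SELECT compound_id, name, formula FROM compound ORDER BY compound_id"
    )
    columns = [col[0] for col in cur.description]
    return [dict(zip(columns, row)) for row in cur]


def read_metadata(con):
    """Return the metadata table as a plain dict."""
    return dict(con.execute("SELECT name, value FROM metadata"))
```

Keeping the metadata in its own key/value table means provenance information such as the source name travels with the database file, which matters once each resource is shipped as a separate annotation package.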

jorainer commented 6 years ago

Right, no need to make the pull request right now - better to wait, but good that you start looking at the code, otherwise it will be too much to look at ;)

jorainer commented 6 years ago

OK, now I have all of the core stuff in place:

Have a look at it @stanstrup and let me know if it's OK or if you'd like changes.

stanstrup commented 6 years ago

Thanks! This is awesome. And gulp! There is a lot to look at. It might take me some days.

Just a few Qs for now:
1) Is it better to have one package for each DB rather than one with a collection?
2) Are you sure the HMDB license allows you to put up the db? On the website it is indicated that you need permission. I have contacted them to hear what we can do.

jorainer commented 6 years ago

Yes, sorry that I added so much ;) - I just wanted to make sure it is at a stage where it might be useful. And for the largest part it's documentation, comments and unit tests.

Re Qs: 1) I think yes. Reasons to keep the resources separate are:

2) No, I'm pretty sure that's not the correct license, but as long as the data is not used for commercial purposes it should be OK. They state:

"Use and re-distribution of the data, in whole or in part, for commercial purposes requires explicit permission of the authors and explicit acknowledgment of the source material (HMDB) and the original publication (see below)"

Once they reply I have to fix the license. In the end we will probably place a license file specific to each annotation resource into the annotation package.