Open jorainer opened 6 years ago
Ad data package: Yes I think you are definitely right that this is the way to go. I will do that when I have parsed HMDB (need to ask if I can be allowed to include it) and pubchem.
Ad the the S4 stuff. I am not against that at all. But I think I have to say that that is not priority for me as I have not yet dwelled into the whole S4-thing yet. So right now it will be some investment of time to learn that stuff and less easy for me to maintain. I would also not want to drop the tidyverse style code as I find this so much easier to work with. But I guess that can be done as you said. That said if you want to start on this I would be grateful and more than happy to change the structure. It is something I really want but I also need to priorities actually using the tool to do the projects I am supposed to be working on...
If it's OK for you I would fork your repo and add the code related to the S4 approach - it's easier to describe/discuss if you can really get your hands onto it.
Yes of course. I think it would be less messy if you hold on for now though. I am adding the last things I have. The peaklist annotation and the browser itself.
@jotsetung I added the last functions and added full example to the README. I don't plan to do much more right now. Perhaps get Pubchem and HMDB parsers done.
Cool! thanks! Could you please make a branch named e.g. S4? So, if I make a pull request at some time we don't mess with the master branch?
Sure! Done.
I need to catchup, nice work so quickly put together!
I am starting to think that it might make sense to lock it down to defining an S4 object that takes a CAMERA object and adds the annotation (or extends the CAMERA object to allow this). That way all the peaktable manipulation can be abstracted away. Downside is that you can then basically only use xcms/CAMERA (though I do have a function to fake a CAMERA object from an MzMine peaklist).
Supporting to perform the annotation on a CAMERA
object makes sense, but I would definitely keep low-level functions that don't required special S4
objects as input, so that it can be applied in general too.
@stanstrup, very nice work indeed! Always wanted to have such a packages and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too serious.
My suggestion(s):
Keep functionality separate from the data: Have dedicated data packages. This allows to have data packages from different sources or from different versions. See e.g.
ensembldb
and theEnsDb.Hsapiens.v75
package, orGenomicFeatures
and the separateTxDb
packages.Define a
CompoundDb
class with main methods to query and access the database. E.g. have a methodcompound
that retrieves compounds from the database and supports multiple filters. I know that's a little different setup (Bioconductor's rich, S4-based) than thetidyverse
one, still, I think one could combine both worlds.One could then define e.g. a
HMDB
class that simply extends theCompoundDb
to accommodate HMDB-specific fields and attributes.What might also be interesting is to implement the filters (or create filters that extend)
AnnotationFilter
(e.g.MzFilter
orMassFilter
). TheMzFilter
would have to calculate themass
for the providedmz
. Ideal would be to have aMassFilter
that takes aMzFilter
as input, calculated the theoretical mass for themz
and returns aMassFilter
.The big advantages of having this setup would be:
AnnotationHub
.I would be happy to contribute here (especially related to the database class, interface methods and filters as I did all this already in
ensembldb
).open for discussion