jorainer opened this issue 4 years ago
I would vote for number 3: xcmsFeatures.
It would be useful to have a look at the package.
The package does not exist yet; for now, I've implemented the functionality here: https://github.com/EuracBiomedicalResearch/CompMetaboTools/blob/master/R/group_feature_methods.R
We could also have a dev call, but if so, it would have to be either this week or at the end of July.
Thanks - so this is what you showed me some time ago during a call, isn't it?
- It doesn't look like it fits in Features at all (for now at least).
- I would prefer not to put it in xcms because I also see an application in proteomics.
- I don't like xcmsFeatures (see above and start with a lower case ;-), and I'm only lukewarm about MetaboFeatures (see above). What about MSFeatures? And maybe the current Features could be renamed QFeatures (for Quantitative/Quantitation), to keep away from the ambiguous term "features".
- Are there any plans to eventually convert the mass spec features into quantitative tables using Features?
- Thanks - so this is what you showed me some time ago during a call, isn't it?
Yes, exactly.
- It doesn't look like it fits in Features at all (for now at least).
That's how I see it at the moment too, but its results should then be exported/converted to a Features object.
- I wouldn't mind a call, but not this week.
It's probably better anyway to have a call once I've also finished a vignette.
- I would prefer not to put it in xcms because I also see an application in proteomics.
OK for me.
Regarding the name, I agree, Metabo might not be the correct thing here. Actually, the most appropriate name for the features we're dealing with would be LC-MS features, but LCMSFeatures looks terrible. I would be OK with MsFeatures.
- Are there any plans to eventually convert the mass spec features into quantitative tables using Features?
Absolutely. The plan is xcms -> ??Features (do feature grouping based on LC-MS properties) -> Features (do some further grouping based on differences in m/z, or MS2 spectra or ...). I see ??Features more as a package that depends on Features, reuses its classes and functionality, but adds the functionality we need for untargeted LC-MS(/MS) (which might also, at least partially, be interesting for proteomics).
It looks like we are on the same page. Let's schedule a call after our respective holidays.
I am not exactly following @lgatto's reason for not liking xcmsFeatures and how it relates to proteomics, but that is probably my ignorance of proteomics.
But to me it is also illogical. An "xcms feature", whether thought of as a peak in two dimensions (as we generally do in metabolomics, right?) or as a feature generated by the package xcms, would simply be a feature, wouldn't it?
If the package is about grouping features/"Features", shouldn't that be reflected in the name? So wouldn't FeatureGroups or something similar feel more natural?
I am curious about the motivation for re-implementing this from scratch. The functionality is very much like CAMERA, right? Looking through your documentation, it was not clear to me how the correlation network is cut in the end. It does the refining of the groups one correlation at a time, right? As opposed to CAMERA, which adds the correlations.
If I were emperor with supreme powers, I would suggest a package that wraps the several packages that now do something similar in a unified API. I guess I will wait for my infinite emperor grant to come through.
Re package name: in RforMassSpec, packages all follow the same naming conventions and start with a capital letter; that's also one reason against xcmsFeatures. Also, I would prefer MsFeatures over FeatureGroups because the former is a little more generic. The package does something with MS features; grouping them might just be one thing.
Re CAMERA, yes, this is somewhat re-implementing part of its functionality. Honestly, even by looking through the code of CAMERA I did not exactly get how it performs all the correlations and grouping and how you can control that. That was when I decided to re-implement it and split the functionality into different calls that can all be called separately or combined in any order (the core functions are also independent of any class, so they could be re-used by other packages or for other purposes). I base all correlation groupings on a complete pairwise correlation matrix between all members (features) of a group. I start with the pair with the highest correlation and put them into a group, then iteratively walk through all pairwise correlations (ordered by correlation coefficient), putting features into an existing group if their correlation is higher than the threshold. This approach creates feature groups in which all members have a correlation > threshold to each other. Here it would be nice to get your input and knowledge of CAMERA; maybe there's something better implemented that I have overlooked.
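A minimal sketch of the greedy grouping described above, written in Python with only the standard library purely for illustration (the packages discussed here are R). It fills in one rule the comment leaves open, as an assumption: when one feature of a pair is already grouped, the other joins only if it correlates above the threshold with every current member (this is what guarantees the "all members > threshold" property), and when both are already grouped nothing changes. The names `pearson_matrix` and `group_by_correlation` are hypothetical and not the MsFeatures API.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r between two intensity vectors (same length, not constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy)


def pearson_matrix(values):
    """Complete pairwise correlation matrix; `values` holds one vector per feature."""
    n = len(values)
    cor = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        cor[i][j] = cor[j][i] = pearson(values[i], values[j])
    return cor


def group_by_correlation(cor, threshold=0.9):
    """Return one integer group ID per feature (hypothetical helper)."""
    n = len(cor)
    group_of = [None] * n  # group index per feature, None = not yet grouped
    groups = []            # member lists, indexed by group ID
    # Walk through all pairs, highest correlation first.
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: cor[p[0]][p[1]], reverse=True)
    for i, j in pairs:
        if cor[i][j] <= threshold:
            break  # every remaining pair is at or below the threshold
        gi, gj = group_of[i], group_of[j]
        if gi is None and gj is None:
            group_of[i] = group_of[j] = len(groups)
            groups.append([i, j])
        elif gi is None or gj is None:
            # ASSUMPTION: join only if above threshold with *all* members.
            g, new = (gi, j) if gi is not None else (gj, i)
            if all(cor[new][m] > threshold for m in groups[g]):
                group_of[new] = g
                groups[g].append(new)
        # Both already grouped: leave the existing groups untouched.
    # Features without a sufficiently correlated partner become singletons.
    for k in range(n):
        if group_of[k] is None:
            group_of[k] = len(groups)
            groups.append([k])
    return group_of


chain = [[1.0, 0.95, 0.5],
         [0.95, 1.0, 0.92],
         [0.5, 0.92, 1.0]]
print(group_by_correlation(chain, threshold=0.9))  # [0, 0, 1]
```

In the `chain` example, features 0 and 2 each correlate with feature 1 but not with each other, so feature 2 stays on its own; a plain connected-components cut of the correlation network would have chained all three together.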
And yes, there are many packages around now, but I found most (all?) of them quite unusable, because they use their own class, which only exists in that one package, and many don't even seem to be actively maintained/updated. So a wrapper package would most of the time have to translate from one object to another, and maintaining it would be a nightmare. My ideal approach would be to invite all these package maintainers to contribute their core functionality and put all of it into one core package, which is what we did with the spectra similarity calculations in the Spectra package. Maybe something for the next metaRbolomics hackathon?
Quite some time ago, after developing ramclustR, I was interested in moving some of its approaches into CAMERA. This package, if it is reimplementing CAMERA for feature grouping, could be an opportunity to reinvest in that effort. @stanstrup @jorainer - I would like to work towards this if you think it suitable. There are a few differences (by my understanding) between the ramclustR approach and CAMERA.
I am sure this is oversimplifying at least a bit, but this package would enable us to focus on the common goals of the two packages and return a common data structure that integrates better with the rest of the R mass spec package family, which I would be excited to help with.
This sounds really great, Corey! I would love to integrate ramclustR with the MsFeatures package. The idea of MsFeatures is pretty simple: it takes an input object (XCMSnExp or SummarizedExperiment) and groups the features in it, defining a character vector that represents the grouping (the vector has one element per feature); grouped features have the same feature group ID.
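To make that representation concrete, a small Python sketch (hypothetical helper name, made-up feature and group IDs) of how such a per-feature group ID vector maps back to the members of each group:

```python
from collections import defaultdict


def split_by_group(feature_ids, group_ids):
    """Return {group ID: [feature IDs]} from a per-feature group ID vector.

    `group_ids` must be parallel to `feature_ids`, i.e. one group ID per feature.
    """
    if len(feature_ids) != len(group_ids):
        raise ValueError("expected exactly one group ID per feature")
    groups = defaultdict(list)
    for fid, gid in zip(feature_ids, group_ids):
        groups[gid].append(fid)
    return dict(groups)


# Hypothetical feature and group IDs, for illustration only.
features = ["FT1", "FT2", "FT3", "FT4"]
fgroups = ["FG.1", "FG.2", "FG.1", "FG.2"]
print(split_by_group(features, fgroups))
# {'FG.1': ['FT1', 'FT3'], 'FG.2': ['FT2', 'FT4']}
```

Because the grouping is just a flat vector aligned with the features, it does not need a dedicated class and can presumably be refined by a later grouping step that simply returns a new vector of the same length.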
Limitations:
Advantages:
If we join forces, I think we should be able to add that functionality pretty easily... I hope. Can you point me to some code/documentation so I can start digging into ramclustR to better understand it?
https://github.com/cbroeckl/RAMClustR/blob/master/R/rc.ramclustr.R
ramclustR adheres to your description: assignment is binary. The premise is to calculate the similarity matrix for all features, based on retention time similarity and quantitative similarity over the sample set (Pearson's r). I didn't use peak shape initially, mostly due to a lack of skill in extracting that many EICs, but also because I was also using ramclustR for grouping DIA fragment ions derived from MSe data, which would have required a bit more code to properly align them with the MS1 data. To manage memory, the full similarity matrix is calculated in chunks of (generally) 2000 features, and the data is stored in an ff object. I explored sparse matrix implementations in the past, but that may be worth revisiting.
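A rough Python sketch of a chunked similarity computation in the spirit of this description. The kernel form is an assumption to check against rc.ramclustr.R: a Gaussian kernel on the retention time difference, exp(-Δrt²/(2·st²)), multiplied by a Gaussian kernel on 1 - r, exp(-(1 - r)²/(2·sr²)); the `st`/`sr` values used here are arbitrary. Blocks are yielded one at a time instead of being written to an ff object, and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r between two intensity vectors (same length, not constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) * (a - mx) for a in x)
    syy = sum((b - my) * (b - my) for b in y)
    return sxy / math.sqrt(sxx * syy)


def pair_similarity(rt_i, rt_j, x_i, x_j, st, sr):
    # ASSUMPTION: product of Gaussian kernels on the RT difference and on (1 - r).
    rt_sim = math.exp(-((rt_i - rt_j) ** 2) / (2 * st ** 2))
    cor_sim = math.exp(-((1 - pearson(x_i, x_j)) ** 2) / (2 * sr ** 2))
    return rt_sim * cor_sim


def similarity_blocks(rts, values, chunk_size=2000, st=2.0, sr=0.5):
    """Yield (row_start, col_start, block) for the upper block triangle.

    Only one chunk_size x chunk_size block is held in memory at a time;
    a caller could write each block to disk before requesting the next.
    """
    n = len(rts)
    for a in range(0, n, chunk_size):
        for b in range(a, n, chunk_size):
            block = [[pair_similarity(rts[i], rts[j], values[i], values[j], st, sr)
                      for j in range(b, min(b + chunk_size, n))]
                     for i in range(a, min(a + chunk_size, n))]
            yield a, b, block


def full_similarity(rts, values, chunk_size=2000, st=2.0, sr=0.5):
    """Assemble the symmetric matrix in memory (only sensible for small examples)."""
    n = len(rts)
    sim = [[0.0] * n for _ in range(n)]
    for a, b, block in similarity_blocks(rts, values, chunk_size, st, sr):
        for di, row in enumerate(block):
            for dj, s in enumerate(row):
                sim[a + di][b + dj] = sim[b + dj][a + di] = s
    return sim
```

With a tiny `chunk_size` the assembled matrix should match the one computed in a single block, which is a cheap way to test the chunking logic.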
After you have a full similarity matrix, you cluster using HCA. Memory is again a bit of a concern here, and it would be good to look into the latest and greatest approaches for reducing the memory burden.
The dynamicTreeCut package is then used to cut the full HCA dendrogram into clusters, ideally representing one compound per cluster. The output of this is a numeric vector of length n.features. I generally then assign a character name string to each compound as well: cluster #1 becomes 'C0001', etc.
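The clustering and naming steps can be sketched in Python as below, with two plain substitutions: the dendrogram comes from a naive average-linkage agglomeration on 1 - similarity, and it is cut at a fixed height instead of with dynamicTreeCut's adaptive cut. Numbering clusters by their smallest member index is an arbitrary choice; only the 'C0001' naming follows the comment above. Function names are hypothetical.

```python
def average_linkage_cut(sim, cut_height):
    """Return one cluster number (1, 2, ...) per feature from a similarity matrix.

    Merges the closest clusters (average linkage on 1 - similarity) until the
    closest pair is farther apart than `cut_height`. Average-linkage merge
    heights never decrease, so this equals cutting the dendrogram at a fixed
    height (a stand-in for dynamicTreeCut, not a reimplementation of it).
    """
    n = len(sim)
    clusters = [[i] for i in range(n)]

    def dist(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        return sum(1 - sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, p, q = min((dist(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        if d > cut_height:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p stays valid

    labels = [0] * n
    # Number clusters by their smallest member index (arbitrary choice).
    for k, members in enumerate(sorted(clusters, key=min), start=1):
        for i in members:
            labels[i] = k
    return labels


def compound_names(labels):
    """Turn cluster numbers into compound names: 1 -> 'C0001'."""
    return ["C%04d" % k for k in labels]


sim = [[1.0, 0.95, 0.1, 0.05],
       [0.95, 1.0, 0.1, 0.05],
       [0.1, 0.1, 1.0, 0.9],
       [0.05, 0.05, 0.9, 1.0]]
print(compound_names(average_linkage_cut(sim, cut_height=0.5)))
# ['C0001', 'C0001', 'C0002', 'C0002']
```

The naive loop recomputes all cluster distances at every merge, so it only illustrates the idea; the memory concerns mentioned above are exactly why real implementations avoid this.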
Happy to help, of course; just let me know when you plan to start working in this area. I don't know if my code is intelligible or not ;-).
Thanks for the update! And sorry for my ignorance, but it sounds like ramclustR groups based only on the MS1 properties (retention time and quantified feature value) and does not also consider MS2 spectra? Is that correct?
Correct. The algorithm does not use MS/MS data at all to assign clusters. The only properties used are retention time and quantified feature values.
That said, it was originally developed with MSe in mind, and we can perform centWave signal detection on the MSe data, where MSe is full-mass-range MS/MS, i.e. the least selective form of DIA imaginable. Since MSe samples all precursors as frequently as we have MS1-level scans, we can use centWave on MSe (MS2) chromatograms as well. If that has been done, RAMClustR can also cluster MS2 fragment ions to generate reconstructed MS2 spectra from MSe data.
In the MSe case, XCMS is run on the MS1 and MS2 data of each injection, the full xcms set is aligned, and ramclustR splits the MS1 from the MS2 data. The MS1 data are used for quantitative signal intensity, the MS2 data for annotation. MSe data are not required for clustering, but if you have MSe data processed together with the MS1 data, we can use both to improve annotation.
It is an open question as to whether we should build that part (MSe, AIF) into this package.
We've implemented some functionality to group LC-MS features (i.e. defined by their m/z and retention time range) into feature groups, where ideally each feature group collects all features that come from the same original compound (i.e. are adducts or isotopes ... of it). The question now is what should be the home of this functionality. Currently 3 main grouping functions are available:
The LC-MS feature grouping functions above re-use functionality from xcms (especially for the correlation based on the peak shape of the EIC). The general workflow would be the following:
xcms (peak picking, feature definitions) -> ??? (initial feature grouping based on properties of the LC-MS) -> Features (optional further feature grouping, independent of LC-MS).
The question now is where the functionality should be put. Possible options are:
1) put it into xcms
2) put it into Features
3) put it into a new package MetaboFeatures or xcmsFeatures.
Would be nice to get some feedback @lgatto @sneumann @stanstrup @michaelwitting @sgibb