Open tnaake opened 3 years ago
Now, that's a comprehensive list ;)
I would suggest having these in a separate package (maybe MsQC?), also because not all of the parameters can be calculated on a Spectra: the ones based on XIC would require a Chromatograms (which would be returned by e.g. xcms), as they refer to the MS1 chromatographic peaks. Others, like the last one, need to be extracted directly from the mzML file (and I'm not sure that all manufacturers write/export this information). Having it in a separate package also makes development easier; functionality could eventually be transferred if needed.
The other main question is: what user interface do you envision? One function for each QC parameter? Or one main function with a parameter that defines which metric(s) to calculate?
One possibility could be:
setMethod("quality", "Spectra", function(object, metric = qualityMetrics("Spectra")) { ... })
What the method returns depends a little on how the metric is calculated: whether it's done on a single spectrum or on the whole Spectra.
qualityMetrics could be a function that lists all possible metrics that can be calculated/estimated on a Spectra object.
just an idea...
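A minimal sketch of how such an interface could look, using only the methods package. Everything here is an assumption for illustration: ToySpectra is a stand-in for a real Spectra object, and the metric names (numberSpectra, ticPerSpectrum) and the metricRegistry list are hypothetical, not taken from the mzQC vocabulary or any existing package.

```r
library(methods)

# Toy stand-in for a Spectra object (hypothetical): one numeric
# intensity vector per spectrum.
setClass("ToySpectra", representation(intensity = "list"))

# Hypothetical registry mapping a class name to its metric functions.
metricRegistry <- list(
    ToySpectra = list(
        # calculated on the whole object: a single value
        numberSpectra = function(x) length(x@intensity),
        # calculated per spectrum: one value for each spectrum
        ticPerSpectrum = function(x) vapply(x@intensity, sum, numeric(1))
    )
)

# Lists the names of all metrics available for a given class.
qualityMetrics <- function(class) names(metricRegistry[[class]])

setGeneric("quality", function(object, ...) standardGeneric("quality"))

# By default all available metrics are calculated; `metric` restricts them.
setMethod("quality", "ToySpectra",
    function(object, metric = qualityMetrics("ToySpectra")) {
        metric <- match.arg(metric, qualityMetrics("ToySpectra"),
                            several.ok = TRUE)
        lapply(metricRegistry[["ToySpectra"]][metric],
               function(f) f(object))
    })

sps <- new("ToySpectra", intensity = list(c(10, 20, 30), c(5, 5)))
quality(sps)                            # all metrics, as a named list
quality(sps, metric = "numberSpectra")  # only the requested metric
```

Note that the length of each result differs depending on whether the metric is defined per spectrum or for the whole object, which is the point raised above about the return value.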
Great, then let's go for a separate package. Should I create a repo under my account and start with the implementation there? I guess I can start writing functions for calculating (some of) the metrics next week.
We could also start with the metrics based on Spectra and Chromatograms for now and go into mzML files later. There are further metrics that could be calculated from raw/mzML files, which could be added later if there's a need. I will also ask the people in the core facilities here which metrics calculated from "raw"-like files they might be interested in.
I like the idea of having one main function, defining the metrics to calculate therein, and having methods for Spectra/Chromatograms/... objects. This looks quite tidy and clean to me.
The output would be a list (or an S4 object - tbd) containing the metrics for a Spectra object or a Chromatograms object, etc.
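As a rough sketch of this design (one generic, one method per object type, a named list as output), assuming toy stand-in classes rather than the real Spectra and Chromatograms classes. The class names, the helper runMetrics and the metrics numberSpectra and rtRange are hypothetical and only serve as illustration.

```r
library(methods)

# Toy stand-ins (hypothetical) for Spectra and Chromatograms objects.
setClass("ToySpectra", representation(intensity = "list"))
setClass("ToyChromatograms", representation(rtime = "list"))

setGeneric("quality", function(object, ...) standardGeneric("quality"))

# Shared helper: evaluate the selected metric functions on the object
# and return the results as a named list.
runMetrics <- function(object, available, metric) {
    metric <- match.arg(metric, names(available), several.ok = TRUE)
    lapply(available[metric], function(f) f(object))
}

setMethod("quality", "ToySpectra",
    function(object, metric = "numberSpectra") {
        available <- list(
            numberSpectra = function(x) length(x@intensity))
        runMetrics(object, available, metric)
    })

setMethod("quality", "ToyChromatograms",
    function(object, metric = "rtRange") {
        available <- list(
            rtRange = function(x) range(unlist(x@rtime)))
        runMetrics(object, available, metric)
    })

# The same call works on both object types and always returns a list.
res1 <- quality(new("ToySpectra", intensity = list(c(1, 2), 3)))
res2 <- quality(new("ToyChromatograms",
                    rtime = list(c(1.5, 10), c(0.5, 7))))
```

A plain list keeps the first implementation simple; switching to an S4 result class later would only change what runMetrics returns.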
I would suggest you create a repo under your account. If you want, you can eventually add me as an external collaborator so that I can review your pull requests. It's sometimes not bad to get a second opinion on implementations...
Just FYI - there is (or was, as it may have been deprecated) an msQC package in Bioc, so check for name clashes first.
Hi @lgatto,
thanks for your comment. I have now checked whether there is an msQC package in BioC. It seems that there are mdqc, miQC, and msqc1, but I couldn't find an msQC package.
You should also always check whether a package name could have an ambiguous meaning or might be offensive - in your case I could only find MSQC = Missouri Star Quilt Company - so it should be fine ;)
sorry, my comment was not really helpful - I just found it funny when I stumbled across that abbreviation
Dear @jorainer,
following up on the conversation in the Slack channel, here comes the issue in the Spectra package.
The idea was to be able to calculate HUPO-PSI-defined quality metrics (https://github.com/HUPO-PSI/mzQC/blob/master/cv/qc-cv.obo) on MS samples, and possibly, for some of them, the Spectra package or infrastructure would be an ideal place (or a SpectraQC/... package). The metrics could be applied to metabolomics and proteomics data. Not all metrics can be calculated based on Spectra objects.
I was thinking of the following excessive list of metrics (focusing on MS1; given are the ID, the value type, the name, and the definition if it differs from the name):
What do you think would be the best place to calculate these metrics (within Spectra or outside/in a stand-alone package)? Do you think there could be other objects that could complement Spectra objects for the calculation when the information stored in a Spectra object is not suitable for the calculation, e.g. QFeatures?
Best, T.