rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Calculation of quality metrics based on HUPO-PSI/mzQC definitions #204

Open tnaake opened 3 years ago

tnaake commented 3 years ago

Dear @jorainer

following up on the conversation in the Slack channel, here is the corresponding issue for the Spectra package.

The idea was to be able to calculate HUPO-PSI-defined quality metrics (https://github.com/HUPO-PSI/mzQC/blob/master/cv/qc-cv.obo) on MS samples. For some of them, the Spectra package or its infrastructure could be an ideal place (or a SpectraQC/... package). The metrics could be applied to metabolomics and proteomics data. Not all metrics can be calculated from Spectra objects.

I was thinking of the following (excessive) list of metrics, focusing on MS1. Given are the ID, the value type, the name, and the definition if it differs from the name:

What do you think would be the best place to calculate these metrics (within Spectra or outside, in a stand-alone package)? Do you think other objects, e.g. QFeatures, could complement Spectra objects when the information stored in a Spectra object is not sufficient for the calculation?

Best, T.

jorainer commented 3 years ago

Now, that's a comprehensive list ;)

I would suggest having these in a separate package (maybe MsQC?), also because not all of the parameters can be calculated on a Spectra: the ones based on XIC would require a Chromatograms object (as returned by e.g. xcms) because they refer to the MS1 chromatographic peaks. Others, like the last one, need to be extracted directly from the mzML file (and I'm not sure that all manufacturers write/export this information). Also, having it in a separate package makes development easier - functionality could eventually be transferred if needed.

The other main question is: what user interface do you envision? One function for each QC parameter? Or one main function, with a parameter that defines which metric(s) to calculate?

One possibility could be:

setMethod("quality", "Spectra", function(object, metric = qualityMetrics("Spectra")) {
    ## calculate the requested metric(s) on `object`
})

What the method returns depends a little on how the metric is calculated: on a single spectrum or on the whole Spectra.

qualityMetrics could be a function that lists all possible metrics that can be calculated/estimated on a Spectra object.
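A minimal sketch of how such an interface could look, assuming a toy class in place of the real Spectra class so that it runs without any package installed. The class `ToySpectra` and the metric names `numberSpectra` and `ticPerSpectrum` are hypothetical placeholders, not mzQC identifiers and not part of Spectra:

```r
library(methods)

## Hypothetical stand-in for a Spectra object: one intensity vector per spectrum.
setClass("ToySpectra", representation(intensity = "list"))

## Hypothetical registry listing the metrics available for a given class.
qualityMetrics <- function(class = "ToySpectra") {
    switch(class,
           ToySpectra = c("numberSpectra", "ticPerSpectrum"),
           stop("no metrics defined for class ", class))
}

setGeneric("quality", function(object, ...) standardGeneric("quality"))

## One main method; `metric` selects which metric(s) to calculate.
setMethod("quality", "ToySpectra",
    function(object, metric = qualityMetrics("ToySpectra")) {
        metric <- match.arg(metric, qualityMetrics("ToySpectra"),
                            several.ok = TRUE)
        impl <- list(
            ## calculated on the whole object
            numberSpectra = function(x) length(x@intensity),
            ## calculated per spectrum
            ticPerSpectrum = function(x) vapply(x@intensity, sum, numeric(1))
        )
        res <- lapply(metric, function(m) impl[[m]](object))
        names(res) <- metric
        res
    })

sps <- new("ToySpectra", intensity = list(c(10, 20), c(5, 5, 5)))
quality(sps)                            # all metrics, as a named list
quality(sps, metric = "numberSpectra")  # a single metric
```

Returning a named list keeps whole-object metrics (length 1) and per-spectrum metrics (one value per spectrum) in the same result; an S4 result class would be an alternative.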

just an idea...

tnaake commented 3 years ago

Great, then let's go for a separate package. Should I create a repo under my account and start with the implementation there? I guess I can start next week to write some functions for calculating (some of) the metrics.

We could also start with the metrics based on Spectra and Chromatograms for now and go into mzML files later (there are also further metrics that could be calculated from raw/mzML files, which could be added later if there's a need. I will also ask the people in the core facilities here which metrics calculated from "raw"-like files they might be interested in).

I like the idea of having one main function in which the metrics to calculate are defined, with methods for Spectra/Chromatograms/... objects. This looks quite tidy and clean to me.

The output would be a list (or an S4 object - tbd) containing the metrics for a Spectra object or a Chromatograms object, etc.

jorainer commented 3 years ago

I would suggest you create a repo under your account - if you want you can eventually add me as external collaborator so that I can review your pull requests? It's sometimes not bad to get a second opinion on implementations...

lgatto commented 2 years ago

Just FYI - there is (or was, as it may have been deprecated) an msQC package in Bioc, so check for name clashes first.

tnaake commented 2 years ago

Hi @lgatto

thanks for your comment. I have now checked whether there is an msQC package in BioC. There are mdqc, miQC, and msqc1, but I couldn't find an msQC package.

jorainer commented 2 years ago

You should also always check whether a package name could have an ambiguous meaning or might be offensive - in your case I could only find MSQC = Missouri Star Quilt Company - so it should be fine ;)

jorainer commented 2 years ago

sorry, my comment was not really helpful - I just found it funny when I stumbled across that abbreviation

lgatto commented 2 years ago

Beware of MSQC on CRAN. Package names aren't case sensitive, so that one is taken.

And the one I was thinking about is proteoQC, that is now deprecated, so also taken.
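The case-insensitive name check discussed above can be sketched as follows. The helper `nameClashes` is hypothetical, and the vector of existing names only contains the packages mentioned in this thread; in practice it could be taken from the row names of `available.packages()` for the relevant repositories:

```r
## Hypothetical helper: return the existing package names that match a
## candidate name when case is ignored.
nameClashes <- function(candidate, existing) {
    existing[tolower(existing) == tolower(candidate)]
}

## Names mentioned in this thread; not a complete CRAN/Bioconductor listing.
taken <- c("mdqc", "miQC", "msqc1", "MSQC", "proteoQC")

nameClashes("MsQC", taken)       # "MSQC"
nameClashes("SpectraQC", taken)  # character(0)
```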