Open jorainer opened 11 months ago
I don't have strong opinions on it, honestly! I agree that `chromPeaksQuality` is too ambitious unless this is a spot where we want to allow others to calculate additional metrics from the raw data and the function is expected to grow significantly. I do think the "beta" nomenclature I've been using is more for internal use, and I don't believe the average user needs to know that the peak is being fit to a beta distribution. It does still fit to an "idealized" peak, so maybe something like `idealPeakComparison` or `simplePeakTest` could be descriptive. It can also be used to replace the existing `sn` and `egauss` metrics, so maybe `snWithinPlusPeakCor` could also be helpful, but it is a little dense. If I had to pick one on the spot I'd probably go with something like `peakShapeQualityCalc` because the metrics were designed to measure peak "shapeliness".
Agree - and I like your suggested name - maybe slightly reformulated into `chromPeakshapeQuality`? To clarify that this is calculated on `chromPeaks` (with defined `rtmin` and `rtmax`, calculating the peak shape quality of the signal of the chromPeak)?
I like it! Sounds good to me.
Alignment with the mzQC folks might be nice. https://github.com/HUPO-PSI/mzQC
What about a rather generic `peakQuality` function, with parameters that specify what is calculated, e.g. beta, egauss, ...
Yours, Steffen
That's obviously the better approach - maybe have a generic `chromPeakQuality` method and again our infamous `Param` parameter classes to define which quality metric to return. I haven't found (well, I just had a quick look) a metric in mzQC that would fit the one defined by @wkumler.
A generic function that returns the metric of choice would be great. I currently have William's function `qscoreCalculator` implemented in my script for targeted data analysis, but it is still super barebones and extracts targeted rt and int data in a loop, so I still need to vectorize and improve my code...
snippet (`data` is an `MsExperiment` object):
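    # Inside the loop: j selects the rt/m/z window, i the sample (column).
    chromatograms <- chromatogram(data, rt = rtRanges[j, ], mz = mzRanges[j, ])
    # Use the accessors instead of reaching into the @.Data slot directly.
    rt <- rtime(chromatograms[1, i])
    int <- intensity(chromatograms[1, i])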
To avoid adding too many functions (also thinking of the future), it may be good to add a `chromPeakSummary` method. This method should calculate a summary for each chrom peak. A `param` parameter would then allow defining which summary should be calculated. Examples could be:

- `chromPeakSummary(xmse, BasicStats())`: calculate basic summary statistics for each peak, with the number of data points and the min, max, median and mean intensity. Maybe even something like variation of m/z values.
- `chromPeakSummary(xmse, PeakShapeQuality())`: to calculate @wkumler's scores.

Similar to all other `chromPeak...` methods, we can have a parameter `peaks` that allows providing the IDs of chrom peaks if the metric should only be calculated for selected chrom peaks.
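
For illustration, the dispatch idea could be sketched in plain R with S4 classes, without xcms. This is only a rough sketch under assumptions: the names `chromPeakSummary`, `BasicStats`, `PeakShapeQuality` and `peaks` come from the proposal above, but the slots, the toy input (a named list of intensity vectors, one per chrom peak) and the `.summarize` helper are hypothetical. The shape score is a simple stand-in (correlation with a symmetric beta density), not @wkumler's actual metric.

```r
library(methods)

# Hypothetical parameter classes. The class names come from the proposal
# above; the slots and everything else are assumptions for illustration.
setClass("BasicStats", slots = c(na.rm = "logical"),
         prototype = list(na.rm = TRUE))
setClass("PeakShapeQuality", slots = c(shape = "numeric"),
         prototype = list(shape = 5))
BasicStats <- function(na.rm = TRUE) new("BasicStats", na.rm = na.rm)
PeakShapeQuality <- function(shape = 5) new("PeakShapeQuality", shape = shape)

setGeneric("chromPeakSummary", function(object, param, ...)
    standardGeneric("chromPeakSummary"))

# Hypothetical helper: restrict to the requested peak IDs, apply FUN to
# each peak and return a matrix with one row per peak.
.summarize <- function(object, peaks, FUN) {
    if (length(peaks)) object <- object[peaks]
    do.call(rbind, lapply(object, FUN))
}

setMethod("chromPeakSummary", signature("list", "BasicStats"),
    function(object, param, peaks = character(), ...) {
        .summarize(object, peaks, function(x) c(
            n = sum(!is.na(x)),
            min = min(x, na.rm = param@na.rm),
            max = max(x, na.rm = param@na.rm),
            median = median(x, na.rm = param@na.rm),
            mean = mean(x, na.rm = param@na.rm)))
    })

# Stand-in shape score: Pearson correlation with a symmetric beta density.
# This is NOT @wkumler's actual metric, only a placeholder.
setMethod("chromPeakSummary", signature("list", "PeakShapeQuality"),
    function(object, param, peaks = character(), ...) {
        .summarize(object, peaks, function(x) {
            ideal <- dbeta(seq(0, 1, length.out = length(x)),
                           param@shape, param@shape)
            c(shape_cor = cor(x, ideal))
        })
    })

# Toy intensities per chrom peak, named by made-up peak IDs.
pks <- list(CP1 = c(1, 4, 9, 4, 1), CP2 = c(5, 1, 6, 2, 7))
stats <- chromPeakSummary(pks, BasicStats())
shape <- chromPeakSummary(pks, PeakShapeQuality(), peaks = "CP1")
```

Adding a further summary would then only need a new parameter class and one more `setMethod()` call, which is the main appeal of this design over one function per metric.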
Hi @jorainer
if I understand correctly what you want to do, then there are several metrics defined by the PSI working groups. Have a look e.g. at QC:4000074, QC:4000075, QC:4000076 in QC-cv.obo, or MS:4000050, MS:4000051 in PSI-MS.obo.
Had a look through the obo. The only actual quality metric of an EIC (or XIC, as they are called in the obo) is the FWHM (full width at half maximum, MS:1000086). The related obo terms are MS:4000017 (chromatogram metric) or, more specifically, MS:4000018 (XIC quality metric).
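
As a rough sketch of what an FWHM calculation on a single EIC could look like in base R: the helper name `fwhm` is hypothetical (not part of xcms or the PSI vocabularies), and it assumes one peak per trace, retention times sorted in increasing order, and no missing intensities. The half-height crossings on both sides of the apex are found by linear interpolation.

```r
# Hypothetical helper: full width at half maximum of a single-peak trace.
# Assumes rt is increasing and int contains no NA values.
fwhm <- function(rt, int) {
    apex <- which.max(int)
    if (int[apex] <= 0) return(NA_real_)
    half <- int[apex] / 2
    # Walk outwards from the apex to the first point at or below half height.
    left <- apex
    while (left > 1 && int[left] > half) left <- left - 1
    right <- apex
    while (right < length(int) && int[right] > half) right <- right + 1
    # The trace is truncated if it never drops to half height on one side.
    if (int[left] > half || int[right] > half) return(NA_real_)
    # Linearly interpolate the retention time at exactly half height
    # between two neighbouring data points a and b.
    cross <- function(a, b)
        rt[a] + (half - int[a]) * (rt[b] - rt[a]) / (int[b] - int[a])
    cross(right - 1, right) - cross(left, left + 1)
}

w1 <- fwhm(1:5, c(0, 5, 10, 5, 0))
w2 <- fwhm(c(0, 1, 2), c(0, 10, 0))
w3 <- fwhm(1:3, c(10, 5, 1))  # apex on the first point: truncated peak
```

Returning `NA` for truncated traces is a design choice of this sketch; a real implementation might instead extend the retention time window.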
Yeah, we struggled to find a lot of standardized "peak quality" definitions in the literature when working on the original project as well. The Kantz 2019 paper uses six quality metrics and a bunch of combinations of them (peak duration, height, area, FWHM, tailing factor, and asymmetry factor). Your 2022 CPC paper @jorainer has some of these implemented already (looks like everything except asymmetry factor, though the noise estimation is likely different). We used the outputs from XCMS (mz, rt, peakwidth, area, sn, f, scale, lmin) but didn't test on the additional metrics of verboseColumns. I do think it's worth calculating an m/z deviation (and maybe an m/z deviation from mean m/z ~ intensity) metric even though that didn't show up especially strongly in my dataset, and I also think that a metric for the "number of missing scans" would be really nice to have, though again my custom implementation wasn't especially powerful in my dataset.
Since PR #685, new quality score metrics can be calculated during centWave peak detection. It would be good to also have a function that allows calculating these scores on already detected chrom peaks (i.e. after peak detection) or directly on EICs.
While straightforward to implement, naming is again an issue. @wkumler, do you have a suggestion for an appropriate name for your new peak quality metrics that we could use? Using `chromPeaksQuality` as the function name might be a little too generic.