sgibb / MALDIquant

Quantitative Analysis of Mass Spectrometry Data
https://strimmerlab.github.io/software/maldiquant/

Negative SNR #75

Closed zdens closed 2 years ago

zdens commented 2 years ago

Hi all, I recently ran into a problem while aligning mass spectra obtained from our Thermo Fisher Orbitrap with the MALDIquant R package, version 1.21.

The problem is that after peak detection with the 'SuperSmoother' method, several peaks in the resulting lists have negative SNRs.

I think the root of the problem lies in stats::supsmu, which performs cross-validation to select the best parameters and construct the smoothed line.

The RDS file (spectra_353.gz) with the problematic spectrum is attached.

The code

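# Load the attached RDS file, then count the negative values of the
# smoothed curve for each fixed span.
spectra_353 <- readRDS("spectra_353.gz")
spans <- seq(0.01, 0.5, 0.01)
res <- sapply(spans, function(span){
  y <- stats::supsmu(spectra_353@mass, spectra_353@intensity, span = span)$y
  length(which(y < 0))
})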

shows me that, for this particular case, the negative SNR occurs once span becomes greater than 0.3. For other spectra the suitable span will probably have a different value, and stats::supsmu does not allow changing the number of spans used for cross-validation.

How can this problem be solved?

sgibb commented 2 years ago

I am sorry for the late reply!

I can confirm that the supsmu function returns negative values. To be honest, MALDIquant was never designed for Orbitrap data (if it works for your case, that's great). Your data are more or less just peak shapes (not profile data, because there are no baseline artefacts, no equally spaced points between peaks, ...). I guess you want to create centroided data (just single peak values instead of peak shapes). While this does not answer your question, I am wondering whether filtering based on SNR is needed at all. I see a few different possible solutions:

  1. Use SNR = 0 and ignore the snr value in the resulting peaks object.
  2. Increase the halfWindowSize argument. For your example data, negative SNR values occur only for halfWindowSize <= 7; a larger value essentially removes the first and last little peaks. If you want noise estimation to filter out low-intensity peaks, you may consider increasing halfWindowSize anyway.
  3. Use a fixed span argument (and maybe run a cross-validation over 3 different span values yourself beforehand).
  4. Use "MAD" for noise estimation (the estimate is very low as well, because there are no real noisy profile data to estimate anything from, but it would be positive).
  5. Suggest or implement a different noise estimator that I could add or you could contribute.
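
Options 3 and 4 can be compared outside of MALDIquant with base R alone. The following is only a rough sketch: the spectrum is synthetic "peak shapes only" data (an assumption, not the reporter's file), the fixed span of 0.1 is an arbitrary illustration value, and the MAD estimate is assumed to reduce to a constant stats::mad(intensity) over the whole spectrum.

```r
# Synthetic sparse spectrum: 30 narrow peaks, 9 points each (illustration only).
set.seed(1)
centers <- sort(runif(30, min = 200, max = 1000))
heights <- rexp(30, rate = 1 / 1e5)
offsets <- seq(-0.02, 0.02, length.out = 9)
mass <- as.vector(outer(offsets, centers, "+"))
intensity <- as.vector(outer(exp(-(offsets / 0.01)^2), heights))

# Keep mass increasing and intensity aligned with it.
o <- order(mass)
mass <- mass[o]
intensity <- intensity[o]

# Noise estimates: cross-validated span (supsmu default), a fixed span,
# and a constant MAD-based estimate.
noise_cv <- stats::supsmu(mass, intensity)$y
noise_fixed <- stats::supsmu(mass, intensity, span = 0.1)$y
noise_mad <- rep(stats::mad(intensity), length(intensity))

# Number of negative noise values per estimator.
c(cv = sum(noise_cv < 0), fixed = sum(noise_fixed < 0), mad = sum(noise_mad < 0))
```

Whatever the smoother does at the edges, the MAD-based estimate cannot be negative, which is why it sidesteps the negative SNR entirely.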

I know this might not be the answer you were looking for because it is not a real or easy solution.

I am wondering if I should implement a warning if supsmu yields negative SNR values.
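
Such a warning could look roughly like the sketch below. The helper name check_noise is hypothetical and not part of MALDIquant; it only inspects a noise vector that has already been estimated, for example the $y component returned by stats::supsmu.

```r
# Hypothetical helper (not part of MALDIquant): warn when an estimated
# noise vector contains negative values, then return it unchanged.
check_noise <- function(noise) {
  n_neg <- sum(noise < 0)
  if (n_neg > 0) {
    warning(n_neg, " negative noise value(s) found; ",
            "the SNR will be negative at those positions.", call. = FALSE)
  }
  noise
}

# Usage: a warning is raised for the second element only.
res <- tryCatch(check_noise(c(1, -2, 3)), warning = function(w) "warned")
```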

zdens commented 2 years ago

@sgibb, thank you for your reply!

Further analysis shows that in our case the negative noise values occur only at the edges of the spectrum, so our current solution is to pad the m/z and intensity lists with zero-intensity points on the left and right sides and thus eliminate these boundary effects.

Interestingly, for this case https://github.com/sgibb/MALDIquant/files/8574094/spectra_353.gz the negative noise values vanish completely when padding with 100 points on each side of the spectrum, whereas with 50 points the negative values remain.

So the size of padding depends on the spectrum itself.
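
A minimal base R sketch of this padding idea follows. The helper pad_spectrum is hypothetical, the toy spectrum is made up, and spacing the padding points by the median m/z step is an assumption, since the spacing actually used is not stated above.

```r
# Hypothetical helper: add n zero-intensity points on each side of a spectrum.
pad_spectrum <- function(mass, intensity, n = 100) {
  step <- stats::median(diff(mass))  # assumed spacing of the padding points
  left <- mass[1] - rev(seq_len(n)) * step
  right <- mass[length(mass)] + seq_len(n) * step
  list(mass = c(left, mass, right),
       intensity = c(rep(0, n), intensity, rep(0, n)),
       # marks the original (unpadded) points
       keep = c(rep(FALSE, n), rep(TRUE, length(mass)), rep(FALSE, n)))
}

# Toy spectrum with a single peak (illustration only).
mass <- seq(100, 200, by = 0.5)
intensity <- 1000 * exp(-((mass - 150) / 2)^2)

# Smooth the padded spectrum, then keep only the original positions.
p <- pad_spectrum(mass, intensity, n = 100)
noise <- stats::supsmu(p$mass, p$intensity)$y[p$keep]
sum(noise < 0)  # number of negative noise values left after padding
```

Since the required padding depends on the spectrum, n would have to be increased until the count in the last line drops to zero.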