rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
37 stars 25 forks source link

feat: add filterFourierTransformArtefacts function (issue #223) #225

Closed jorainer closed 2 years ago

jorainer commented 2 years ago

This PR adds the code from @stanstrup to remove Orbitrap shoulders/fourier transform artefacts from spectra.

jorainer commented 2 years ago

I've implemented a modified version of the function that performs better on larger spectra:

.peaks_remove_fft_artifact2 <- function(x, halfWindowSize = 0.05,
                                        threshold = 0.2,
                                        keepIsotopes = TRUE,
                                        maxCharge = 5,
                                        isotopeTolerance = 0.005) {
    neutron   <- 1.0033548378 # really C12, C13 difference
    iso_dist  <- neutron / seq(from = 1, by = 1, to = maxCharge)
    find_isotopes <- keepIsotopes & any(iso_dist < halfWindowSize)
    mz <- x[, "mz"]
    int <- x[, "intensity"]
    idx <- order(int, decreasing = TRUE)
    keep <- rep(TRUE, length(idx))
    while (!is.na(i <- idx[1L])) {
        idx <- idx[-1L]
        rem_candidate <- which(
            between(mz[idx], mz[i] + c(-halfWindowSize, halfWindowSize)))
        rem_candidate <- rem_candidate[int[idx][rem_candidate] / int[i] <
                                       threshold]
        ## That part might be improved later.
        if (find_isotopes) {
            target_dist <- abs(mz[idx][rem_candidate] - mz[i])
            dist_matrix <- outer(target_dist, iso_dist, "-")
            dist_matrix <- abs(dist_matrix)
            dist_to_iso_hypo <- apply(dist_matrix, 1, min)
            rem_candidate <- rem_candidate[dist_to_iso_hypo > isotopeTolerance]
        }
        if (length(rem_candidate)) {
            keep[idx[rem_candidate]] <- FALSE
            idx <- idx[-rem_candidate]
        }
    }
    x[keep, , drop = FALSE]
}

This function considers in each loop only peaks that were not already removed or also tested - thus, it gets faster with each iteration:

data(fft_spectrum)
library(microbenchmark)
sp <- peaksData(fft_spectrum)[[1L]]
microbenchmark(Spectra:::.peaks_remove_fft_artifact(sp),
               .peaks_remove_fft_artifact2(sp))
Unit: milliseconds
                                     expr    min      lq     mean  median
 Spectra:::.peaks_remove_fft_artifact(sp) 4.7659 5.57925 6.451996 5.80830
          .peaks_remove_fft_artifact2(sp) 2.7591 3.19880 3.920461 3.38825
      uq     max neval cld
 6.00455 16.5911   100   b
 3.53515 18.4121   100  a 

sciex_file <- normalizePath(
    dir(system.file("sciex", package = "msdata"), full.names = TRUE))
sciex_mzr <- backendInitialize(MsBackendMzR(), files = sciex_file)
sp <- peaksData(sciex_mzr)[[1705L]]
microbenchmark(Spectra:::.peaks_remove_fft_artifact(sp),
               .peaks_remove_fft_artifact2(sp))
Unit: milliseconds
                                     expr      min        lq      mean
 Spectra:::.peaks_remove_fft_artifact(sp) 193.9958 205.78610 223.92499
          .peaks_remove_fft_artifact2(sp)  43.3130  46.91455  49.58035
    median       uq      max neval cld
 213.67975 220.5217 483.4687   100   b
  48.39205  51.9478  61.1351   100  a 
jorainer commented 2 years ago

@stanstrup, can you please have a look at this PR to see if it's OK from your side?

stanstrup commented 2 years ago

Looks good :) Wouldn't it make sense to set ylim so you clearly see the issue?

## Removing fourier transform artefacts seen in Orbitra data.

## Loading an Orbitrap spectrum with artefacts.
data(fft_spectrum)
length(mz(fft_spectrum)[[1]])
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5))

plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))

fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum)

fft_spectrum_filtered
length(mz(fft_spectrum_filtered)[[1]])
plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))

# Using a few examples peaks in your data you can optimize the parameters
fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum,
                                                halfWindowSize = 0.2,
                                                  threshold = 0.005,
                                                  keepIsotopes = TRUE,
                                                  maxCharge = 5,
                                                  isotopeTolerance = 0.005
                                                  )

fft_spectrum_filtered
length(mz(fft_spectrum_filtered)[[1]])
plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
stanstrup commented 2 years ago

Something seems wrong in the logging: Remove fast fourier artefacts within. [Tue Nov 23 10:47:49 2021]

stanstrup commented 2 years ago

Looks good.

stanstrup commented 2 years ago

Looks good to me. I would nevertheless find it useful to read a bit more about that artefact, where it comes from, how to detect if, or a reference to read more about it.

I have never seen it described in the literature even if it is relatively well-known. What I know is here: https://github.com/sneumann/xcms/issues/94#issuecomment-336829325

lgatto commented 2 years ago

I think it's great if this is indeed explicitly described (in written, with code) for the first time in the software, which begs for a description of the artefact. That short paragraph is a great start!

I don't know whether it should come under the details section of the Spectra man page, as part of the description of the function, or whether it warrants its own man page with details.

jorainer commented 2 years ago

That's an excellent idea. I think a dedicated, separate, man page for this function might be a good location for this information. So we have in the Spectra man page the description we have currently, but point from there to the more detailed man page.

Also, I think we can/should mention that this would be a first ad-hoc implementation and maybe more sophisticated versions will follow (if needed).

jorainer commented 2 years ago

I added a separate documentation file/man page with commit https://github.com/rformassspectrometry/Spectra/commit/93bae68d248c7073b8a865d6144136ea7350a832

lgatto commented 2 years ago

Great! I also just realise that this new man page could possibly also be merge with the fft_spectrum manual.

jorainer commented 2 years ago

Makes sense - I'll combine both.

jorainer commented 2 years ago

Only problem we have now is that unit tests fail because of mzR.

stanstrup commented 2 years ago

Just a note. I don't know where I got the ~0.01 Da from. In the example we use the problem is visible as wide as +/- 0.2 Da.

What would be very cool would be to write a small paper on this, but in dialog with someone at Thermo that understands the physics behind this. We would also need continuous mode data to really see what the phenomenon looks like. This is probably to some extend a centroiding issue. From what I heard newer instruments are less affected by this.

jorainer commented 2 years ago

@sgibb , I have now included all your suggestions. Please have a final look. Once you give me your OK I will merge.