Closed jorainer closed 2 years ago
I've implemented a modified version of the function that performs better on larger spectra:
.peaks_remove_fft_artifact2 <- function(x, halfWindowSize = 0.05,
threshold = 0.2,
keepIsotopes = TRUE,
maxCharge = 5,
isotopeTolerance = 0.005) {
neutron <- 1.0033548378 # really C12, C13 difference
iso_dist <- neutron / seq(from = 1, by = 1, to = maxCharge)
find_isotopes <- keepIsotopes & any(iso_dist < halfWindowSize)
mz <- x[, "mz"]
int <- x[, "intensity"]
idx <- order(int, decreasing = TRUE)
keep <- rep(TRUE, length(idx))
while (!is.na(i <- idx[1L])) {
idx <- idx[-1L]
rem_candidate <- which(
between(mz[idx], mz[i] + c(-halfWindowSize, halfWindowSize)))
rem_candidate <- rem_candidate[int[idx][rem_candidate] / int[i] <
threshold]
## That part might be improved later.
if (find_isotopes) {
target_dist <- abs(mz[idx][rem_candidate] - mz[i])
dist_matrix <- outer(target_dist, iso_dist, "-")
dist_matrix <- abs(dist_matrix)
dist_to_iso_hypo <- apply(dist_matrix, 1, min)
rem_candidate <- rem_candidate[dist_to_iso_hypo > isotopeTolerance]
}
if (length(rem_candidate)) {
keep[idx[rem_candidate]] <- FALSE
idx <- idx[-rem_candidate]
}
}
x[keep, , drop = FALSE]
}
This function considers in each loop only peaks that were not already removed or also tested - thus, it gets faster with each iteration:
data(fft_spectrum)
library(microbenchmark)
sp <- peaksData(fft_spectrum)[[1L]]
microbenchmark(Spectra:::.peaks_remove_fft_artifact(sp),
.peaks_remove_fft_artifact2(sp))
Unit: milliseconds
expr min lq mean median
Spectra:::.peaks_remove_fft_artifact(sp) 4.7659 5.57925 6.451996 5.80830
.peaks_remove_fft_artifact2(sp) 2.7591 3.19880 3.920461 3.38825
uq max neval cld
6.00455 16.5911 100 b
3.53515 18.4121 100 a
sciex_file <- normalizePath(
dir(system.file("sciex", package = "msdata"), full.names = TRUE))
sciex_mzr <- backendInitialize(MsBackendMzR(), files = sciex_file)
sp <- peaksData(sciex_mzr)[[1705L]]
microbenchmark(Spectra:::.peaks_remove_fft_artifact(sp),
.peaks_remove_fft_artifact2(sp))
Unit: milliseconds
expr min lq mean
Spectra:::.peaks_remove_fft_artifact(sp) 193.9958 205.78610 223.92499
.peaks_remove_fft_artifact2(sp) 43.3130 46.91455 49.58035
median uq max neval cld
213.67975 220.5217 483.4687 100 b
48.39205 51.9478 61.1351 100 a
@stanstrup, can you please have a look at this PR to see if it's OK from your side?
Looks good :) Wouldn't it make sense to set ylim so you clearly see the issue?
## Removing fourier transform artefacts seen in Orbitra data.
## Loading an Orbitrap spectrum with artefacts.
data(fft_spectrum)
length(mz(fft_spectrum)[[1]])
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5))
plotSpectra(fft_spectrum, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum)
fft_spectrum_filtered
length(mz(fft_spectrum_filtered)[[1]])
plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
# Using a few examples peaks in your data you can optimize the parameters
fft_spectrum_filtered <- filterFourierTransformArtefacts(fft_spectrum,
halfWindowSize = 0.2,
threshold = 0.005,
keepIsotopes = TRUE,
maxCharge = 5,
isotopeTolerance = 0.005
)
fft_spectrum_filtered
length(mz(fft_spectrum_filtered)[[1]])
plotSpectra(fft_spectrum_filtered, xlim = c(264.5, 265.5), ylim = c(0, 5e6))
Something seems wrong in the logging:
Remove fast fourier artefacts within. [Tue Nov 23 10:47:49 2021]
Looks good.
Looks good to me. I would nevertheless find it useful to read a bit more about that artefact, where it comes from, how to detect if, or a reference to read more about it.
I have never seen it described in the literature even if it is relatively well-known. What I know is here: https://github.com/sneumann/xcms/issues/94#issuecomment-336829325
I think it's great if this is indeed explicitly described (in written, with code) for the first time in the software, which begs for a description of the artefact. That short paragraph is a great start!
I don't know whether it should come under the details section of the Spectra man page, as part of the description of the function, or whether it warrants its own man page with details.
That's an excellent idea. I think a dedicated, separate, man page for this function might be a good location for this information. So we have in the Spectra
man page the description we have currently, but point from there to the more detailed man page.
Also, I think we can/should mention that this would be a first ad-hoc implementation and maybe more sophisticated versions will follow (if needed).
I added a separate documentation file/man page with commit https://github.com/rformassspectrometry/Spectra/commit/93bae68d248c7073b8a865d6144136ea7350a832
Great! I also just realise that this new man page could possibly also be merge with the fft_spectrum manual.
Makes sense - I'll combine both.
Only problem we have now is that unit tests fail because of mzR
.
Just a note. I don't know where I got the ~0.01 Da from. In the example we use the problem is visible as wide as +/- 0.2 Da.
What would be very cool would be to write a small paper on this, but in dialog with someone at Thermo that understands the physics behind this. We would also need continuous mode data to really see what the phenomenon looks like. This is probably to some extend a centroiding issue. From what I heard newer instruments are less affected by this.
@sgibb , I have now included all your suggestions. Please have a final look. Once you give me your OK I will merge.
This PR adds the code from @stanstrup to remove Orbitrap shoulders/fourier transform artefacts from spectra.