rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

filterMzValues #274

Closed mar-garcia closed 1 year ago

mar-garcia commented 1 year ago

Hi!!!

I'm opening this issue because I've noticed that the filterMzValues function is not working as I expected. Here is an example:

I have an MS2 spectrum with some noise. These are its m/z values:

> mz(myms2)[[1]]
[1]  52.34714  52.34981  52.35789  52.36570  52.36932  52.37316  52.41703  67.24783 120.08054

I want to delete all m/z values around 52, so I ran the following code:

> myms2filtered <- filterMzValues(myms2, 52.3, tolerance = 0.3, keep = FALSE)

However, when I check the m/z values of the new object, I see that the algorithm excluded only one of them (the first one), although I expected this code to delete all of the first 7:

> mz(myms2filtered)[[1]]
[1]  52.34981  52.35789  52.36570  52.36932  52.37316  52.41703  67.24783 120.08054

Is it possible to delete all of the first 7 m/z values "in one shot", that is, everything within 52.3 ± 0.3?

Thanks!!! :)

jorainer commented 1 year ago

Thanks for reporting Mar - yes, it makes sense to delete/remove all peaks with matching m/z and not just the best matching (which I guess is what is happening). I'll look into it.

jorainer commented 1 year ago

Indeed, since we're using closest(input_mz, spectra_mz) in the code (with input_mz being the m/z value(s) defined by the user and spectra_mz the m/z values of a spectrum), only a single peak is identified for each input m/z in a spectrum. To find all peaks in the spectrum that match any of the input m/z values (given ppm and tolerance), the order of the parameters has to be reversed, i.e. closest(spectra_mz, input_mz). This will make the function slower, but it will deliver the expected results.
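
To illustrate why the argument order matters, here is a minimal base-R sketch. Note that closest_sketch is a hypothetical, simplified stand-in written for this example only: it is not the MsCoreUtils implementation, and it ignores ppm and the handling of duplicated matches. It just returns, for each element of its first argument, the index of the nearest element of the second argument (or NA if nothing is within the tolerance).

```r
# Hypothetical helper (NOT the MsCoreUtils code): for every element of `x`,
# return the index of the nearest element of `table`, or NA if that nearest
# element is further away than `tolerance`.
closest_sketch <- function(x, table, tolerance = 0) {
    vapply(x, function(xi) {
        d <- abs(table - xi)
        i <- which.min(d)
        if (d[i] <= tolerance) i else NA_integer_
    }, integer(1))
}

# m/z values from the report above
spectra_mz <- c(52.34714, 52.34981, 52.35789, 52.36570, 52.36932,
                52.37316, 52.41703, 67.24783, 120.08054)
input_mz <- 52.3

# Original order: one result per input m/z, so only the single nearest
# peak is identified.
hit_old <- closest_sketch(input_mz, spectra_mz, tolerance = 0.3)

# Reversed order: one result per peak, so every peak within the tolerance
# of the input m/z is identified.
hit_new <- which(!is.na(closest_sketch(spectra_mz, input_mz, tolerance = 0.3)))

spectra_mz[-hit_old]  # keep = FALSE, original order: only the first peak is removed
spectra_mz[-hit_new]  # keep = FALSE, reversed order: the first seven peaks are removed
```

With the original order the result has one entry per input m/z, which is why only one peak was dropped in the report; with the reversed order the result has one entry per peak, which costs more because every peak is evaluated.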

mar-garcia commented 1 year ago

Thanks @jorainer!!! And what about using the function between() instead of closest()? Could that be an option? Or would between() be even slower than closest(spectra_mz, input_mz)?

jorainer commented 1 year ago

Note that you could use filterMzRange, which in fact uses the between function. A solution based on between would be difficult/slow if the input parameter mz is of length > 1.
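
As a rough illustration of that point (a base-R sketch, not the package code): a single range needs just one vectorized comparison per spectrum, while several input m/z values mean that each peak has to be checked against each window. The second target value 120.1 below is made up for the example.

```r
# m/z values from the report above
peaks <- c(52.34714, 52.34981, 52.35789, 52.36570, 52.36932,
           52.37316, 52.41703, 67.24783, 120.08054)

# Single range (52.3 +/- 0.3): one vectorized comparison over all peaks.
in_range <- peaks >= 52.0 & peaks <= 52.6

# Several target m/z values: build an n_peaks x n_targets matrix of
# differences and flag peaks that fall into at least one window.
targets <- c(52.3, 120.1)   # 120.1 is a hypothetical second value
tolerance <- 0.3
in_any <- rowSums(abs(outer(peaks, targets, "-")) <= tolerance) > 0

which(in_range)  # peaks inside the single range
which(in_any)    # peaks inside any of the windows
```

The work for the second variant grows with the number of peaks times the number of input m/z values, which is where a range-based approach gets expensive.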

jorainer commented 1 year ago

Can you please install the fix and let me know if it works?

BiocManager::install("RforMassSpectrometry/Spectra", ref = "RELEASE_3_16")

mar-garcia commented 1 year ago

Ah!! Then I misunderstood how filterMzRange() works. I guess it is also possible to use this function with the argument keep = FALSE, isn't it? I can try filterMzRange() and also install the fix and see what happens! I'll let you know.

jorainer commented 1 year ago

So, filterMzRange keeps all peaks that are within the provided lower and upper m/z values (it can only be a single range). At present it does not have a parameter keep, but thinking it over, it might actually make sense to add one, allowing users to either keep all peaks within the range or remove them.
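
The intended semantics could be sketched like this in base R. filter_mz_range_sketch is a hypothetical helper for illustration only, not the Spectra implementation, and treating both range bounds as inclusive is an assumption here.

```r
# Hypothetical helper (NOT the Spectra code): keep or remove the m/z values
# that fall inside a single range. Inclusive bounds are an assumption.
filter_mz_range_sketch <- function(mz, range, keep = TRUE) {
    inside <- mz >= range[1] & mz <= range[2]
    if (keep) mz[inside] else mz[!inside]
}

# m/z values from the report above
peaks <- c(52.34714, 52.34981, 52.35789, 52.36570, 52.36932,
           52.37316, 52.41703, 67.24783, 120.08054)

kept    <- filter_mz_range_sketch(peaks, c(52.0, 52.6))                # the seven peaks around 52
removed <- filter_mz_range_sketch(peaks, c(52.0, 52.6), keep = FALSE)  # the two remaining peaks
```

For the spectrum from the report, keep = FALSE with the range 52.0 to 52.6 would drop the seven noise peaks in one step.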

jorainer commented 1 year ago

I'll quickly add the parameter keep to filterMzRange as well, then you can test it :)

jorainer commented 1 year ago

You would need to install the current devel branch to test the new filterMzRange with the keep parameter.

BiocManager::install("RforMassSpectrometry/ProtGenerics")
BiocManager::install("RforMassSpectrometry/MsCoreUtils")
BiocManager::install("RforMassSpectrometry/Spectra")

mar-garcia commented 1 year ago

Super! Now it's working :) For now I think I'll use the function filterMzValues() with a specific m/z value + tolerance, but if in the future I need something faster (maybe when I work with a larger amount of data) I'll try the function filterMzRange() with a specific m/z range. Thanks a lot!!