rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
38 stars 25 forks source link

Introducing `filterRanges` Function and Discussion on a possible `filterSimilarities` #312

Closed philouail closed 8 months ago

philouail commented 9 months ago

Hi, I wanted to suggest the implementation of this function filterRanges, which allows the user to filter a spectra object based on any variables of its spectraData suing a range of values that the user inputs . I believe this would be great for multiple reasons:

Welcoming any feedback on the code and PR :)

On the other hand, I would also like to suggest implementing a similar function such as filterSimilarities (not sure about the name), which would be implemented in a similar way for filtering based on similarity to specific values for variables within the spectraData of the object rather than within a range. Thus, it would have the same advantages as the filterRanges function. It would prevent variable-specific functions and make the code easier to combine multiple filters.

I can however see the counter argument that this might not be as user-friendly as variable-specific functions. I would love to get your opinion @jorainer and @lgatto on this. If you are interested i can come up with a more detailed proposition of implementation.

jorainer commented 9 months ago

To me this function is great and also the filterSimilarities - that I would rather suggest to call filterValues would make a lot of sense to me. @lgatto what's your opinion on that?

We have already a use case in which we want to extract/filter spectra with their precursor m/z and retention time within respective ranges (from MS1 data).

philouail commented 9 months ago

Thanks @jorainer ! I have adapted the code to your suggestions.

jorainer commented 9 months ago

Thanks for the review @lgatto ! Regarding the name, if I understand you you would like to name the function to something like filterSpectraDataRanges? Or could you provide a possible example for the name?

I would not use spectraData in the name of the function - I would understand that I would filter/subset the spectraData (i.e. remove certain columns of spectraData) instead of filtering the Spectra.

To me filterRanges would make sense, because we want to filter the Spectra using values in one or more spectraVariables (hence the name of that parameter being spectraVariables). Also, for the other method we would use filterValues. And yes, the idea was in fact that the method should be similar to e.g. filterPrecursorMzRange or filterPrecursorMzValue, so the user would find/understand the method right away. In facto filterPrecursorMzRange would be the same as calling filterRanges(, spectraVariable = "precursorMz", ...).

We also thought of a generic filterSpectra method and a RangeFilter parameter - that would be another possibility, but then we would add a new S4 object (RangeFilter) and hence increase the complexity of Spectra...

lgatto commented 9 months ago

OK for filterRanges(), you convinced me. Please make sure in the documentation that filterPrecursorMzRange() and filterRanges(, spectraVariable = "precursorMz", ...) are equivalent.

philouail commented 9 months ago

Thanks @lgatto for the feedback ! I will adjust the documentation and the code accordingly. As well as adding the filterValues() function and a explanation in the vignette. I will try to make it as clear as possible how they can be used :) Putting it in draft for now while i prepare it !

philouail commented 8 months ago

So here is everything updated. I moved the main code to the MsBackend.R to have a similar implementation as the other filtering functions.

Some things i am not 100% sure about:

Also curiosity question but what is the different between filterPrecrusorMz, filterPrecursorMzValue, and filterPrecursorMzRange.

I will write the vignette now while i wait for feedback :) Thanks again for the help !

jorainer commented 8 months ago

Regarding your last question: filterPrecursorMz and filterPrecursorMzValues are the same (i.e. match/filter based on similarity of the precursor m/z). Both (should?) support to match against multiple target m/z values (e.g. filterPrecursorMzValues(sps, mz = c(123.3, 234.2)) to keep spectra with a precursor m/z that matches any of the provided values). The filterPrecursorMzRange allows to define a broader range (i.e. precursor m/z within a specified range).

jorainer commented 8 months ago

For the default of tolerance. For filterValues I would use 0 as default. also for other methods (filterPrecursorMzValues) we have 0. The mclosest (and closest) are a bit different than these filter functions: from a filter function I would expect to subset the data based on a (stringent) condition. From closest mclosest I would just like to get the most similar hits. The tolerance there is essentially a way to define that something is too different to be considered close. Hope this does make sense?

philouail commented 8 months ago

Thanks @jorainer for the idea, I'll update the code for the comments you made and implement the extra parameter !

lgatto commented 8 months ago

Thank you @philouail for the great contribution!