rformassspectrometry / QFeatures

Quantitative features for mass spectrometry data
https://RforMassSpectrometry.github.io/QFeatures/
25 stars 7 forks source link

Refactor: QFeatures subsetting #186

Open cvanderaa opened 1 year ago

cvanderaa commented 1 year ago

We could refactor subsetting and filtering of Qfeatures object. Here are some points I think would be worth improving for maintainability of the code:

  1. In my opinion, filtering and subsetting are related concepts that should be considered together. Therefore I would suggest to combine all code related to one or the other in a single file. This would facilitate the centralization of the subsetting implementation. More specifically, I would suggest to combine the following code into QFeatures-subset.R:
    • [ (in QFeatures-class.R)
    • filterNA() (in QFeatures-missing-data.R)
    • subsetByFeature() (in subsetBy-methods.R)
    • filterFeatures() (in QFeatures-filter.R)
  2. The functions/methods listed above make use of different subsetting backend instead of a centralized backend that provides consistency and ensures validity. More specifically:
    • [ (in QFeatures-class.R): this should be the centralized backend, ie x[i, j, k]
    • filterNA(): currently uses x[i, j], but should be x[i, j, k]
    • subsetByFeature(): it reconstructs a QFeatures using its constructor.
    • filterFeatures(): uses x[i, j, k], so OK.
  3. filterFeaturesWithAnnotationFilter() and filterFeaturesWithFormula() have a lot of duplicated code what requires parallel maintenance. I'm sure we could make use of a common internal function to handle this.
  4. The implementation of .subsetByFeature() is too long and difficult to understand (my bad sorry).

bonus: subsetting can be very slow for large datasets (cf SCP). Proper refactoring may help identify and solve resource bottlenecks.

lgatto commented 1 year ago

Some comments/suggestions: