rformassspectrometry / QFeatures

Quantitative features for mass spectrometry data
https://RforMassSpectrometry.github.io/QFeatures/

feature request: remove empty assays #184

Closed: lgatto closed this issue 1 year ago

lgatto commented 1 year ago

I don't think we have a function yet that removes empty assays from a QFeatures object. Here is an example of an object with empty assays and of how I currently remove them:

> x
An instance of class QFeatures containing 12 assays:
 [1] psmA5: SummarizedExperiment with 1 rows and 11 columns 
 [2] psmA9: SummarizedExperiment with 0 rows and 11 columns 
 [3] psmA11: SummarizedExperiment with 0 rows and 11 columns 
 ...
 [10] peptides: SummarizedExperiment with 1 rows and 33 columns 
 [11] norm_peptides: SummarizedExperiment with 1 rows and 33 columns 
 [12] proteins: SummarizedExperiment with 1 rows and 33 columns 
> nrows(x)
        psmA5         psmA9        psmA11     log_psmA5     log_psmA9 
            1             0             0             1             0 
   log_psmA11     peptideA5     peptideA9    peptideA11      peptides 
            0             1             0             0             1 
norm_peptides      proteins 
            1             1 
> nrows(x) > 0
        psmA5         psmA9        psmA11     log_psmA5     log_psmA9 
         TRUE         FALSE         FALSE          TRUE         FALSE 
   log_psmA11     peptideA5     peptideA9    peptideA11      peptides 
        FALSE          TRUE         FALSE         FALSE          TRUE 
norm_peptides      proteins 
         TRUE          TRUE 
> x[, , nrows(x) > 0]
harmonizing input:
  removing 66 sampleMap rows not in names(experiments)
An instance of class QFeatures containing 6 assays:
 [1] psmA5: SummarizedExperiment with 1 rows and 11 columns 
 [2] log_psmA5: SummarizedExperiment with 1 rows and 11 columns 
 [3] peptideA5: SummarizedExperiment with 1 rows and 11 columns 
 [4] peptides: SummarizedExperiment with 1 rows and 33 columns 
 [5] norm_peptides: SummarizedExperiment with 1 rows and 33 columns 
 [6] proteins: SummarizedExperiment with 1 rows and 33 columns 
Warning message:
'experiments' dropped; see 'metadata' 

I think this would be handy.

@cvanderaa, what do you think?

cvanderaa commented 1 year ago

Yes, I agree. I also often use x[, , nrows(x) > 0] or x[, , ncols(x) > 0].

Implementing a function (e.g. dropEmptyAssays()) is straightforward, but I'm wondering what the benefit is of keeping empty assays in the first place.

What do you think about modifying the subsetting method so that it automatically removes empty assays (perhaps with a warning or a message)? I remember this used to be the default behavior, but changes in MultiAssayExperiment modified it.

lgatto commented 1 year ago

Yes, we could change that behaviour, but in which contexts? Only [ and filterFeatures(), other functions as well, or all of them?

Changing the subsetting behaviour might take some time and come too late for the upcoming release (it may already be too late), so a dropEmptyAssays() function could be a quick and easy solution.

Also, could you confirm that this won't have any impact on the assayLinks? It should not, given that there aren't any features in these assays, but I prefer to double-check.

lgatto commented 1 year ago

I'm going to start with something like this for now:

dropEmptyAssays <- function(object, dims = 1:2) {
    stopifnot(inherits(object, "QFeatures"))
    if (!all(dims %in% 1:2))
        stop("Argument 'dims' must be in '1:2'.")
    ## Drop assays that have no rows (features)
    if (1 %in% dims)
        object <- object[, , nrows(object) > 0]
    ## Drop assays that have no columns (samples)
    if (2 %in% dims)
        object <- object[, , ncols(object) > 0]
    object
}

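Because dropEmptyAssays() needs a QFeatures object, the following is a hedged, self-contained base-R analogue of the same logic. A plain named list of matrices stands in for the assays; the name dropEmptyMatrices() and this list representation are assumptions made for illustration only and are not part of QFeatures.

```r
# Hypothetical base-R analogue of dropEmptyAssays(): a named list of
# matrices stands in for the assays of a QFeatures object.
dropEmptyMatrices <- function(assays, dims = 1:2) {
    stopifnot(is.list(assays))
    if (!all(dims %in% 1:2))
        stop("Argument 'dims' must be in '1:2'.")
    if (1 %in% dims) {
        # keep only elements with at least one row (cf. nrows(x) > 0)
        assays <- assays[vapply(assays, nrow, integer(1)) > 0]
    }
    if (2 %in% dims) {
        # keep only elements with at least one column (cf. ncols(x) > 0)
        assays <- assays[vapply(assays, ncol, integer(1)) > 0]
    }
    assays
}

# Toy data loosely mimicking the output above: one non-empty and two
# empty PSM "assays", each with 11 columns
assays <- list(
    psmA5  = matrix(1, nrow = 1, ncol = 11),
    psmA9  = matrix(numeric(0), nrow = 0, ncol = 11),
    psmA11 = matrix(numeric(0), nrow = 0, ncol = 11)
)
names(dropEmptyMatrices(assays))
```

Subsetting with a logical vector built by vapply() mirrors the x[, , nrows(x) > 0] idiom. As in dropEmptyAssays(), the column check runs on the already row-filtered result.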
cvanderaa commented 1 year ago

OK, thanks for the implementation! I agree this provides a quick solution. Regarding your concern about AssayLinks, there is no problem, since [, , ] should always return a QFeatures object with valid AssayLinks.

In my opinion (and honestly, I need to look back at the code to check whether this is already the case), any subsetting or filtering (filterFeatures(), filterNA(), [, , ], ...) should rely on a common underlying subset method. This would improve maintainability, provide consistent behavior (probably facilitating the use of drop = TRUE), and ensure that any subsetting keeps valid AssayLinks.
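One way to picture a single underlying subset method is the following base-R sketch, which again uses a named list of matrices as a stand-in for assays. subsetAssays() and filterNASketch() are hypothetical names, not QFeatures functions. The sketch only shows how routing every filter through one helper gives a drop = TRUE option consistent behavior, including an optional message when empty assays are removed.

```r
# Hypothetical common subsetting helper: 'keep' is a named list of logical
# vectors (one per assay) selecting the rows to retain.
subsetAssays <- function(assays, keep, drop = TRUE) {
    stopifnot(identical(names(assays), names(keep)))
    # apply the row selection to every assay, keeping matrix structure
    out <- Map(function(m, k) m[k, , drop = FALSE], assays, keep)
    if (drop) {
        # remove assays left without any rows, and say so
        empty <- vapply(out, nrow, integer(1)) == 0
        if (any(empty))
            message("Dropping ", sum(empty), " empty assay(s).")
        out <- out[!empty]
    }
    out
}

# Hypothetical filter built on the helper: keep rows without any NA
filterNASketch <- function(assays, drop = TRUE) {
    keep <- lapply(assays, function(m) rowSums(is.na(m)) == 0)
    subsetAssays(assays, keep, drop = drop)
}

assays <- list(
    A = matrix(c(1, 2, NA, 4), nrow = 2),  # rows: (1, NA) and (2, 4)
    B = matrix(c(NA, 1), nrow = 1)         # single row: (NA, 1)
)
names(filterNASketch(assays))
names(filterNASketch(assays, drop = FALSE))
```

With drop = FALSE, assay B is kept with zero rows. With drop = TRUE, it is removed and a message is emitted, along the lines of the automatic removal suggested earlier in this thread.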

lgatto commented 1 year ago

I'll send a PR with that function.

cvanderaa commented 1 year ago

I think this can be closed.