Open mschilli87 opened 3 months ago
It gets even weirder:
nrow(sobj)
12572
nrow(GetAssayData(sobj))
12572
subset(sobj, features=rownames(GetAssayData(sobj)))
An object of class Seurat
25144 features across 2700 samples within 2 assays
Active assay: SCT (12572 features, 3000 variable features)
3 layers present: counts, data, scale.data
1 other assay present: RNA
subset(sobj, features=head(rownames(GetAssayData(sobj))))
Error in .subscript.2ary(x, i, j, drop = TRUE) : subscript out of bounds
So the SCT assay does include all features, but the subset function still cannot handle subsetting the Seurat object by features.
DefaultAssay(sobj) <- "RNA"
sobj$SCT <- NULL
subset(sobj, features=rownames(sobj))
An object of class Seurat
13714 features across 2700 samples within 1 assay
Active assay: RNA (13714 features, 0 variable features)
2 layers present: counts, data
subset(sobj, features=head(rownames(sobj)))
An object of class Seurat
6 features across 2700 samples within 1 assay
Active assay: RNA (6 features, 0 variable features)
2 layers present: counts, data
sobj$copy <- sobj$RNA
Warning message:
Key ‘rna_’ taken, using ‘copy_’ instead
subset(sobj, features=rownames(sobj))
An object of class Seurat
27428 features across 2700 samples within 2 assays
Active assay: RNA (13714 features, 0 variable features)
2 layers present: counts, data
1 other assay present: copy
subset(sobj, features=head(rownames(sobj)))
12 features across 2700 samples within 2 assays
Active assay: RNA (6 features, 0 variable features)
2 layers present: counts, data
1 other assay present: copy
So this is not just about the presence of a second assay, but something specific about the SCT one.
@satijalab, @mojaveazure: I just noticed I posted this on the seurat-data repo. Would you mind transferring it over to seurat, so I don't need to close it here and re-open it there? Thank you!
cc @Alexis-Varin, as you had some helpful insights for a similar issue.
What happens if, while the default assay is SCT, you run
subset(sobj, features=head(rownames(sobj[["SCT"]]$data)))
#pretty sure SCTransform puts the normalized matrix in data but it could be in counts
And do the following work?
subset(sobj[["SCT"]], features=head(rownames(sobj)))
subset(sobj[["SCT"]], features=head(rownames(sobj[["SCT"]]$data)))
subset(sobj[["RNA"]], features=head(rownames(sobj)))
subset(sobj[["RNA"]], features=head(rownames(sobj[["RNA"]]$data)))
subset(sobj, features=head(rownames(sobj[["SCT"]]$data)))
Error in .subscript.2ary(x, i, j, drop = TRUE) : subscript out of bounds
subset(sobj, features=head(rownames(sobj[["SCT"]]$counts)))
Error in .subscript.2ary(x, i, j, drop = TRUE) : subscript out of bounds
subset(sobj[["SCT"]], features=head(rownames(sobj)))
SCTAssay data with 6 features for 2700 cells, and 1 SCTModel(s)
Top 1 variable feature:
NOC2L
subset(sobj[["SCT"]], features=head(rownames(sobj[["SCT"]]$data)))
SCTAssay data with 6 features for 2700 cells, and 1 SCTModel(s)
Top 1 variable feature:
NOC2L
subset(sobj[["RNA"]], features=head(rownames(sobj)))
Warning: Different features in new layer data than already exists for counts
Warning: Different features in new layer data than already exists for data
Assay (v5) data with 6 features for 2700 cells
First 6 features:
AL627309.1, RP11-206L10.2, LINC00115, NOC2L, KLHL17, PLEKHN1
Layers:
counts, data
subset(sobj[["RNA"]], features=head(rownames(sobj[["RNA"]]$data)))
Warning: Different features in new layer data than already exists for counts
Warning: Different features in new layer data than already exists for data
Assay (v5) data with 6 features for 2700 cells
First 6 features:
AL627309.1, AP006222.2, RP11-206L10.2, RP11-206L10.9, LINC00115, NOC2L
Layers:
counts, data
Is there anything I can do to track this down? Is SCT simply not supported/maintained/tested in Seurat anymore? If so, maybe the corresponding vignette should be updated/archived to reflect that new reality?
In case it helps: I tracked it down to this line, which passes when assay is "RNA" but fails when it is "SCT" for the example data above.
update: I can also trigger it here with the above SCT assay and layer defaulting to 'counts'.
update: This line triggers the issue with cells being set to colnames(object) and features to Features(object, layer) (layer still defaulting to "counts").
update: I checked and it turns out that Features(object, 'counts') returns all features in the Assay pre-filtering (just like Features(sobj[["SCT"]], 'counts')), but rownames(object@counts) misses most of them as it only contains the filtered data at this point.
Here is how far I got:
subset.Seurat(features = ...) starts by subsetting the Assay data and then updates the n.counts via CalcN, which uses the LayerData of the (subset!) SCT Assay for the 'counts' layer. While LayerData does support limiting the query to a specific set of features, the corresponding features parameter defaults to NULL and is not changed by the call in CalcN (neither explicitly, nor via passing on the ...). Thus, LayerData queries all features via the Features function, which returns the subset list of features when using the (default) 'data' layer, but still returns the full list of (unfiltered!) features when setting layer = 'counts' as done by LayerData.
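To make the suspected mechanism concrete, here is a minimal base-R sketch of it. It is a toy model only: make_assay, subset_assay, toy_features and toy_layer_data are hypothetical stand-ins invented for illustration, not Seurat's actual internals, and the stale feature index is an assumption based on the observations above.

```r
# Toy model in base R; none of these names are Seurat's real internals.
make_assay <- function(counts, data) {
  list(counts = counts, data = data,
       feature_index = rownames(counts))  # recorded once, at creation
}

# Hypothetical subset step: filters both matrices but never refreshes
# the feature index that is consulted for the 'counts' layer.
subset_assay <- function(assay, features) {
  assay$counts <- assay$counts[features, , drop = FALSE]
  assay$data <- assay$data[features, , drop = FALSE]
  assay
}

# Stand-in for Features(): 'data' reads the matrix, 'counts' reads the stale index.
toy_features <- function(assay, layer) {
  if (layer == "counts") assay$feature_index else rownames(assay[[layer]])
}

# Stand-in for LayerData(): features = NULL means "ask toy_features()".
toy_layer_data <- function(assay, layer, features = NULL) {
  if (is.null(features)) features <- toy_features(assay, layer)
  assay[[layer]][features, , drop = FALSE]
}

counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))
full <- make_assay(counts, log1p(counts))
small <- subset_assay(full, c("gene1", "gene2"))

# The 'data' layer works because its feature list matches the matrix.
ok_data <- toy_layer_data(small, "data")

# The 'counts' layer asks for all four genes from a two-row matrix and fails.
counts_error <- tryCatch(
  { toy_layer_data(small, "counts"); NA_character_ },
  error = function(e) conditionMessage(e)
)

# Passing the filtered features explicitly avoids the stale index.
ok_counts <- toy_layer_data(small, "counts", features = rownames(small$data))
```

In this toy, the failure disappears either by passing the filtered features explicitly or by keeping the feature index in sync with the matrix, which mirrors the two candidate fixes discussed in this thread.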
I don't know if this is the expected behaviour and the fix would be to pass the filtered features to the LayerData call, or if this is a bug in Features that ought to be fixed instead.
@yuhanH, @mojaveazure: Any idea what is going wrong? AFAICT after a quick git blame, you worked on the corresponding code section last.
I can confirm that modifying CalcN to pass the layer parameter to override the 'counts' default value 'fixes' this issue. However, I have no clue if it produces correct results and/or if this breaks anything else. I can provide a PR if this eases the review, but I'd like to get some sort of feedback before putting more work into this.
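~So for some reason, SCT is dropping features that are still returned as rownames even though SCT is the active assay. On top, subset fails with a cryptic error message instead of handling this situation gracefully (e.g. by returning a Seurat object with all requested features that are present in the SCT alongside a warning why (and which) features had to be dropped and what to do to fix it). Moreover,~ I ran into this when trying to extract normalized counts for genes that I found differentially expressed in a downstream analysis, so those are not just some weird all-zero count edge cases or such.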
fails with a cryptic error message instead of handling this sitatuion gracefully (e.g. by returning a Seurat object with all requested features that are present in the SCT alongside a warning why (and which) features had to be dropped and what to do to fix it). Moreover,~ I ran into this when trying to extract normalized counts for genes that I found differentially expressed in a downstream analysis, so those are not just some weird all-zero count edge cases or such.