Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds

Zifeng-L commented 3 years ago

Hi. We used the ExpressionSet function to build the input datasets and then ran MuSiC, but something went wrong. The single-cell and bulk datasets are as follows.

bulk.eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 38187 features, 844 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: MMRF_2272_1_BM_CD138pos MMRF_2762_1_BM_CD138pos ...
    MMRF_2419_2_BM_CD138pos (844 total)
  varLabels: case_id
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

SC.eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 33090 features, 1815 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AGGTCCGAGATGCCAG-1_4 AGTGAGGTCTAGAGTC-1_8 ...
    GGCGACTCACGCCAGT-1_7 (1815 total)
  varLabels: orig.ident group ... sampleID (5 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

Est.prop.bulk = music_prop(bulk.eset = bulk.eset, sc.eset = SC.eset, clusters = 'seurat_clusters',
                           samples = 'sampleID', select.ct = c('0', '1', '2', '3', '4', '5', '6', '7'), verbose = FALSE)

Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds

Can anyone help me?

paraish commented 3 years ago

@Zifeng1995 Did you ever solve it? I'm getting the same error. Here is the Bulk ExpressionSet:

> bulk
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20314 features, 81 samples 
  element names: exprs 
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

Here is the single cell ExpressionSet:

ExpressionSet (storageMode: lockedEnvironment)
assayData: 41336 features, 23875 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: AAACCTGAGTGTTAGA_1 AAACCTGCAAGCGCTC_1 ... TTTGTCATCTGTTTGT_6 (23875
    total)
  varLabels: X Barcode ... Sample_ID (6 total)
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:

And here is my call to MuSiC, with the error returned:

>   music_results = music_prop(bulk.eset = bulk, 
+                              sc.eset = all_ref, 
+                              clusters = "Cluster",
+                              samples = "Sample",
+                              verbose = T)
Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds

paraish commented 3 years ago

I solved the issue - hopefully this will work for you as well. I had merged 6 single-cell ExpressionSets (same experiment/batch), but the samples had different sets of features, so the merge introduced NAs in samples that lacked features present in others. I pre-processed the data to include only the features common to all 6 samples, and this allowed the code to run successfully.
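A minimal sketch of that pre-processing step, using small made-up matrices in place of the real data. The object names (`mats`, `common_genes`, `merged`) are hypothetical; with real ExpressionSets the per-sample matrices could presumably be pulled out with `Biobase::exprs()` first.

```r
# Toy stand-ins for per-sample expression matrices (genes x cells).
# The two samples deliberately have different gene sets.
set.seed(1)
mats <- list(
  s1 = matrix(rpois(12, 5), nrow = 4,
              dimnames = list(c("A", "B", "C", "D"), paste0("c", 1:3, "_1"))),
  s2 = matrix(rpois(9, 5), nrow = 3,
              dimnames = list(c("B", "C", "E"), paste0("c", 1:3, "_2")))
)

# Genes present in every sample.
common_genes <- Reduce(intersect, lapply(mats, rownames))

# Restrict each matrix to the shared genes (in the same order),
# then bind the cells together column-wise.
merged <- do.call(cbind, lapply(mats, function(m) m[common_genes, , drop = FALSE]))

# The merged matrix should contain no NAs.
stopifnot(!anyNA(merged))
```

Because every matrix is subset to the same gene vector before binding, the merge cannot introduce NA rows for genes that are missing in some samples.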

Good luck!

stephen-siecinski commented 3 years ago

I too encountered this issue, but I did not have NAs in my data frame, so the fix that Paraish outlined didn't resolve it.

For context, I was subsetting a large single-cell data frame from the Allen Brain Atlas and using it to estimate neuronal cell proportions in my RNA-seq dataset.

The help page for music_prop notes that for markers,

"vector or list of gene names, default as NULL. If NULL, use all genes that provided by both bulk and single cell dataset."

So, after checking that my bulk-seq ExpressionSet object and single-cell ExpressionSet object shared ~27,000 gene IDs, I left that option NULL, assuming that it would find the overlap.

However, the function only worked when I subset my bulk-seq object to include only gene symbols that were present in the single-cell reference object. I'm not sure why this resolved the issue, but if excluding NAs doesn't work for you, hopefully this will!
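A rough sketch of that subsetting step on toy matrices. The names `bulk_mat`, `sc_mat`, `shared` and `bulk_sub` are made up for illustration; on real ExpressionSet objects the gene IDs would come from `featureNames()` rather than `rownames()`, and the same row subsetting should apply to the object itself.

```r
# Toy bulk matrix (genes x samples) and single-cell matrix (genes x cells).
bulk_mat <- matrix(1:12, nrow = 4,
                   dimnames = list(c("G1", "G2", "G3", "G4"), c("b1", "b2", "b3")))
sc_mat <- matrix(1:6, nrow = 3,
                 dimnames = list(c("G2", "G3", "G5"), c("cellA", "cellB")))

# Gene IDs found in both objects.
shared <- intersect(rownames(bulk_mat), rownames(sc_mat))

# Keep only the bulk rows that also exist in the single-cell reference.
bulk_sub <- bulk_mat[shared, , drop = FALSE]

# Every remaining bulk gene is now guaranteed to be in the reference.
stopifnot(all(rownames(bulk_sub) %in% rownames(sc_mat)))
```

Checking the size of `shared` before calling `music_prop` is also a cheap way to confirm that the two objects use the same gene ID type.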

xuranw / MuSiC

Error in orig[[nm]][i, , ..., drop = drop] : subscript out of bounds #61