hemberg-lab / scmap

A tool for unsupervised projection of single cell RNA-seq data
http://bioconductor.org/packages/scmap
GNU General Public License v3.0
90 stars 11 forks source link

error on selectFeatures #31

Open kehuangke opened 3 years ago

kehuangke commented 3 years ago

Hello,

I use scmap to annotate cell types based on a reference annotation dataset. The reference annotation dataset was downloaded from celldex. However, I encounter an error when I choose HumanPrimaryCellAtlasData. The function of selectFeatures can run properly when I choose DatabaseImmuneCellExpressionData. Unlucky, I want to use HumanPrimaryCellAtlasData for future analysis.

There is the code that does not work properly(HumanPrimaryCellAtlasData):

ref<-celldex::HumanPrimaryCellAtlasData() colData(ref)$cell_type1 <- colData(ref)$label.fine rowData(ref)$feature_symbol <- rownames(ref) ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref)) ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

The error is

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases In addition: Warning message: In linearModel(object, n_features) : Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...

The same code can be run rightly if I use DatabaseImmuneCellExpressionData

ref <- celldex::DatabaseImmuneCellExpressionData() colData(ref)$cell_type1 <- colData(ref)$label.fine rowData(ref)$feature_symbol <- rownames(ref) ref_sce <- SingleCellExperiment::SingleCellExperiment(assays=list(logcounts=Matrix::Matrix(assays(ref)$logcounts)), colData=colData(ref), rowData=rowData(ref)) ref_sce <- scmap::selectFeatures(ref_sce,suppress_plot=TRUE)

I have changed logcounts to counts and expanded value by power 10, but that did not work.

I speculate the error is due to the value of logcounts. This is the logcounts value on HumanPrimaryCellAtlasData which report an error

image

This is the logcounts value on DatabaseImmuneCellExpressionData which can run properly.

image

Could you please help me solve this problem?

Thanks

rersister commented 3 years ago

I also saw the same question and got the same error, but I don't know how to solve. Could you help us to solve this problem?

FionaMoon commented 2 years ago

I convert logcounts into counts, but this error still exists.

a <- assay(sce, "logcounts")
b <- expm1(a)
assays(sce)$counts <- b
pcantalupo commented 2 years ago

I'm getting the same error when trying to use celldex's MouseRNAseq reference with scmap. Is there any update on this issue?

Cristinex commented 2 years ago

Same error. Any update?

jordan841220 commented 2 years ago

@kehuangke @Cristinex @pcantalupo After looking into the code of selectFeature(), I think linearModel() function might be the reason of this. In linearModel(), the dropout rate of a feature was defined as how many cells express 0 log_count in such feature. However, this is not gonna work if there are no 0 log_count at all in the reference, leading to 0 dropout rates for all of the features.

linearModel <- function(object, n_features) {
    log_count <- as.matrix(logcounts(object))
    cols <- ncol(log_count)
    if (!"counts" %in% assayNames(object)) {
        warning("Your object does not contain counts() slot. Dropouts were calculated using logcounts() slot...")
        dropouts <- rowSums(log_count == 0)/cols * 100    
    } else {
        count <- as.matrix(counts(object))
        dropouts <- rowSums(count == 0)/cols * 100
    }
    # do not consider genes with 0 and 100 dropout rate
    dropouts_filter <- dropouts != 0 & dropouts != 100         
    dropouts_filter <- which(dropouts_filter)
    dropouts <- log2(dropouts[dropouts_filter])
    expression <- rowSums(log_count[dropouts_filter, ])/cols

    fit <- lm(dropouts ~ expression)

And if dropout rates are all 0%, we can not get fit <- lm(dropouts ~ expression) in linearModel() to work, resulting from the fact that we will filter out genes with 0 dropout rate. So, no features will be considered at the end.

In conclusion, copy the function from the source code and modified this line dropouts <- rowSums(log_count == 0)/cols * 100 to something like dropouts <- rowSums(log_count <= 3)/cols * 100 would work. (The value defining the dropout cutoff depends on your reference.)

or One can modified this line to get proper number of filtered features: dropouts_filter <- dropouts != 0 & dropouts != 100

Wish the authors can explain more on this issue.