meichendong / SCDC

Error in y[y < q15] <- q15[y < q15] #13

Closed MarcElosua closed 3 years ago

MarcElosua commented 4 years ago

Hi, first of all, I would like to thank you for developing and maintaining this tool!

I am trying to deconvolute some mixtures of samples with SCDC and I'm coming across the error in the title while running SCDC_prop. The code I'm using is below:

# Create Expression set object
se_quartz[["barcode"]] <- colnames(se_quartz)
expr_sc <- Biobase::ExpressionSet(assayData = as.matrix(se_quartz@assays$RNA@counts),
                                  phenoData = Biobase::AnnotatedDataFrame(data = data.frame(se_quartz@meta.data)))

expr_mix <- Biobase::ExpressionSet(assayData = as.matrix(synthetic_mixtures[[1]]))
# Deconvolute mixtures
scdc_deconv <- SCDC::SCDC_prop(bulk.eset = expr_mix,
                               sc.eset = expr_sc,
                               ct.varname = "nnet2",
                               sample = "barcode",
                               ct.sub = unique(expr_sc$nnet2))

Error in y[y < q15] <- q15[y < q15] : 
  NAs are not allowed in subscripted assignments
In addition: There were 50 or more warnings (use warnings() to see the first 50)

With the warnings being
Warning messages:
1: In FUN(newX[, i], ...) : no non-missing arguments to max; returning -Inf

If you need test data please let me know which is the best way of getting it to you!

Thanks a lot for your time

meichendong commented 4 years ago

Hi Marc, sorry for the late response, and thanks for using SCDC. I appreciate your feedback! Yes, I would like to help you debug if you could send some of your test data to me via email: meichen@live.unc.edu.

meichendong commented 3 years ago

Hi @MarcElosua, I've updated the corresponding functions. Please let me know if you still encounter errors when you try again.

MarcElosua commented 3 years ago

Hi @meichendong,

Apologies, I didn't end up sending the test data. I tried to reinstall the package but am still getting the same error... I'm sending it over to you now!

Thanks a lot, Marc

eboileau commented 3 years ago

Hi,

Any follow-up on this issue?

I am experiencing the same problem. It happens in SCDC_basis at line 99 when the matrix var.adj contains Inf or NaN values. This occurs, for example, when, for a given sample, some genes have no counts or zero variance in every cell type, and in particular when the resulting median is zero (line 78).

This is completely reproducible, but depends on the single cell data that is used.
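The mechanism described above can be reproduced on toy data. The following is a minimal sketch, not the package code: `my.max` is reconstructed from the snippet in this comment with the scaling line active, and the variance matrix `v`, the vector `y`, and the thresholds `q15` are made-up values chosen only to trigger the same messages.

```r
# Scaling step as described in this comment (line 78 of SCDC_basis),
# reconstructed for illustration only.
my.max <- function(x, ...) {
  y <- apply(x, 1, max, na.rm = TRUE)
  y / median(y, na.rm = TRUE)
}

# Hypothetical per-cell-type variances for one sample: 3 genes x 2 cell types.
# g1 and g2 have zero variance everywhere; g3 has no usable variance at all.
v <- rbind(g1 = c(A = 0, B = 0),
           g2 = c(A = 0, B = 0),
           g3 = c(A = NA, B = NA))

# max() over an all-NA row warns and returns -Inf; the median is then 0,
# so the division yields NaN (0 / 0) and -Inf (-Inf / 0).
warn_msg <- tryCatch(max(c(NA_real_, NA_real_), na.rm = TRUE),
                     warning = function(w) conditionMessage(w))
scaled <- suppressWarnings(my.max(v))
print(scaled)

# Clipping one gene's values across three samples against assumed thresholds.
# The NaN makes the logical index contain NA, which R rejects when the
# replacement has more than one element.
y   <- c(s1 = NaN, s2 = 1.2, s3 = 0.3)
q15 <- c(s1 = 0.5, s2 = 0.5, s3 = 0.5)
err_msg <- tryCatch({
  y[y < q15] <- q15[y < q15]
  "no error"
}, error = function(e) conditionMessage(e))

print(warn_msg)
print(err_msg)
```

In this sketch the error only appears because the comparison `y < q15` returns NA for the NaN entry, which is consistent with the error message in the title of this issue.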

I managed to run SCDC by redefining the basis matrix, essentially commenting out line 78 in SCDC_basis:

my.max <- function(x, ...) {
  y <- apply(x, 1, max, na.rm = TRUE)
  # y / median(y, na.rm = T) <- HERE
}

but I am unsure as to the consequences this has on the final results.

I am running:

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 8 (jessie)

# SCDC package installed using
devtools::install_github("renozao/xbioc")
devtools::install_github("meichendong/SCDC")

Thanks for your feedback.

meichendong commented 3 years ago

Hi @eboileau, thanks for reporting the issue. Have you checked whether there is only one subject/individual in the single cell dataset? If that's the case, please use the SCDC_qc_ONE() and SCDC_prop_ONE() functions. Please let me know if this doesn't solve the problem! Thanks!

MarcElosua commented 3 years ago

The above solved my problem back then :)

eboileau commented 3 years ago

Thanks for your quick reply. No, I'm using two scRNA-seq datasets, with 14 and 20 samples respectively. In one dataset, for selected combinations of sample and cell type (think e.g. of markers that are highly expressed in some cell types and not in others, with significant variation between individuals), some genes have zero counts, or a median of zero, which causes the issue at line 78 in SCDC_basis. So the scaling (by the median) seems to be problematic...

eboileau commented 3 years ago

Hi, any update on the maximal variance weight (MVW) calculation (scaling by the median)?

I quickly compared the results with and without scaling, globally across the different cell types, using the bulk RNA-seq data from your paper (fadista77, with the seger and baron data), but couldn't identify major differences. However, this may be particular to these datasets (cross-cell variation across genes, cell types and samples).

Do you want me to send you some data to reproduce the issue?

meichendong commented 3 years ago

Hi @eboileau , example data would be perfect! Sorry I was planning to check this later over the weekend. Please feel free to send me the example data: meichen@live.unc.edu and I will try to figure this out over the weekend! Thanks for your patience!

meichendong commented 3 years ago

Update: The main reason for the error is that some subjects do not provide cells from some cell types, which becomes a problem when we try to divide or calculate the variance. The functions have been updated.
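One way to check for this situation before running the deconvolution is to tabulate cells per subject and cell type. This is only a sketch on made-up labels: `sample.id` and `ct.id` stand in for the sample and cell type columns of the single cell phenotype data (for example the columns passed as `sample` and `ct.varname`), and the threshold of 2 cells is an assumption based on the variance needing at least two observations.

```r
# Hypothetical labels, one entry per cell.
sample.id <- c("s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2", "s3", "s3")
ct.id     <- c("A",  "A",  "A",  "B",  "A",  "A",  "B",  "B",  "A",  "A")

# Count cells for every subject x cell type combination, including zeros.
n.cells <- as.data.frame(table(sample = sample.id, celltype = ct.id))

# Combinations with fewer than 2 cells cannot yield a variance.
sparse <- n.cells[n.cells$Freq < 2, ]
print(sparse)
```

Here `s1` contributes a single cell of type B and `s3` contributes none, so both combinations are flagged. If the sample variable is unique per cell, as appears to be the case with `sample = "barcode"` in the first report, every combination would be flagged.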

peachone commented 3 years ago

Thank you very much for developing SCDC. Unfortunately, my data fails at the step below. In fact, I do not understand what this step is trying to do: is it finding the maximum value of the variance matrix?

var.adj <- sapply(unique(sample.id), function(sid) {
  my.max(sapply(unique(ct.id), function(id) {
    y = countmat[, ct.id %in% id & sample.id %in% sid, drop = FALSE]
    apply(y, 1, var, na.rm = TRUE)
  }), na.rm = TRUE)
})

meichendong commented 3 years ago

Hi @peachone, thanks for digging into the problem. This step calculates the subject- and cell-type-specific expression variance and extracts the maximum value. From your description, I guess that for some subject/cell type combinations there are fewer than 2 cells, which is too few for the function to calculate the variance. If so, an easier option would be to skip the MVW calculation by setting SCDC_prop(..., weight.basis = F, ...) and see if the error still occurs. Please feel free to contact me via email: meichen@live.unc.edu
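To make the step concrete, here is a small sketch that runs the same nested loop on made-up counts. It is an illustration under assumptions, not the package code: `countmat`, `ct.id` and `sample.id` are toy values, and the helper `row.max` takes a plain row-wise maximum without the median scaling discussed earlier in this thread.

```r
# Toy counts: 2 genes x 6 cells, with cell type and subject labels per cell.
countmat <- matrix(c(1, 3, 5, 2, 4, 7,
                     2, 2, 2, 9, 1, 0),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("g1", "g2"), paste0("cell", 1:6)))
ct.id     <- c("A",  "A",  "A",  "B",  "A",  "B")
sample.id <- c("s1", "s1", "s1", "s1", "s2", "s2")

# Hypothetical helper: row-wise maximum, ignoring NA.
row.max <- function(x) apply(x, 1, max, na.rm = TRUE)

# For each subject: per-gene variance within each cell type, then the
# maximum over cell types. Warnings from all-NA rows are suppressed here.
var.max <- suppressWarnings(
  sapply(unique(sample.id), function(sid) {
    row.max(sapply(unique(ct.id), function(id) {
      y <- countmat[, ct.id %in% id & sample.id %in% sid, drop = FALSE]
      apply(y, 1, var, na.rm = TRUE)
    }))
  })
)
print(var.max)
```

Subject `s1` has three cells of type A, so its maxima are finite. Subject `s2` has one cell per cell type, so every variance is NA and the maximum falls back to -Inf, which is the kind of value that later breaks the weighting step.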

peachone commented 3 years ago

I solved my problem by setting weight.basis = F. Thank you very much!