R package for modeling single cell UMI expression data using regularized negative binomial regression
"missing value where TRUE/FALSE needed" error occurring with certain seeds (for nb_fast and poisson both with #71

diegoalexespi commented 3 years ago

Hi. I am receiving the following error on my counts data when attempting to use the vst function.

> z <- sctransform::vst(my_counts, verbosity = 2, method = "poisson", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |==============================================================                                                             |  50%Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The error only appears when using either method = "nb_fast" or method = "poisson" with theta_estimation_fun = "". The error disappears when using method = "glmGamPoi" or method = "qpoisson". The error also disappears for method = "nb_fast" and method = "poisson" if I use theta_estimation_fun = "". Moreover, the error appearance for "nb_fast" and "poisson" depends on the seed:

> set.seed(90835)
> z <- sctransform::vst(my_counts, verbosity = 2, method = "poisson", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |                                                                                                                           |   0%Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> set.seed(1234)
> z <- sctransform::vst(my_counts, verbosity = 2, method = "poisson", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |==============================================================                                                             |  50%Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> set.seed(123)
> z <- sctransform::vst(my_counts, verbosity = 2, method = "poisson", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |===========================================================================================================================| 100%
Found 83 outliers - those will be ignored in fitting/regularization step

Second step: Get residuals using fitted parameters for 15887 genes
  |===========================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 21.56692 secs
There were 50 or more warnings (use warnings() to see the first 50)
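
One way to check the seed dependence more systematically is to run the same call under a list of seeds and record which ones fail. The sketch below is a minimal illustration, not part of sctransform: `seed_sweep` and `toy_fit` are hypothetical names, and `toy_fit` is a stand-in that would be replaced by the real call (for example a wrapper around `sctransform::vst(my_counts, ...)`).

```r
# Run `fit_fun` once per seed and record whether it raised an error.
# `fit_fun` is a placeholder for the call under investigation.
seed_sweep <- function(seeds, fit_fun) {
  results <- lapply(seeds, function(s) {
    set.seed(s)
    msg <- tryCatch({
      fit_fun()
      NA_character_                      # no error for this seed
    }, error = function(e) conditionMessage(e))
    data.frame(seed = s, ok = is.na(msg), message = msg,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, results)
}

# Toy stand-in whose outcome depends on the random draw, and therefore on the seed.
toy_fit <- function() {
  x <- if (runif(1) < 0.5) NaN else 1
  if (x > 0) "converged"                 # errors when x is NaN
}

sweep <- seed_sweep(c(90835, 1234, 123), toy_fit)
print(sweep)
```

Only errors are caught here; warnings pass through unchanged, so they still show up in `warnings()`.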

It also seems that, for a given seed, nb_fast and poisson fail at the same point in the progress bar:

> set.seed(90835)
> z <- sctransform::vst(my_counts, verbosity = 2, method = "nb_fast", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |                                                                                                                           |   0%Error in while ((it <- it + 1) < limit && abs(del) > eps) { : 
  missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
> set.seed(90835)
> z <- sctransform::vst(my_counts, verbosity = 2, method = "poisson", theta_estimation_fun = "")
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 15887 by 2691
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 2691 cells
  |                                                                                                                           |   0%Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
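
For readers unfamiliar with this message: R raises "missing value where TRUE/FALSE needed" whenever an `if` or `while` condition evaluates to `NA`. The nb_fast output above shows a condition of the form `abs(del) > eps`; the sketch below assumes `del` has become `NaN` (how that happens inside the fitting code is not shown in this thread) and reproduces the same message in isolation. The variable names mirror the error output but are otherwise illustrative.

```r
eps <- 1e-8
del <- NaN                       # assumed: a non-finite update step

cond <- abs(del) > eps           # comparing with NaN yields NA, not FALSE
print(cond)

# Using NA as a loop condition triggers the error reported above.
msg <- tryCatch({
  while (cond) break
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# A defensive variant treats a non-finite step as "stop iterating".
safe_cond <- is.finite(del) && abs(del) > eps
print(safe_cond)
```

Because `&&` short-circuits, the guarded condition is `FALSE` rather than `NA`, so a loop using it would exit instead of failing.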

Any help is appreciated! Happy to provide more information to help as best as I can. I tried to reproduce the error with pbmc_small and pbmc3k, but I have not been able to. Here is my sessionInfo():

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sctransform_0.3.1       pbmc3k.SeuratData_3.1.4 Nebulosa_0.99.94        patchwork_1.0.1         readxl_1.3.1           
 [6] readr_1.4.0             kableExtra_1.2.1        cowplot_1.1.0           Seurat_3.9.9.9002       ggplot2_3.3.2          
[11] tibble_3.0.4            tidyr_1.1.2             dplyr_1.0.2             knitr_1.30              magrittr_1.5           

ChristophH commented 3 years ago

Hi, thank you for this detailed error report. Before going into the problem, could you try the current version of sctransform in the develop branch?


diegoalexespi commented 3 years ago

Installing from the current version in the develop branch appears to have worked, thank you so much! There are no more errors of this kind apparent. This seems related to issue #65 in case that helps others.

fuxins commented 3 years ago

Hi, I'm getting the same error. I’ve installed the current version of sctransform from the develop branch, but it does not seem to work for me. Here are my commands and error messages:

   project = projectid,
   assay = "RNA",
   names.field = 1,
   names.delim = "_", = NULL,
   min.cells = 5,
   min.features = 500
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
>sob_sct <- SCTransform(sob,verbose=TRUE)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 17032 by 428
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 428 cells
  |                                                                                                                                          |   0%Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In trigamma(th + y) : NaNs produced
2: In trigamma(th) : NaNs produced


Any advice would be appreciated!

ChristophH commented 3 years ago

The maximum number of features (i.e. genes) detected in the matrix you provided is 198. When I create the Seurat object with min.features = 500 it removes all cells from the downstream analysis and fails. When I set the limit to 5 all cells pass and sctransform works. The full matrix from your example looks bigger, but I don't know exactly what is causing the error. If you have updated sctransform and Seurat, make sure to restart your R session.
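
A quick way to check this before creating the Seurat object is to count the detected features per cell and compare against the intended threshold. The sketch below uses a small simulated count matrix, since the original data is not available here; `counts` and `min_features` are illustrative names, and the `>=` comparison is an assumption about how the `min.features` filter is applied.

```r
set.seed(1)
# Simulated genes-by-cells count matrix (300 genes, 20 cells), mostly zeros.
counts <- matrix(rpois(300 * 20, lambda = 0.3), nrow = 300,
                 dimnames = list(paste0("gene", 1:300), paste0("cell", 1:20)))

# Number of genes with at least one count in each cell.
features_per_cell <- colSums(counts > 0)
print(summary(features_per_cell))

min_features <- 500
cells_kept <- sum(features_per_cell >= min_features)
if (cells_kept == 0) {
  message("No cell has at least ", min_features,
          " detected features; the filter would drop every cell.")
}
```

For a sparse `dgCMatrix`, `Matrix::colSums(counts > 0)` should give the same per-cell counts.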

fuxins commented 3 years ago

Thanks a lot! I restarted the R session and now it works.
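
For anyone hitting the same situation, it can help to compare the version installed on disk with the version loaded in the running session. The helper below is a hedged sketch: `version_check` is a hypothetical name, and it assumes `packageVersion()` reports the installed copy while `getNamespaceVersion()` reports the loaded one, so the two may differ after an update without a restart.

```r
# Compare the installed version of a package with the version
# currently loaded in this session (NA if it is not loaded).
version_check <- function(pkg) {
  installed <- as.character(utils::packageVersion(pkg))
  loaded <- if (isNamespaceLoaded(pkg)) {
    unname(getNamespaceVersion(pkg))
  } else {
    NA_character_
  }
  c(installed = installed, loaded = loaded)
}

# Demonstrated on a base package; replace with "sctransform" or "Seurat".
print(version_check("utils"))
```

If the two values differ, restarting R and loading the package again should bring them back in line.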