satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

UMAP error upon specifying min.features in SCTransform pipeline #5

Closed: aemoor closed this issue 5 years ago

aemoor commented 5 years ago

Hi Christoph,

Thanks for this nice pipeline. I noticed a UMAP error in your SCTransform pipeline that only appears if I specify a minimum number of features when creating the Seurat object (any min.features threshold above 1):

pbmc <- CreateSeuratObject(counts = join, min.features = 2)
pbmc <- SCTransform(object = pbmc, verbose = FALSE)
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)

RunUMAP produces this error:

 Error in py_call_impl(callable, dots$args, dots$keywords) : 
  TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Unknown attribute 'shape' of type none

File "../../anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py", line 88:
def smooth_knn_dist(distances, k, n_iter=64, local_connectivity=1.0, bandwidth=1.0):
    <source elided>
    target = np.log2(k) * bandwidth
    rho = np.zeros(distances.shape[0])
    ^

[1] During: typing of get attribute at /Users/andreasmoor/anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py (88)

File "../../anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py", line 88:
def smooth_knn_dist(distances, k, n_iter=64, local_connectivity=1.0, bandwidth=1.0):
    <source elided>
    target = np.log2(k) * bandwidth
    rho = np.zeros(distances.shape[0])
    ^

This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.

To see Python/NumPy features supported by the latest release of Num 
11.
stop(structure(list(message = "TypingError: Failed in nopython mode pipeline (step: nopython frontend)\nUnknown attribute 'shape' of type none\n\nFile \"../../anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py\", line 88:\ndef smooth_knn_dist(distances, k, n_iter=64, local_connectivity=1.0, bandwidth=1.0):\n    <source elided>\n    target = np.log2(k) * bandwidth\n    rho = np.zeros(distances.shape[0])\n    ^\n\n[1] During: typing of get attribute at /Users/andreasmoor/anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py (88)\n\nFile \"../../anaconda2/envs/Renv/lib/python2.7/site-packages/umap/umap_.py\", line 88:\ndef smooth_knn_dist(distances, k, n_iter=64, local_connectivity=1.0, bandwidth=1.0):\n    <source elided>\n    target = np.log2(k) * bandwidth\n    rho = np.zeros(distances.shape[0])\n    ^\n\nThis is not usually a problem with Numba itself but instead often caused by\nthe use of unsupported features or an issue in resolving types.\n\nTo see Python/NumPy features supported by the latest release of Numba visit:\nhttp://numba.pydata.org/numba-doc/dev/reference/pysupported.html\nand\nhttp://numba.pydata.org/numba-doc/dev/reference/numpysupported.html\n\nFor more information about typing errors and how to debug them visit:\nhttp://numba.pydata.org/numba-doc/latest/user/troubleshoot.html#my-code-doesn-t-compile\n\nIf you think your code should work with Numba, please report the error message\nand traceback, along with a minimal reproducer at:\nhttps://github.com/numba/numba/issues/new\n", 
    call = py_call_impl(callable, dots$args, dots$keywords), 
    cppstack = structure(list(file = "", line = -1L, stack = c("1   reticulate.so                       0x000000017b133eab _ZN4Rcpp9exceptionC2EPKcb + 219", 
    "2   reticulate.so                       0x000000017b13a975 _ZN4Rcpp4stopERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE + 53",  ... 
10.
reraise at <string>#2
9.
error_rewrite at dispatcher.py#317
8.
_compile_for_args at dispatcher.py#350
7.
fit at umap_.py#1398
6.
fit_transform at umap_.py#1566
5.
umap$fit_transform(as.matrix(x = object)) 
4.
RunUMAP.default(object = data.use, assay = assay, n.neighbors = n.neighbors, 
    n.components = n.components, metric = metric, n.epochs = n.epochs, 
    learning.rate = learning.rate, min.dist = min.dist, spread = spread, 
    set.op.mix.ratio = set.op.mix.ratio, local.connectivity = local.connectivity,  ... 
3.
RunUMAP(object = data.use, assay = assay, n.neighbors = n.neighbors, 
    n.components = n.components, metric = metric, n.epochs = n.epochs, 
    learning.rate = learning.rate, min.dist = min.dist, spread = spread, 
    set.op.mix.ratio = set.op.mix.ratio, local.connectivity = local.connectivity,  ... 
2.
RunUMAP.Seurat(object = pbmc, dims = 1:20, verbose = FALSE) 
1.
RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE) 

If I run exactly the same dataset without setting min.features, RunUMAP works without error:

pbmc <- CreateSeuratObject(counts = join)
pbmc <- SCTransform(object = pbmc, verbose = FALSE)
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)

Any idea what this could indicate? Do you need my raw dataset? Thanks,

Andreas

ChristophH commented 5 years ago

Hi Andreas, I have not encountered this issue before, and couldn't reproduce it with two separate datasets. Does RunTSNE(s, dims=1:20) work? If you can share your input data, I'll have a look.

aemoor commented 5 years ago

Thanks for your response.

pbmc <- RunTSNE(object = pbmc, dims = 1:20, verbose = FALSE)

works without a problem. I attached my Seurat object, which is the output of:

pbmc <- CreateSeuratObject(counts = join)

This code produces the error (reproduced on two different machines):

pbmc <- subset(x = pbmc, subset = nFeature_RNA > 200)
pbmc <- SCTransform(object = pbmc, verbose = FALSE)
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)

If I omit the filtering step, it works without error:

pbmc <- SCTransform(object = pbmc, verbose = FALSE)
pbmc <- RunPCA(object = pbmc, verbose = FALSE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = FALSE)

Thanks for looking into it. Andreas

ChristophH commented 5 years ago

I cannot reproduce the error. The following code works fine for me:

load('~/Downloads/pbmc_seuratobject')
pbmc <- subset(x = pbmc, subset = nFeature_RNA > 200)
pbmc <- SCTransform(object = pbmc, verbose = TRUE)
pbmc <- RunPCA(object = pbmc, verbose = TRUE)
pbmc <- RunUMAP(object = pbmc, dims = 1:20, verbose = TRUE)
DimPlot(pbmc)

Are you using the latest version of Seurat v3 and sctransform?

My session info:

R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.10.0         Seurat_3.0.0.9000     sctransform_0.0.0.900 gridExtra_2.3        
[5] reshape2_1.4.3        ggplot2_3.0.0         Matrix_1.2-14        

loaded via a namespace (and not attached):
 [1] httr_1.3.1         tidyr_0.8.0        viridisLite_0.3.0  jsonlite_1.5       splines_3.5.0     
 [6] R.utils_2.6.0      gtools_3.8.1       assertthat_0.2.0   yaml_2.1.19        ggrepel_0.8.0     
[11] globals_0.12.2     pillar_1.2.2       lattice_0.20-35    reticulate_1.11.1  glue_1.3.0        
[16] digest_0.6.16      RColorBrewer_1.1-2 SDMTools_1.1-221   colorspace_1.3-2   cowplot_0.9.2     
[21] htmltools_0.3.6    R.oo_1.22.0        plyr_1.8.4         pkgconfig_2.0.2    tsne_0.1-3        
[26] listenv_0.7.0      purrr_0.2.4        scales_0.5.0       RANN_2.6           gdata_2.18.0      
[31] Rtsne_0.13         tibble_1.4.2       withr_2.1.2        ROCR_1.0-7         pbapply_1.3-4     
[36] lazyeval_0.2.1     survival_2.41-3    magrittr_1.5       R.methodsS3_1.7.1  nlme_3.1-137      
[41] MASS_7.3-49        gplots_3.0.1       ica_1.0-1          tools_3.5.0        fitdistrplus_1.0-9
[46] data.table_1.11.4  stringr_1.3.1      plotly_4.7.1       munsell_0.4.3      cluster_2.0.7-1   
[51] irlba_2.3.2        bindrcpp_0.2.2     compiler_3.5.0     rsvd_0.9           caTools_1.17.1    
[56] rlang_0.2.2        grid_3.5.0         ggridges_0.5.0     htmlwidgets_1.2    igraph_1.2.2      
[61] labeling_0.3       bitops_1.0-6       gtable_0.2.0       codetools_0.2-15   R6_2.2.2          
[66] zoo_1.8-1          dplyr_0.7.6        future.apply_1.0.1 bindr_0.1.1        KernSmooth_2.23-15
[71] metap_0.9          ape_5.1            stringi_1.2.4      Rcpp_0.12.18       png_0.1-7         
[76] tidyselect_0.2.4   lmtest_0.9-36     

aemoor commented 5 years ago

Thanks for testing it. Strange; could it be an R version issue? All the machines that reproduce this UMAP error are running R 3.5.2.

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.3

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sctransform_0.0.0.900 Seurat_3.0.0.9000    

loaded via a namespace (and not attached):
  [1] nlme_3.1-137        tsne_0.1-3          fs_1.2.6            bitops_1.0-6       
  [5] usethis_1.4.0       devtools_2.0.1      RColorBrewer_1.1-2  httr_1.4.0         
  [9] rprojroot_1.3-2     backports_1.1.3     tools_3.5.2         R6_2.4.0           
 [13] irlba_2.3.3         KernSmooth_2.23-15  lazyeval_0.2.2      colorspace_1.4-0   
 [17] withr_2.1.2         npsurv_0.4-0        prettyunits_1.0.2   processx_3.3.0     
 [21] tidyselect_0.2.5    gridExtra_2.3       compiler_3.5.2      cli_1.0.1          
 [25] desc_1.2.0          plotly_4.8.0        caTools_1.17.1.2    scales_1.0.0       
 [29] lmtest_0.9-36       ggridges_0.5.1      callr_3.2.0         pbapply_1.4-0      
 [33] stringr_1.4.0       digest_0.6.18       R.utils_2.8.0       pkgconfig_2.0.2    
 [37] htmltools_0.3.6     sessioninfo_1.1.1   bibtex_0.4.2        htmlwidgets_1.3    
 [41] rlang_0.3.1         rstudioapi_0.9.0    zoo_1.8-4           jsonlite_1.6       
 [45] ica_1.0-2           gtools_3.8.1        dplyr_0.8.0.1       R.oo_1.22.0        
 [49] magrittr_1.5        Matrix_1.2-15       Rcpp_1.0.1          munsell_0.5.0      
 [53] ape_5.3             reticulate_1.11.1   R.methodsS3_1.7.1   stringi_1.4.3      
 [57] gbRd_0.4-11         MASS_7.3-51.1       pkgbuild_1.0.2      gplots_3.0.1.1     
 [61] Rtsne_0.15          plyr_1.8.4          grid_3.5.2          parallel_3.5.2     
 [65] gdata_2.18.0        listenv_0.7.0       ggrepel_0.8.0       crayon_1.3.4       
 [69] lattice_0.20-38     cowplot_0.9.4       splines_3.5.2       SDMTools_1.1-221   
 [73] ps_1.3.0            pillar_1.3.1        igraph_1.2.4        pkgload_1.0.2      
 [77] future.apply_1.2.0  reshape2_1.4.3      codetools_0.2-15    glue_1.3.1         
 [81] packrat_0.5.0       lsei_1.2-0          metap_1.1           remotes_2.0.2      
 [85] data.table_1.12.0   png_0.1-7           Rdpack_0.10-1       gtable_0.2.0       
 [89] RANN_2.6.1          purrr_0.3.2         tidyr_0.8.3         future_1.12.0      
 [93] assertthat_0.2.0    ggplot2_3.1.0       rsvd_1.0.0          survival_2.43-3    
 [97] viridisLite_0.3.0   tibble_2.1.1        memoise_1.1.0       cluster_2.0.7-1    
[101] globals_0.12.4      fitdistrplus_1.0-14 ROCR_1.0-7   

ChristophH commented 5 years ago

I don't know. It could also be an issue on the Python side, since Seurat uses reticulate to call UMAP.

lima1 commented 5 years ago

See: https://github.com/satijalab/seurat/issues/1240

The latest numba version is apparently not supported yet.
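Since the failure happens inside umap-learn's Numba-compiled code, which Seurat reaches through reticulate, a reasonable first check is which versions of the `umap` and `numba` Python modules reticulate actually sees. The following is a minimal sketch, assuming only that the reticulate package is installed; `umap_backend_versions()` is a hypothetical helper, not part of Seurat or reticulate. It does not determine which versions are compatible; it only reports what is installed, so the result can be compared against the linked Seurat issue. Separately, reticulate's `py_config()` shows which Python interpreter was bound (for example the `anaconda2/envs/Renv` environment visible in the traceback above).

```r
library(reticulate)

# Hypothetical helper (not a Seurat or reticulate function): report the
# versions of the Python modules behind Seurat's umap-learn backend, as seen
# by reticulate. Returns a named character vector; NA means the module could
# not be imported or does not expose a __version__ attribute.
umap_backend_versions <- function(modules = c("umap", "numba")) {
  vapply(modules, function(mod) {
    # py_module_available() returns FALSE instead of erroring if the module
    # (or a usable Python) cannot be found
    if (!py_module_available(mod)) {
      return(NA_character_)
    }
    m <- import(mod)
    # Older module releases may not define __version__, so check first
    if (py_has_attr(m, "__version__")) {
      as.character(m$`__version__`)
    } else {
      NA_character_
    }
  }, FUN.VALUE = character(1), USE.NAMES = TRUE)
}

print(umap_backend_versions())
```

Running this on a machine that shows the error and on one that does not would indicate whether the difference lies in the Python packages rather than in R 3.5.0 versus 3.5.2.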