welch-lab / liger

R package for integrating and analyzing multiple single-cell datasets
GNU General Public License v3.0

problem with read10X and createLiger #278

Closed: sdwien closed this issue 10 months ago

sdwien commented 1 year ago

Dear LIGER Team,

I am trying out LIGER for the first time, and to do so I want to load two 10x single-cell samples with the read10X and createLiger functions. I tried to follow this tutorial: http://htmlpreview.github.io/?https://github.com/welch-lab/liger/blob/master/vignettes/Integrating_multi_scRNA_data.html . However, it loads the data from .rds files and does not explain how to build the object from the matrix_list returned by read10X. This is what I tried:

matrix_list <- read10X(sample.dirs = c("WTsample/outs/raw_feature_bc_matrix", "KOsample/outs/raw_feature_bc_matrix"), sample.names = c("PGWT", "PGKO"), merge = FALSE)

This step seems to work; it prints:

[1] "Processing sample PGWT"
[1] "Processing sample PGKO"

summary(matrix_list)

     Length Class  Mode
PGWT 1      -none- list
PGKO 1      -none- list

Then, I am trying to create the LIGER object:

liger <- createLiger(list(WT = matrix_list$PGWT, KO = matrix_list$PGKO))

Here, I am getting an error:

Error in .m2sparse.checking(from, ".", "C") : 
  matrix of invalid type "list" to .m2sparse.checking()

I am surely missing something obvious, but I cannot figure out what. Thanks a lot in advance for your help,

best, Sophia

sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rliger_1.0.0    patchwork_1.1.2 Matrix_1.5-1    cowplot_1.1.1  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9         ica_1.0-3          RColorBrewer_1.1-3 plyr_1.8.7        
 [5] pillar_1.8.1       compiler_4.2.1     hdf5r_1.3.7        tools_4.2.1       
 [9] iterators_1.0.14   mclust_6.0.0       bit_4.0.4          Rtsne_0.16        
[13] lifecycle_1.0.3    tibble_3.1.8       gtable_0.3.1       lattice_0.20-45   
[17] pkgconfig_2.0.3    rlang_1.0.6        foreach_1.5.2      DBI_1.1.3         
[21] cli_3.4.1          ggrepel_0.9.1      parallel_4.2.1     dplyr_1.0.10      
[25] generics_0.1.3     vctrs_0.5.0        bit64_4.0.5        grid_4.2.1        
[29] tidyselect_1.2.0   glue_1.6.2         R6_2.5.1           fansi_1.0.3       
[33] irlba_2.3.5.1      ggplot2_3.3.6      magrittr_2.0.3     scales_1.2.1      
[37] codetools_0.2-18   riverplot_0.10     assertthat_0.2.1   colorspace_2.0-3  
[41] utf8_1.2.2         munsell_0.5.0      doParallel_1.0.17  FNN_1.1.3.1       

theAeon commented 1 year ago

It seems that the objects in matrix_list are themselves lists, not matrices. What format is your data in on the filesystem?
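One quick way to confirm this is to check the classes one level down. The sketch below builds a small mock object shaped like the nested list that read10X() returns with merge = FALSE (toy data made up for illustration, not the real matrices) and inspects each level with base R and the Matrix package.

```r
library(Matrix)

# Hypothetical toy stand-in for read10X(..., merge = FALSE) output:
# a list of samples, each holding a list of matrices keyed by data type.
toy <- Matrix::rsparsematrix(3, 2, density = 0.5)
matrix_list <- list(
  PGWT = list(`Gene Expression` = toy),
  PGKO = list(`Gene Expression` = toy)
)

# Outer level: each sample is a list, not a matrix
outer_classes <- vapply(matrix_list, function(x) class(x)[1], character(1))
print(outer_classes)

# Inner level: the data types stored for each sample
inner_names <- lapply(matrix_list, names)
print(inner_names)

# The sparse matrices sit one level further down
inner_classes <- vapply(matrix_list,
                        function(x) class(x[["Gene Expression"]])[1],
                        character(1))
print(inner_classes)
```

On the real object, the same three checks should show whether the matrices are nested inside per-sample lists.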

sdwien commented 1 year ago

Dear Andrew,

thanks a lot for looking into it.

I checked the matrix_list object; it is a list, as expected:

class(matrix_list)
[1] "list"

I also checked the two elements of the list, and as you said, these are also recognized as lists:

class(matrix_list$PGWT)
[1] "list"
class(matrix_list$PGKO)
[1] "list"

Is this not the expected behaviour of the read10X function? Its documentation says under Value: "List of merged matrices across data types (returns sparse matrix if only one data type detected), or nested list of matrices organized by sample if merge=F".
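Going by that description, the nested list can be flattened into one matrix per sample with lapply(). This is a minimal sketch on hypothetical toy data; the element name "Gene Expression" is taken from the printout later in this thread. The result, a list of gene-by-cell matrices named by dataset, appears to be the shape createLiger() takes as its first argument.

```r
library(Matrix)

# Hypothetical toy data shaped like read10X(..., merge = FALSE) output
make_toy <- function() {
  m <- Matrix::rsparsematrix(4, 3, density = 0.5)
  dimnames(m) <- list(paste0("gene", 1:4), c("AAAC", "AAAG", "AAAT"))
  m
}
matrix_list <- list(
  PGWT = list(`Gene Expression` = make_toy()),
  PGKO = list(`Gene Expression` = make_toy())
)

# Pull the "Gene Expression" matrix out of every sample; sample names are kept
expr_list <- lapply(matrix_list, function(sample) sample[["Gene Expression"]])

print(names(expr_list))
print(vapply(expr_list, function(m) class(m)[1], character(1)))
```

Note that in this toy example both samples reuse the same barcodes, so the pooled cell names are not unique.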

Did I specify the input directories (sample/outs/raw_feature_bc_matrix) correctly? As far as I know, this is a plain Single Cell 3' v3 experiment. I assume the file to be loaded is sample/outs/raw_feature_bc_matrix/matrix.mtx.gz, correct? That is also what I see inside the loaded elements:
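For Cell Ranger v3 output, a matrix directory normally holds three files: barcodes.tsv.gz, features.tsv.gz and matrix.mtx.gz. Cell Ranger also writes a filtered_feature_bc_matrix directory next to the raw one, containing only the barcodes it called as cells; the 1,305,055 columns printed below fit the raw, unfiltered barcode set. The helper check_10x_dir() is hypothetical, written only to sketch a quick sanity check of a directory before calling read10X().

```r
# Hypothetical helper: report which of the three Cell Ranger v3 matrix files
# are present in a directory.
check_10x_dir <- function(dir) {
  expected <- c("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")
  setNames(file.exists(file.path(dir, expected)), expected)
}

# Usage with the paths from the read10X() call in this thread:
# check_10x_dir("WTsample/outs/raw_feature_bc_matrix")
# check_10x_dir("KOsample/outs/raw_feature_bc_matrix")

# Self-contained demo on a temporary directory missing matrix.mtx.gz
demo_dir <- file.path(tempdir(), "raw_feature_bc_matrix")
unlink(demo_dir, recursive = TRUE)
dir.create(demo_dir, showWarnings = FALSE)
for (f in c("barcodes.tsv.gz", "features.tsv.gz")) {
  file.create(file.path(demo_dir, f))
}
result <- check_10x_dir(demo_dir)
print(result)
```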

matrix_list$PGWT
$`Gene Expression`
32285 x 1305055 sparse Matrix of class "dgCMatrix"
   [[ suppressing 32 column names ‘AAACCCAAGAAACCAT’, ‘AAACCCAAGAAACCCG’, ‘AAACCCAAGAAACTGT’ ... ]]

Xkr4          . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
Gm1992        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
Gm19938       . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
Gm37381       . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......
[...]

So maybe I have to pass this element of matrix_list to createLiger:

class(matrix_list$PGWT$`Gene Expression`)
[1] "dgCMatrix"
attr(,"package")
[1] "Matrix"

I ran the following:

PG_liger <- createLiger(list(PGWT = matrix_list$PGWT$`Gene Expression`, PGKO = matrix_list$PGKO$`Gene Expression`))

However, there is an error:

Error in createLiger(list(PGWT = matrix_list$PGWT$`Gene Expression`, PGKO = matrix_list$PGKO$`Gene Expression`)) : 
  At least one cell name is repeated across datasets; please make sure all cell names
         are unique.

I believe I have seen this error in another thread here on GitHub, so I'll read up on it there. I would still appreciate it if you could tell me whether this is the correct way to load the data, or if you have any further suggestions.

Many thanks, best, Sophia

theAeon commented 1 year ago

Yeah, the dgCMatrix objects look like exactly what you want. It appears to me that the data are loading correctly and failing on validation, which is a step in the right direction.
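The validation error likely arises because both runs draw barcodes from the same 10x whitelist, so identical barcode strings (such as AAACCCAAGAAACCAT) occur in both samples. A common workaround is to prefix every cell name with its dataset name before building the object. Below is a minimal sketch in base R and Matrix on hypothetical toy matrices; the "_" separator is an arbitrary choice.

```r
library(Matrix)

# Hypothetical toy matrices: both samples reuse the same barcode strings
barcodes <- c("AAACCCAAGAAACCAT", "AAACCCAAGAAACCCG", "AAACCCAAGAAACTGT")
make_toy <- function() {
  m <- Matrix::rsparsematrix(4, 3, density = 0.5)
  dimnames(m) <- list(paste0("gene", 1:4), barcodes)
  m
}
expr_list <- list(PGWT = make_toy(), PGKO = make_toy())

# Before: pooled cell names contain duplicates
all_cells <- unlist(lapply(expr_list, colnames), use.names = FALSE)
print(anyDuplicated(all_cells) > 0)

# Prefix each barcode with its dataset name, e.g. "PGWT_AAACCCAAGAAACCAT"
expr_list <- Map(function(m, sample) {
  colnames(m) <- paste(sample, colnames(m), sep = "_")
  m
}, expr_list, names(expr_list))

# After: every cell name is unique across datasets
all_cells <- unlist(lapply(expr_list, colnames), use.names = FALSE)
print(anyDuplicated(all_cells) > 0)

# The renamed list could then be passed on, e.g. createLiger(expr_list)
```

Map() keeps the dataset names from expr_list, so the renamed list can be used directly in place of the original.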