neurorestore / Libra

MIT License
150 stars 24 forks source link

Can Libra be used in other setting? #13

Open favilaco opened 2 years ago

favilaco commented 2 years ago

Hi, I was wondering if I could use Libra (specifically with setting: de_family = "pseudobulk", de_method = "limma", de_type = "voom") with a scRNA-seq dataset (Seurat) in which I have let's say:

I have been able to run the example with "hagai_toy" but, when using my data (which has the same metadata structure as the toy example), I keep on getting the following:

Error in intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE): invalid character indexing
Traceback:

1. run_de(scRNAseq_raw.Seurat, de_family = "pseudobulk", 
 .     de_method = "limma", de_type = "voom")
2. pseudobulk_de(input = input, meta = meta, replicate_col = replicate_col, 
 .     cell_type_col = cell_type_col, label_col = label_col, min_cells = min_cells, 
 .     min_reps = min_reps, min_features = min_features, de_family = "pseudobulk", 
 .     de_method = de_method, de_type = de_type)
3. to_pseudobulk(input = input, meta = meta, replicate_col = replicate_col, 
 .     cell_type_col = cell_type_col, label_col = label_col, min_cells = min_cells, 
 .     min_reps = min_reps, min_features = min_features, external = F)
4. keep %>% map(~{
 .     print(.)
 .     cell_type = .
 .     meta0 = meta %>% filter(cell_type == !!cell_type)
 .     expr0 = expr %>% magrittr::extract(, meta0$cell_barcode)
 .     if (n_distinct(meta0$label) < 2) 
 .         return(NA)
 .     replicate_counts = distinct(meta0, label, replicate) %>% 
 .         group_by(label) %>% summarise(replicates = n_distinct(replicate)) %>% 
 .         pull(replicates)
 .     if (any(replicate_counts < min_reps)) 
 .         return(NA)
 .     mm = model.matrix(~0 + replicate:label, data = meta0)
 .     mat_mm = expr0 %*% mm
 .     keep_genes = rowSums(mat_mm > 0) > min_features
 .     mat_mm = mat_mm[keep_genes, ] %>% as.data.frame()
 .     mat_mm %<>% as.data.frame()
 .     colnames(mat_mm) = gsub("replicate|label", "", colnames(mat_mm))
 .     keep_samples = colSums(mat_mm) > 0
 .     mat_mm %<>% magrittr::extract(, keep_samples)
 .     return(mat_mm)
 . }) %>% setNames(keep)
5. setNames(., keep)
6. map(., ~{
 .     print(.)
 .     cell_type = .
 .     meta0 = meta %>% filter(cell_type == !!cell_type)
 .     expr0 = expr %>% magrittr::extract(, meta0$cell_barcode)
 .     if (n_distinct(meta0$label) < 2) 
 .         return(NA)
 .     replicate_counts = distinct(meta0, label, replicate) %>% 
 .         group_by(label) %>% summarise(replicates = n_distinct(replicate)) %>% 
 .         pull(replicates)
 .     if (any(replicate_counts < min_reps)) 
 .         return(NA)
 .     mm = model.matrix(~0 + replicate:label, data = meta0)
 .     mat_mm = expr0 %*% mm
 .     keep_genes = rowSums(mat_mm > 0) > min_features
 .     mat_mm = mat_mm[keep_genes, ] %>% as.data.frame()
 .     mat_mm %<>% as.data.frame()
 .     colnames(mat_mm) = gsub("replicate|label", "", colnames(mat_mm))
 .     keep_samples = colSums(mat_mm) > 0
 .     mat_mm %<>% magrittr::extract(, keep_samples)
 .     return(mat_mm)
 . })
7. .f(.x[[i]], ...)
8. expr %>% magrittr::extract(, meta0$cell_barcode)
9. magrittr::extract(., , meta0$cell_barcode)
10. magrittr::extract(., , meta0$cell_barcode)
11. callGeneric(x, , j = j, drop = TRUE)
12. eval(call, parent.frame())
13. eval(call, parent.frame())
14. x[, j = j, drop = TRUE]
15. x[, j = j, drop = TRUE]
16. subCsp_cols(x, j, drop = drop)
17. intI(j, n = x@Dim[2], dn[[2]], give.dn = FALSE)
18. stop("invalid character indexing")

Thanks in advance!

Francisco

jordansquair commented 2 years ago

can you please send a sample of your data (e.g., just 100 genes or something)?

favilaco commented 2 years ago

Let me start sharing the structure of the input object (data is proprietary, would need to be modified before sharing). Is there something I'm unaware of? [it has exactly the same slots as "hagai_toy"]

Formal class 'Seurat' [package "SeuratObject"] with 13 slots
  ..@ assays      :List of 1
  .. ..$ RNA:Formal class 'Assay' [package "SeuratObject"] with 8 slots
  .. .. .. ..@ counts       :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. ..@ i       : int [1:229698040] 7 17 22 23 25 35 39 42 44 50 ...
  .. .. .. .. .. ..@ p       : int [1:85272] 0 7195 13353 19672 25784 32252 38494 44553 50614 56258 ...
  .. .. .. .. .. ..@ Dim     : int [1:2] 27387 85271
  .. .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. .. ..$ : chr [1:27387] "MIR1302-2HG" "AL627309.1" "AL627309.3" "AL627309.5" ...
  .. .. .. .. .. .. ..$ : chr [1:85271] "sample1_TATGTTGTCGCTAT-1" "sample1_GTGTTAGGTAACGA-1" "sample1_TCGGCCATGGCTAT-1" "sample1_TTCTTCCTCTCACC-1" ...
  .. .. .. .. .. ..@ x       : num [1:229698040] 4 2 5 14 1 5 3 2 2 8 ...
  .. .. .. .. .. ..@ factors : list()
  .. .. .. ..@ data         :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. .. .. .. .. ..@ i       : int [1:229698040] 17 27 32 43 55 65 79 82 94 95 ...
  .. .. .. .. .. ..@ p       : int [1:85272] 0 75 1333 1972 2584 3252 3844 4453 5014 5258 ...
  .. .. .. .. .. ..@ Dim     : int [1:2] 27387 85271
  .. .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. .. ..$ : chr [1:27387] "MIR1302-2HG" "AL627309.1" "AL627309.3" "AL627309.5" ...
  .. .. .. .. .. .. ..$ : chr [1:85271] "sample1_TATGTTGTCGCTAT-1" "sample1_GTGTTAGGTAACGA-1" "sample1_TCGGCCATGGCTAT-1" "sample1_TTCTTCCTCTCACC-1" ...
  .. .. .. .. .. ..@ x       : num [1:229698040] 4 3 9 19 2 6 7 9 2 8 ...
  .. .. .. .. .. ..@ factors : list()
  .. .. .. ..@ scale.data   : num[0 , 0 ] 
  .. .. .. ..@ key          : chr "rna_"
  .. .. .. ..@ assay.orig   : NULL
  .. .. .. ..@ var.features : chr(0) 
  .. .. .. ..@ meta.features:'data.frame':  27387 obs. of  0 variables
  .. .. .. ..@ misc         : list()
  ..@ meta.data   :'data.frame':    85271 obs. of  6 variables:
  .. ..$ cellID   : chr [1:85271] "sample1_TATGTTGTCGCTAT-1" "sample1_GTGTTAGGTAACGA-1" "sample1_TCGGCCATGGCTAT-1" "sample1_TTCTTCCTCTCACC-1" ...
  .. ..$ nCount_RNA  : num [1:85271] 13801 4167 2314 1322 27396 ...
  .. ..$ nFeature_RNA: int [1:85271] 3564 1920 560 352 4360 1876 2368 2203 5475 1890 ...
  .. ..$ cell_type: chr [1:85271] "CT1" "CT2" "CT3 "CT1" ...
  .. ..$ replicate: chr [1:85271] "sample1" "sample1" "sample1" "sample1" ...
  .. ..$ label    : chr [1:85271] "healthy" "healthy" "healthy" "healthy" ...
..@ active.assay: chr "RNA"
  ..@ active.ident: Factor w/ 1 level "SeuratProject": 1 1 1 1 1 1 1 1 1 1 ...
  .. ..- attr(*, "names")= chr [1:85271] "sample1_TATGTTGTCGCTAT-1" "sample1_GTGTTAGGTAACGA-1" "sample1_TCGGCCATGGCTAT-1" "sample1_TTCTTCCTCTCACC-1" ...
  ..@ graphs      : list()
  ..@ neighbors   : list()
  ..@ reductions  : list()
  ..@ images      : list()
  ..@ project.name: chr "TRIAL"
  ..@ misc        : list()
  ..@ version     :Classes 'package_version', 'numeric_version'  hidden list of 1
  .. ..$ : int [1:3] 4 0 4
  ..@ commands    : list()
  ..@ tools       : list()
favilaco commented 2 years ago

P.S. using de_family = 'mixedmodel', de_method = 'linear', de_type = 'LRT' seemed to :

Warning message in mclapply(X, FUN, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule, :
“all scheduled cores encountered errors in user code”
Warning message in p.adjust(p_val, method = "BH"):
“NAs introduced by coercion”
Warning message in p.adjust(p_val, method = "BH"):
“NAs introduced by coercion”

But the output looks strange (again, I think due to the lack of sample replicates?):

cell_type | gene | avg_logFC | p_val | p_val_adj | de_family | de_method | de_type
-------------------------------------------------------------------------------------------------------------
CT1 | A1BG | 1.605917e-01 | Error in data.frame(GENE = expr[x, ], label = [meta0𝑙𝑎𝑏𝑒𝑙,𝑟𝑒𝑝𝑙𝑖𝑐𝑎𝑡𝑒=𝑚𝑒𝑡𝑎0replicate) : arguments imply differing number of rows: 85271, 15369  | NA | pseudobulk | linear | LRT
jordansquair commented 2 years ago

It is difficult to say. If you are getting invalid character indexing are you sure you have no NAs anywhere, whether in the barcodes or in the meta data? If you want to just provide your seurat object with some random sample of 50-100 genes or something I can take a look.