livnatje / DIALOGUE

DIALOGUE is a dimensionality reduction method that uses cross-cell-type associations to identify multicellular programs (MCPs) and map the cell transcriptome as a function of its environment.

Issue with t() for sparse tpm matrices #44

Open SabrinaRichter opened 8 months ago

SabrinaRichter commented 8 months ago

Dear DIALOGUE team,

I want to run make.cell.type with tpm as a sparse matrix (dgRMatrix or dgCMatrix). Calling make.cell.type as loaded from the package gives the following error (converting tpm to a dense matrix with as.matrix(tpm) works fine):

Error in t.default(tpm): argument is not a matrix
Traceback:

1. make.cell.type(name = cell_type, tpm = tpm, samples = samples, 
 .     X = pca, metadata = adata_ct$obs[c("Pool_ID")], cellQ = adata_ct$obs$QC_total_UMI, 
 .     )
2. cell.type(name = gsub("_", "", name), cells = colnames(tpm), 
 .     genes = rownames(tpm), cellQ = cellQ, tpm = tpm, tpmAv = tpmAv, 
 .     qcAv = aggregate(x = cellQ, by = list(samples), FUN = mean), 
 .     X = X, samples = samples, metadata = cbind.data.frame(cellQ = cellQ, 
 .         metadata), extra.scores = list())
3. new(structure("cell.type", package = "DIALOGUE"), ...)
4. initialize(value, ...)
5. initialize(value, ...)
6. t(average.mat.rows(t(tpm), samples))
7. average.mat.rows(t(tpm), samples)
8. laply(ids.u, function(x) {
 .     return(f(m[is.element(ids, x), ]))
 . })
9. llply(.data = .data, .fun = .fun, ..., .progress = .progress, 
 .     .inform = .inform, .parallel = .parallel, .paropts = .paropts)
10. structure(lapply(pieces, .fun, ...), dim = dim(pieces))
11. lapply(pieces, .fun, ...)
12. FUN(X[[i]], ...)
13. f(m[is.element(ids, x), ])
14. is.data.frame(x)
15. t(tpm)
16. t.default(tpm)

But if I copy the code for average.mat.rows, get.abundant, cell.type and make.cell.type into my own session, it runs fine!
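Frames 6 and 7 of the traceback show the failure inside average.mat.rows(t(tpm), samples). As a possible workaround, per-sample averages of a sparse genes x cells matrix can be computed with a sparse indicator matrix, with no transpose involved. The sketch below is only an illustration: average_by_sample is a hypothetical helper, not DIALOGUE's average.mat.rows, and the genes x samples orientation of the result is an assumption.

```r
library(Matrix)

# Hypothetical helper (NOT part of DIALOGUE): average the columns (cells) of a
# sparse genes x cells matrix within each sample, without calling t().
average_by_sample <- function(tpm, samples) {
  samples <- factor(samples)  # also drops unused levels
  stopifnot(length(samples) == ncol(tpm))
  # cells x samples indicator matrix: entry (i, j) is 1 if cell i is in sample j
  ind <- sparseMatrix(
    i = seq_along(samples),
    j = as.integer(samples),
    x = 1,
    dims = c(length(samples), nlevels(samples)),
    dimnames = list(NULL, levels(samples))
  )
  sums <- as.matrix(tpm %*% ind)                   # genes x samples sums
  sweep(sums, 2, as.numeric(table(samples)), "/")  # divide by cells per sample
}

# Toy data: 3 genes x 4 cells, two cells per sample
tpm_toy <- Matrix(c(1, 0, 2,  0, 0, 4,  3, 0, 0,  5, 6, 0), nrow = 3, sparse = TRUE)
dimnames(tpm_toy) <- list(paste0("g", 1:3), paste0("cell", 1:4))
samples_toy <- c("s1", "s1", "s2", "s2")
av <- average_by_sample(tpm_toy, samples_toy)
```

Only the small genes x samples result is converted to a dense matrix; the full expression matrix stays sparse throughout.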

Absolutely not an R expert, but could it be that something within DIALOGUE changes the transpose function to one that can't handle sparse matrices for some reason?

In case it helps, this is the code I am running:

cell_type <- "B"

adata_ct <- read_h5ad(file)

tpm <- t(adata_ct$X)  # dgRMatrix

pca <- adata_ct$obsm[['X_pca']]
rownames(pca) <- colnames(tpm)

samples <- adata_ct$obs$scRNASeq_sample_ID

make.cell.type(
    name = cell_type,
    tpm = tpm,
    samples = samples,
    X = pca,
    metadata = adata_ct$obs[c("Pool_ID")],
    cellQ = adata_ct$obs$QC_total_UMI
)

SabrinaRichter commented 8 months ago

OK, I think I understand it now. Without the Matrix package, t() can only handle dense matrices, and that appears to be the version used here: https://github.com/livnatje/DIALOGUE/blob/9c146ccf28d7706aaa60d00947a9126b4e75fd69/R/DIALOGUE.cell.type.R#L39. If I load DIALOGUE, Matrix gets loaded as well, so when I then define make.cell.type again in my own session, it picks up the t() version that can handle my sparse matrix. Any chance this could be incorporated into the package, @livnatje?
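To illustrate the suspected mechanism, here is a minimal sketch that mimics a function whose enclosing environment does not see Matrix's t() generic. Using baseenv() as a stand-in for a package namespace is an assumption made for the demo, and transpose_base and transpose_sparse are hypothetical names. Whether the first call actually fails may depend on the installed Matrix version, so its result is captured rather than assumed.

```r
library(Matrix)

m <- Matrix(c(1, 0, 0, 2, 0, 3), nrow = 2, sparse = TRUE)  # 2 x 3 sparse matrix

# Stand-in for a function defined in a package namespace that does not import
# Matrix: with baseenv() as its enclosure, t() resolves to base::t.
transpose_base <- function(x) t(x)
environment(transpose_base) <- baseenv()

# May fail with "argument is not a matrix"; the outcome is captured, not assumed.
res_base <- tryCatch(transpose_base(m), error = function(e) conditionMessage(e))

# Calling the Matrix generic explicitly does not depend on what is attached.
transpose_sparse <- function(x) Matrix::t(x)
environment(transpose_sparse) <- baseenv()
res_sparse <- transpose_sparse(m)
```

Inside a package, the usual ways to get the same effect are to call Matrix::t() explicitly or to add importFrom(Matrix, t) to the NAMESPACE file.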

SabrinaRichter commented 8 months ago

I think allowing a sparse tpm would also require changing https://github.com/livnatje/DIALOGUE/blob/9c146ccf28d7706aaa60d00947a9126b4e75fd69/R/DIALOGUE.main.R#L233 to

cvals <- corSparse(t(r@tpm),r@scores)
rownames(cvals) <- rownames(r@tpm)
colnames(cvals) <- colnames(r@scores)
R$cca.gene.cor1[[x]]<-cvals

where corSparse comes from the qlcMatrix library (installed from GitHub since it is no longer on CRAN :/ but maybe there is another sparse cor that I didn't see).
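If depending on qlcMatrix is not desirable, one possible alternative is a small helper built only on the Matrix package. The sketch below is an illustration under assumptions: sparse_cor is a hypothetical function (it is in neither DIALOGUE nor qlcMatrix), and it computes Pearson correlations between the columns of a sparse matrix and the columns of a dense matrix from cross-products and column means, so the sparse input is never densified. Columns with zero variance would yield NaN.

```r
library(Matrix)

# Hypothetical helper (NOT part of DIALOGUE or qlcMatrix): Pearson correlation
# between columns of a sparse matrix x (n x p) and a dense matrix y (n x q).
sparse_cor <- function(x, y) {
  y <- as.matrix(y)
  n <- nrow(x)
  stopifnot(nrow(y) == n)
  mx <- Matrix::colMeans(x)
  my <- colMeans(y)
  # cov(x, y) = (x'y - n * mean_x mean_y') / (n - 1); only the p x q result is dense
  cov_xy <- (as.matrix(Matrix::crossprod(x, y)) - n * outer(mx, my)) / (n - 1)
  sd_x <- sqrt((Matrix::colSums(x^2) - n * mx^2) / (n - 1))
  sd_y <- apply(y, 2, sd)
  out <- cov_xy / outer(sd_x, sd_y)
  dimnames(out) <- list(colnames(x), colnames(y))
  out
}

# Toy data: 5 genes x 20 cells (mostly zeros) and 2 score columns
dense <- matrix(((1:100) %% 7 == 0) * (1:100), nrow = 5,
                dimnames = list(paste0("gene", 1:5), paste0("cell", 1:20)))
tpm_toy <- Matrix(dense, sparse = TRUE)
scores_toy <- matrix(sin(1:40), nrow = 20,
                     dimnames = list(NULL, c("MCP1", "MCP2")))

# Same shape as the snippet above: genes in rows, score columns in columns
cvals <- sparse_cor(Matrix::t(tpm_toy), scores_toy)
```

On this toy input the result should agree with cor() applied to the dense transpose, which makes for an easy sanity check before trying it on real data.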