perslab / CELLEX

CELLEX (CELL-type EXpression-specificity)
GNU General Public License v3.0

Load input data as sparse matrix #32

Closed DaianeH closed 2 years ago

DaianeH commented 2 years ago

I'm using the dataset available at https://singlecell.broadinstitute.org/single_cell/study/SCP1376/a-single-cell-atlas-of-human-and-mouse-white-adipose-tissue (from https://www.nature.com/articles/s41586-022-04518-2). I extracted the UMI counts using GetAssayData(object = adipocytes, slot = "counts"), which returned a large R S4 dgCMatrix. It has everything I need: genes as row names, cells as column names, and UMI counts as values. I'm trying to convert it to a data frame to use as input to CELLEX, but it is too big to convert to either a data frame or a dense matrix. Is there a way to make CELLEX accept a sparse matrix? Or are you aware of a way to convert this sparse matrix to a data frame while keeping the column and row names?

Thank you,

tstannius commented 2 years ago

Hi Daiane!

First of all, thanks for your patience.

Working with these large datasets is always a challenge. At the moment CELLEX only supports a pandas DataFrame as input, which in most cases is only feasible in an HPC environment due to the RAM requirements (hundreds of GB).
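To get a rough feel for where the RAM requirement comes from, here is a back-of-the-envelope sketch. It assumes a fully dense matrix of 8-byte values; the gene and cell counts are hypothetical and not taken from the dataset above, and the helper name dense_matrix_gib is made up for illustration.

```python
def dense_matrix_gib(n_genes: int, n_cells: int, bytes_per_value: int = 8) -> float:
    """Return the size, in GiB, of a dense n_genes x n_cells matrix.

    Assumes every entry is stored explicitly (no sparsity) and takes
    bytes_per_value bytes (8 for float64 or int64).
    """
    return n_genes * n_cells * bytes_per_value / 1024**3


# Hypothetical dimensions, for illustration only.
for n_cells in (100_000, 1_000_000):
    size = dense_matrix_gib(n_genes=30_000, n_cells=n_cells)
    print(f"30,000 genes x {n_cells:,} cells: about {size:.1f} GiB")
```

This counts only the raw values; intermediate copies made during processing would add to it.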

If you are working in an environment such as this, or can get access to one, I think we should be able to get it working :-)

In the demo_moca_100k tutorial (written by another contributor), 100k cells from a public dataset are first saved to a .loom file which is then read into a pandas dataframe. Could this approach perhaps work for you?
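As an alternative to going through a .loom file, a sparse matrix can be wrapped in a sparse-backed pandas DataFrame that keeps the gene and cell names. The sketch below uses a tiny toy matrix; the helper sparse_to_dataframe and the gene and cell labels are hypothetical. Whether CELLEX keeps such a DataFrame sparse internally is an assumption that has not been checked here, so it may still densify the data and need the same amount of RAM.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def sparse_to_dataframe(counts, genes, cells):
    """Wrap a SciPy sparse matrix (genes x cells) in a sparse-backed DataFrame.

    The values stay in pandas sparse columns, so zeros are not stored
    explicitly; row and column labels are attached as index and columns.
    """
    return pd.DataFrame.sparse.from_spmatrix(counts, index=genes, columns=cells)


# Toy stand-in for a UMI count matrix (2 genes x 3 cells), illustration only.
counts = sparse.csc_matrix(np.array([[0, 3, 0],
                                     [1, 0, 0]]))
df = sparse_to_dataframe(
    counts,
    genes=["GeneA", "GeneB"],
    cells=["cell1", "cell2", "cell3"],
)
print(df)
```

For a real dgCMatrix, one possible route (again an assumption, not something confirmed in this thread) is to export it from R with Matrix::writeMM, write the row and column names to separate text files, and read the matrix in Python with scipy.io.mmread before wrapping it as above.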

If you run into any specific error messages, please share them here.

tstannius commented 2 years ago

Closing, as the question has been answered. Please feel free to reopen if this is not the case.