snap-stanford / UCE

UCE is a zero-shot foundation model for single-cell gene expression data
MIT License

Producing UCE embeddings for cell lines #34

Closed. Ontos46 closed this issue 2 months ago

Ontos46 commented 2 months ago

Hi! Apologies if this is a dumb question. I am trying to use UCE to produce embeddings for cell lines from the Broad DepMap dataset, but DepMap has only bulk RNA sequencing data, not scRNA-seq. Since all the cells in a cell line are clones and so functionally the same, can I still use the bulk data to produce embeddings? DepMap provides gene expression only as TPM values, so I wanted to make sure the resulting embeddings would be valid.

Yanay1 commented 2 months ago

You could definitely try; it would just require putting the data into h5ad format first. The TPM data might cause some issues with how we sample genes, however, so I'm not sure how good the embeddings would be (the main issue is that the scale could be a lot higher than for single-cell data).
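For anyone trying this, here is a minimal sketch of the conversion step, not an official UCE recipe. It assumes the bulk data is already loaded as a samples-by-genes TPM matrix, and that gene symbols belong in the `var` index. The helper names `tpm_to_pseudocounts` and `write_bulk_h5ad` are hypothetical, and rescaling each sample to a total of 10,000 and rounding is only one guess at how to bring TPM values closer to single-cell count scale; whether it helps the embeddings is untested.

```python
import numpy as np


def tpm_to_pseudocounts(tpm, target_total=10_000):
    """Rescale each row of a (samples x genes) TPM matrix to sum to
    target_total, then round to integers.

    ASSUMPTION: target_total=10_000 is an arbitrary choice meant to mimic a
    typical single-cell library size; it does not come from the UCE authors.
    """
    tpm = np.asarray(tpm, dtype=np.float64)
    row_sums = tpm.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # avoid division by zero for all-zero rows
    return np.rint(tpm / row_sums * target_total).astype(np.int64)


def write_bulk_h5ad(matrix, sample_ids, gene_symbols, path):
    """Wrap a (samples x genes) matrix in an AnnData object and save it as h5ad.

    ASSUMPTION: one row per cell line, gene symbols as the var index.
    """
    # Imported lazily so the rescaling helper above works without anndata.
    import anndata as ad
    import pandas as pd

    adata = ad.AnnData(
        X=np.asarray(matrix, dtype=np.float32),
        obs=pd.DataFrame(index=[str(s) for s in sample_ids]),
        var=pd.DataFrame(index=[str(g) for g in gene_symbols]),
    )
    adata.write_h5ad(path)
    return adata
```

A call such as `write_bulk_h5ad(tpm_to_pseudocounts(tpm), cell_line_ids, genes, "depmap_bulk.h5ad")` would then produce a file in the h5ad format mentioned above; the variable names and the output filename here are placeholders.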