theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

Deal with different genes in LINCS vs Trapnell during transfer learning #34

Closed by siboehm 2 years ago

siboehm commented 2 years ago

In LINCS we have 979 genes; in Trapnell there are 5000. However, the naive overlap (via plain-text matching of gene names) is only 89 genes. We have to define and implement a strategy for dealing with this during transfer.
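For illustration, the naive overlap described above amounts to an exact string intersection of the two gene-name lists. The sketch below uses made-up gene lists; the names `lincs_genes`, `trapnell_genes`, `naive_overlap`, and `normalized_overlap` are hypothetical and not taken from the repository. It also shows how a whitespace- and case-normalized comparison can give a different count than exact matching, which is one thing worth ruling out before choosing a strategy.

```python
def naive_overlap(genes_a, genes_b):
    """Return the sorted gene names that appear verbatim in both lists."""
    return sorted(set(genes_a) & set(genes_b))


def normalized_overlap(genes_a, genes_b):
    """Like naive_overlap, but compare after stripping whitespace and upper-casing."""
    norm_a = {g.strip().upper() for g in genes_a}
    norm_b = {g.strip().upper() for g in genes_b}
    return sorted(norm_a & norm_b)


# Toy gene lists for illustration only (not the real LINCS / Trapnell genes).
lincs_genes = ["TP53", "EGFR", "MYC", "PAPD7"]
trapnell_genes = ["tp53", "EGFR ", "MYC", "GAPDH"]

exact = naive_overlap(lincs_genes, trapnell_genes)
relaxed = normalized_overlap(lincs_genes, trapnell_genes)

print(len(exact), exact)
print(len(relaxed), relaxed)
```

Normalizing names only addresses formatting differences. If the two datasets use different identifier schemes, a proper mapping table would be needed instead.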

MxMstrmn commented 2 years ago

Indeed, this overlap is quite small. We could do the following to increase it:

I assume that this would make transfer easier. In any case, we should check PCA + UMAP to be sure that the signal is still present in the data when using different genes.

MxMstrmn commented 2 years ago

The new versions include both the HVG genes and the LINCS genes, so we can subset accordingly in the upcoming experiments.

Gene PAPD7 is not available in the sciplex dataset; we could consider excluding it for the LINCS run. See preprocessing/lincs_sciplex_gene_matching.ipynb.

Code to check this:

adata_lincs.var[adata_lincs.var['in_sciplex']==False]
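
As a rough sketch of how a boolean flag such as `in_sciplex` could be derived and then used for subsetting, the example below works on a small pandas DataFrame standing in for `adata_lincs.var`. The toy gene names and the variables `lincs_var` and `sciplex_genes` are assumptions for illustration; the actual matching logic lives in the notebook referenced above and may differ.

```python
import pandas as pd

# Toy stand-ins for adata_lincs.var and the sciplex gene names (illustrative values only).
lincs_var = pd.DataFrame(index=pd.Index(["TP53", "EGFR", "PAPD7"], name="gene"))
sciplex_genes = pd.Index(["TP53", "EGFR", "GAPDH"])

# Flag each LINCS gene by whether its name also occurs among the sciplex genes.
lincs_var["in_sciplex"] = lincs_var.index.isin(sciplex_genes)

# Genes missing from sciplex, i.e. rows where the flag is False.
missing = lincs_var[~lincs_var["in_sciplex"]]

# Gene names to keep when restricting LINCS to the shared set.
shared = lincs_var.index[lincs_var["in_sciplex"]].tolist()

print(missing.index.tolist())
print(shared)
```

With an AnnData object, the same boolean column could then be used to select variables, for example `adata_lincs[:, adata_lincs.var["in_sciplex"]]`.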

siboehm commented 2 years ago

Closed by #50