Closed siboehm closed 2 years ago
Indeed, this overlap is quite small. We could do the following to increase the overlap:
I assume that this would make the transfer easier. In any case, we should check PCA + UMAP to be sure that the signal is still present in the data when using different genes.
New versions include both the HVG genes and the LINCS genes, so we can subset to either set in the upcoming experiments.
Gene PAPD7 is not available in the sciplex dataset; we could consider excluding it for the LINCS run. See preprocessing/lincs_sciplex_gene_matching.ipynb.
Code to check this:
adata_lincs.var[adata_lincs.var['in_sciplex']==False]
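For context, here is a minimal sketch of how a flag like `in_sciplex` could be built and queried. It uses a plain pandas DataFrame as a stand-in for `adata_lincs.var`, and the gene symbols are made-up placeholders. How the notebook actually computes the column is not shown in this thread, so treat the `isin` step as an assumption.

```python
import pandas as pd

# Stand-in for adata_lincs.var: one row per gene, indexed by gene symbol.
# The symbols are placeholders, not real entries from either dataset.
lincs_var = pd.DataFrame(index=["GENE_A", "GENE_B", "GENE_C"])

# Stand-in for the sciplex gene list (e.g. adata_sciplex.var_names).
sciplex_genes = ["GENE_A", "GENE_C", "GENE_D"]

# Assumed construction of the flag: exact symbol match against sciplex.
lincs_var["in_sciplex"] = lincs_var.index.isin(sciplex_genes)

# Same check as above, written with boolean negation instead of `== False`.
missing = lincs_var[~lincs_var["in_sciplex"]]
print(missing.index.tolist())
```

Genes listed in `missing` are the candidates to exclude from the LINCS run.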
Closed by #50
In LINCS we have 979 genes; in Trapnell there are 5000. However, the naive overlap (via plain-text matching) is only 89. We have to define and implement a strategy for dealing with this during the transfer.
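To illustrate why plain-text matching can undercount, the sketch below compares a naive set intersection with one that first normalises the symbols and then applies an alias map. The gene names, the `gene_overlap` helper and the alias map are all hypothetical; whether casing, whitespace or renamed symbols actually explain the low overlap here would need to be checked against the real data.

```python
def gene_overlap(genes_a, genes_b, aliases=None):
    """Return the overlap of two gene lists after normalising symbols.

    `aliases` is a hypothetical mapping from outdated symbols to current
    ones; in practice it would have to come from a curated source.
    """
    aliases = aliases or {}

    def canonical(symbol):
        symbol = symbol.strip().upper()      # ignore case and stray whitespace
        return aliases.get(symbol, symbol)   # map old symbol to current one

    return {canonical(g) for g in genes_a} & {canonical(g) for g in genes_b}


# Toy lists with made-up symbols, not taken from LINCS or sciplex.
lincs_genes = ["GENE_A", "gene_b", "GENE_C_OLD", "GENE_X"]
sciplex_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_Y"]

naive = set(lincs_genes) & set(sciplex_genes)
normalised = gene_overlap(lincs_genes, sciplex_genes)
with_aliases = gene_overlap(
    lincs_genes, sciplex_genes, aliases={"GENE_C_OLD": "GENE_C"}
)

print(len(naive), len(normalised), len(with_aliases))
```

Reporting the overlap at each stage makes it easy to see which step recovers how many genes before deciding on a transfer strategy.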