theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

Initial gene embedding for chemcpa #158

Open sepidism opened 10 months ago

sepidism commented 10 months ago

Hi there, I wanted to double-check something and would appreciate your help. What is the initial embedding for each gene? I see the line "self.genes = torch.Tensor(data.X.A)", which you then pass through the encoder and the rest of your architecture. My concern is that, this way, the model sees the gene expression of the treated cell lines, which could influence your final results. Is that not a concern, or am I missing something here?

sepidism commented 10 months ago

Specifically, in evaluate_r2_sc, the input to compute_prediction() is y_true, which means the input and the output are basically the same.

MxMstrmn commented 9 months ago

Hi @sepidism,

I am not sure I understood your questions correctly, but the encoder takes in the treated cells, which are then embedded in a disentangled fashion (latent space arithmetic over basal state, perturbation state, and cell state). After training, for evaluation, we measure to what extent the model is able to decode the ground truth cell signal by comparing it to the originally measured gene expression.
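For readers following along, here is a rough sketch of the kind of comparison described above: an R2 score between decoded and originally measured expression. It is only an illustration under assumptions, not the repository's evaluate_r2_sc. The helper names (r2_score, gene_means), the choice to compare per-gene means, and the toy numbers are all hypothetical.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination between two equal-length sequences."""
    mu = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mu) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def gene_means(cells):
    """Per-gene mean over a list of cells (each cell is a list of gene values)."""
    return [mean(column) for column in zip(*cells)]


# Hypothetical toy data: 3 cells x 4 genes (not taken from any real dataset).
measured = [[1.0, 0.0, 2.0, 3.0], [1.2, 0.1, 1.8, 3.1], [0.8, 0.0, 2.2, 2.9]]
decoded = [[1.1, 0.1, 1.9, 2.8], [1.0, 0.0, 2.0, 3.0], [0.9, 0.2, 2.1, 3.2]]

# Assumption: the score is computed on per-gene means across cells.
score = r2_score(gene_means(measured), gene_means(decoded))
print(f"R2 on per-gene means: {score:.4f}")
```

A score close to 1 means the decoded expression tracks the measured expression closely. Whether that is meaningful depends on what the model was given as input, which is the question raised in this thread.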

sepidism commented 9 months ago

Hi @MxMstrmn, sorry for the confusion. My question is: during test/evaluation, is the input again the treated cell? In your code, the input to model.predict() during the evaluation in evaluate_r2_sc is the treated cell line.

MxMstrmn commented 8 months ago

Hi @sepidism,

The input to the model is simply the control genes of all cell lines present in the dataset, with no treatment at all. The treatment enters only through the metadata, via the resulting embeddings, which are added in the latent space.
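To make the description above concrete, here is a minimal sketch of additive composition in latent space, assuming the input is control expression and the treatment is looked up from metadata. Everything in it is a hypothetical stand-in: the class ToyCompositionalModel, the identity encoder and decoder, the linear dose scaling, and the embedding values are not chemCPA's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class ToyCompositionalModel:
    # Hypothetical lookup tables: metadata label -> latent vector.
    drug_emb: dict
    cell_type_emb: dict

    def encode(self, expression):
        # Stand-in encoder (identity); a real model would use a neural network.
        return list(expression)

    def decode(self, latent):
        # Stand-in decoder (identity).
        return list(latent)

    def predict(self, control_expression, drug, dose, cell_type):
        # Basal state comes from the control cell only.
        z_basal = self.encode(control_expression)
        # Treatment comes from metadata; linear dose scaling is an assumption.
        z_drug = [dose * v for v in self.drug_emb[drug]]
        z_cell = self.cell_type_emb[cell_type]
        # Latent arithmetic: basal + perturbation + cell state.
        z = [b + d + c for b, d, c in zip(z_basal, z_drug, z_cell)]
        return self.decode(z)


model = ToyCompositionalModel(
    drug_emb={"drugA": [0.5, -1.0, 0.0]},
    cell_type_emb={"A549": [0.0, 0.0, 0.25]},
)
control = [1.0, 2.0, 3.0]  # toy control-cell expression, 3 genes
pred = model.predict(control, drug="drugA", dose=2.0, cell_type="A549")
print(pred)  # [2.0, 0.0, 3.25]
```

In this sketch the treated expression never reaches predict(); only the drug and cell-type labels do, so the prediction can be compared against measured treated cells without the target leaking into the input.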