theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

EXP: Usefulness of LINCS pretraining for gene set expansion (`finetuning_num_genes`) #76

Closed siboehm closed 2 years ago

siboehm commented 2 years ago

Basic idea: We want to test how much pretraining on a smaller set of genes helps performance after finetuning on a larger set of genes.

This is biologically relevant, since single-cell experiments commonly select a different set of genes.

Experiment steps:

  1. Pretrain on LINCS (900 genes), finetune on Trapnell (same 900 genes)
  2. Pretrain on LINCS (900 genes), finetune on Trapnell (2000 genes)
  3. Train from Scratch on Trapnell (900 genes)
  4. Train from Scratch on Trapnell (2000 genes)
  5. Compare performance between chemCPA with and without pretraining in each of the two settings.
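The four training runs above form a 2×2 grid (pretraining yes/no × gene-set size), compared pairwise in step 5. A minimal sketch of that grid; the config keys and run names here are illustrative, not the project's actual YAML schema:

```python
# Hypothetical run configs for the four training settings; field names
# are made up for illustration and do not match the repo's YAML files.
runs = [
    dict(name="pretrain+ft_900",  pretrained=True,  dataset="trapnell", num_genes=900),
    dict(name="pretrain+ft_2000", pretrained=True,  dataset="trapnell", num_genes=2000),
    dict(name="scratch_900",      pretrained=False, dataset="trapnell", num_genes=900),
    dict(name="scratch_2000",     pretrained=False, dataset="trapnell", num_genes=2000),
]

# Step 5: pair each pretrained run with the from-scratch run that uses
# the same gene-set size, yielding one comparison per setting.
pairs = [
    (a["name"], b["name"])
    for a in runs if a["pretrained"]
    for b in runs if not b["pretrained"]
    if a["num_genes"] == b["num_genes"]
]
```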

Implementation steps:

  1. [x] @MxMstrmn generates new Trapnell datasets with ~2000 genes
  2. [x] @siboehm writes the transfer code (just adding another layer to decoder & encoder that adjusts the dimensions)
siboehm commented 2 years ago

As discussed, we can already perform (1) and (3) today, @MxMstrmn will write the YAMLs.

MxMstrmn commented 2 years ago

This was done and made it to the paper.