theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

EXP: Better Sweeps on LINCS (`lincs_rdkit_hparam`) #69

Closed siboehm closed 2 years ago

siboehm commented 2 years ago

I'm not happy with the results of the large sweep that we ran on LINCS. Mainly:

I'd do it like this:

  1. Run a large sweep (~30 different HPs) for a single embedding (I'd lean towards RDKit, maybe Seq2Seq; we could also do GROVER).
  2. Use the top performing hyper parameters to train a few models (with different seeds) for each of the other embeddings.
  3. Use the results to compare the embeddings in terms of LINCS r2_score (this plot goes into the paper).
  4. Use the best performing models for each embedding as the checkpoint for later finetuning on Trapnell.
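The steps above can be sketched roughly as follows. This is an illustrative outline only, not the actual chemCPA sweep code: the hyperparameter names, ranges, embedding list, and `train_and_score` stand-in are all hypothetical.

```python
import random

def sample_config(rng):
    """Draw one hyperparameter configuration at random (illustrative ranges)."""
    return {
        "lr": 10 ** rng.uniform(-4, -2),
        "latent_dim": rng.choice([32, 64, 128]),
        "dropout": rng.uniform(0.0, 0.3),
    }

def train_and_score(config, embedding, seed=0):
    """Stand-in for training chemCPA and returning its LINCS r2_score."""
    rng = random.Random((hash(frozenset(config.items())) ^ seed) & 0xFFFF)
    return rng.uniform(0.0, 1.0)  # dummy score in place of a real run

rng = random.Random(42)

# 1. Large sweep (~30 configs) on a single embedding, e.g. RDKit.
sweep = [sample_config(rng) for _ in range(30)]
scores = [(train_and_score(cfg, "rdkit"), cfg) for cfg in sweep]
best_score, best_cfg = max(scores, key=lambda t: t[0])

# 2. Re-train the best config with a few seeds for every other embedding.
results = {
    emb: [train_and_score(best_cfg, emb, seed=s) for s in range(3)]
    for emb in ["rdkit", "grover", "seq2seq", "vanilla"]
}

# 3. Rank embeddings by their mean r2_score (these are the paper-plot numbers).
ranking = sorted(results, key=lambda e: -sum(results[e]) / len(results[e]))
```

Step 4 would then just reuse the checkpoint of the best run per embedding for Trapnell finetuning.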

I think it's important to get this right before we design too many other experiments. If it turns out that the transfer learning doesn't help improve Trapnell scores, then there's no use in implementing #62, for example. We can then still run the other experiments like #67 and hope to see improvements there. If that doesn't work either, we can check whether the model is at least useful for predicting the effects of drugs that it hasn't seen. That's just for covering our bases in the worst case; I think with some tweaking the transfer learning will work.

siboehm commented 2 years ago

The reason I'd run 1) on just a single embedding is computational cost, and because otherwise things get very complex. We could also run the sweep on 2 embeddings, but sweeping all 7 just seems like overkill.

MxMstrmn commented 2 years ago
  1. Let us use RDKit; it is our baseline for all NN-based embeddings. A good HP configuration mainly applies to the VAE part of the model, hence we do not restrict the other models by a large margin.
  2. I would allow some flexibility with respect to the drug embedders, as the embeddings have different dimensions. Other than that, I agree with the procedure.
  3. Agreed.
  4. Agreed.

> If it turns out that the transfer learning doesn't help for improving Trapnell scores, then there's no use in implementing #62 for example.

This is not super straightforward, I think. If you train a Trapnell model independently from LINCS, just on its own, you would choose HVGs (these are the meaningful genes with respect to that dataset). Hence, #62 somewhat allows us to regulate the degree to which we lean towards LINCS genes or Trapnell genes. After all, good/reasonable r2 scores are required and should be acquired in the transfer learning case as well, but ultimately we are interested in the experiments around #67.
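For intuition, HVG selection is dataset-specific because it ranks genes by their variability within that dataset. A hedged numpy sketch of the idea (simplified variance ranking on toy data; scanpy's actual dispersion-based method is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy expression matrix: 100 cells x 50 genes, with gene-specific rates
# so that per-gene variances differ.
X = rng.poisson(lam=rng.uniform(0.5, 5.0, size=50), size=(100, 50))

def select_hvgs(X, n_top=10):
    """Return indices of the n_top genes with the highest variance."""
    variances = X.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]

hvgs = select_hvgs(X, n_top=10)
```

Running the same selection on LINCS vs. Trapnell would generally yield different gene sets, which is the tension #62 lets us tune.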

Having said that, I would also prioritise #67 at the moment. This will help us more with further experiment design.

> predicting the effects of drugs

I am not quite sure I understand the difference to #67 here. Are you thinking about a scalar value that encodes the 'effect', as in the L2 norm between condition and control?
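The scalar effect mentioned here could be computed roughly as below. This is just an illustration of the L2-between-pseudobulks idea on random data, not anything from the chemCPA codebase:

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy data: control (e.g. DMSO) and perturbed cells, 50 genes each.
control = rng.normal(0.0, 1.0, size=(200, 50))
perturbed = control[:100] + rng.normal(0.5, 0.1, size=(100, 50))

def effect_size(perturbed, control):
    """L2 distance between mean (pseudobulk) expression profiles."""
    return float(np.linalg.norm(perturbed.mean(axis=0) - control.mean(axis=0)))

delta = effect_size(perturbed, control)  # one scalar per drug/condition
```

This collapses a drug's transcriptional response to a single number, which is much coarser than the per-gene r2 comparisons in #67.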

siboehm commented 2 years ago

Sounds good wrt experiment design, let's put together a YAML file later.

In terms of the experiment hierarchy, I was thinking in terms of possible outcomes (best to worst):

  1. Pretraining on bulkSeq improves scSeq scores across the board (every possible split)
  2. Pretraining on bulkSeq improves scSeq scores for scSeq OOD splits (where the drug was observed in bulkSeq, but not in scSeq)
  3. The embeddings allow us to predict OOD drugs with reasonable accuracy

siboehm commented 2 years ago

We've decided to:

  1. Run a large sweep using the RDKit embedding on LINCS, with a small latent space, and pick the optimal autoencoder parameters from this sweep.
  2. Fix the optimal autoencoder parameters, and schedule a small number of runs for each embedding to figure out reasonable drug embedding & doser parameters.

We'll use the outcomes of (2) for comparing the embeddings (including Vanilla!). Optionally we can schedule more runs using the optimal parameters for each embedding, but using different seeds to get an estimate of the variance.
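The seed-variance estimate at the end could be summarized as below. The per-seed numbers are placeholders, not real results:

```python
import statistics

# Hypothetical r2_scores from re-running each embedding's best config
# with three different seeds.
scores_per_embedding = {
    "rdkit":   [0.81, 0.79, 0.82],
    "grover":  [0.78, 0.80, 0.77],
    "vanilla": [0.70, 0.72, 0.69],
}

# Mean and sample standard deviation per embedding, for the comparison plot.
summary = {
    emb: (statistics.mean(s), statistics.stdev(s))
    for emb, s in scores_per_embedding.items()
}
```

Reporting mean and std per embedding (rather than a single run) makes the Vanilla-vs-learned-embedding comparison much more defensible.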

siboehm commented 2 years ago

Closed by #79