theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

Hparam sweep for finetuning parameters (`sciplex_hparam`) #86

Closed: siboehm closed this issue 2 years ago

siboehm commented 2 years ago

As it stands, there's no tuning of the finetuning parameters in #85 or #84.

  1. Parameters to be tuned (all models):
    • autoencoder_lr, autoencoder_wd
    • batch_size
    • All adversary parameters. Since we create a new Adv anyway, we can even tweak the width and depth.

We should do this separately for the finetuned and from-scratch training.

  2. Parameters to be tuned (each embedding):
    • dosers_lr & dosers_wd
    • Potentially autoencoder_lr (autoencoder + drug embedder are updated using the same optimizer).

I think (1) is important, as a good lr may make a difference for finetuning and the classification task for the adversaries is quite different on Trapnell. (2) is probably much less important; we could use it as a source of variation across the individual runs.
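
A minimal sketch of what the search space for (1) and (2) could look like. The parameter names follow this issue; the ranges, the key naming, and the sampling helper are assumptions for illustration, not the actual sweep config:

```python
import numpy as np

# Hypothetical search space for the sciplex_hparam sweep; all ranges are assumptions.
SEARCH_SPACE = {
    # (1) all models
    "autoencoder_lr": ("loguniform", 1e-5, 1e-3),
    "autoencoder_wd": ("loguniform", 1e-8, 1e-5),
    "batch_size": ("choice", [64, 128, 256, 512]),
    "adversary_lr": ("loguniform", 1e-5, 1e-3),
    "adversary_wd": ("loguniform", 1e-8, 1e-5),
    "adversary_width": ("choice", [64, 128, 256]),
    "adversary_depth": ("choice", [2, 3, 4]),
    # (2) per embedding
    "dosers_lr": ("loguniform", 1e-5, 1e-3),
    "dosers_wd": ("loguniform", 1e-8, 1e-5),
}

def sample_config(space, rng=np.random.default_rng(0)):
    """Draw one random configuration from the search space."""
    config = {}
    for name, (kind, *args) in space.items():
        if kind == "loguniform":
            low, high = args
            config[name] = float(np.exp(rng.uniform(np.log(low), np.log(high))))
        elif kind == "choice":
            config[name] = rng.choice(args[0]).item()
    return config
```

As suggested above, the same space could be sampled twice, once for the finetuned and once for the from-scratch runs.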

MxMstrmn commented 2 years ago

I think we should change the name since we do not sweep over the embeddings anymore; these are fixed to the best-performing model per drug embedder on LINCS.

So just `sciplex_hparam`? Trapnell is actually the name of the PI who led the experiment.

MxMstrmn commented 2 years ago

@siboehm, we might have to adjust the split on which we finetune and find the hyperparameters.

siboehm commented 2 years ago

Hm.

  1. This is the split that has a bunch of drugs left out, but we saw them in LINCS, correct?
  2. These models have perfect disentanglement! Most reach 0.023, which is the optimal value. This may mean we have to lower the adversarial penalty (go < 1), since the model may be focusing too much on lowering that penalty instead of lowering the reconstruction loss; see the sketch after this list. I'd probably test this with one embedding only at first.
  3. Overall, the scores across all models are nearly identical. That's surprising, given that there were clear differences between the embeddings on LINCS. I guess we excluded weave.
  4. We could always subset Sciplex to less data: maybe 100K examples instead of 600K, or far fewer examples per drug? Basically, find a setting in which pretraining helps and build the story from there, while showing that for large datasets it's not necessary since training from scratch works well enough.
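
On point 2, a minimal sketch of the trade-off, assuming the autoencoder objective subtracts the adversary loss weighted by the penalty; the function and names are illustrative, not chemCPA's actual code:

```python
import torch

def autoencoder_objective(reconstruction_loss: torch.Tensor,
                          adversary_loss: torch.Tensor,
                          reg_adversary: float = 1.0) -> torch.Tensor:
    """Illustrative combined objective: the autoencoder is rewarded for fooling
    the adversary, so with reg_adversary >= 1 the optimizer can prioritize
    disentanglement over reconstruction; pushing reg_adversary < 1 shifts the
    balance back toward the reconstruction term."""
    return reconstruction_loss - reg_adversary * adversary_loss
```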

I haven't seen any of the Vanilla runs yet; I wonder how they turn out. I guess if this is a hold-out split, then Vanilla won't work anyway.

MxMstrmn commented 2 years ago

> I haven't seen any of the Vanilla runs yet; I wonder how they turn out. I guess if this is a hold-out split, then Vanilla won't work anyway.

It is not a real hold-out; just some drugs, at the highest dosage, are put into the 'ood'/'test' split.

  1. We did not check whether these drugs are contained in LINCS, but it is quite likely that they are.
  2. That might be true; however, looking at the training, we also get really good reconstruction.
  3. They are slightly different for DE genes and disentanglement, again with a slight edge for pre-training here. However, I was hoping for clearer differences.
  4. I agree, the split is probably not ideal. Training on fewer examples can also be biologically motivated, plus more hold-out on the drug side; see the sketch after this list.
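
For point 4, a minimal sketch of how one could subsample Sciplex to fewer examples per drug and mark the highest-dosage cells of selected drugs as 'ood'. The file path, the `condition`, `dose`, and `split` columns on `adata.obs`, and the drug names are assumptions about the dataset layout, not the actual pipeline:

```python
import scanpy as sc

adata = sc.read_h5ad("sciplex.h5ad")  # hypothetical path

# Keep at most n_per_drug cells per drug to shrink the training set.
n_per_drug = 200
keep = (
    adata.obs.groupby("condition", group_keys=False)
    .apply(lambda df: df.sample(min(len(df), n_per_drug), random_state=0))
    .index
)
adata_small = adata[keep].copy()

# Mark selected drugs at their highest dose as 'ood'; everything else stays in train.
ood_drugs = ["drug_A", "drug_B"]  # placeholder names
max_dose = adata_small.obs["dose"].max()
is_ood = adata_small.obs["condition"].isin(ood_drugs) & (adata_small.obs["dose"] == max_dose)
adata_small.obs["split"] = "train"
adata_small.obs.loc[is_ood, "split"] = "ood"
```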

I think the vanilla runs are last in the list... something to wait for.


I attached the plot of the training metrics to show that, for DE genes, pre-trained models seem to perform better. Again, this is preliminary, as not all runs are finished yet.

[Attached screenshot (2022-01-26): training metrics plot]