theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

EXP `lincs_rdkit_hparam` #79

Closed · siboehm closed this 2 years ago

siboehm commented 2 years ago

Ref #69

Draft PR for now, just to show what the new folder structure would look like. Results are going into their own MongoDB collection, called lincs_rdkit_hparam.
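
For orientation, a minimal sketch of how the new collection could be inspected with pymongo; the connection URI and the database name are placeholders for whatever MongoDB backend seml is configured to write to:

```python
from pymongo import MongoClient

# Placeholder connection details: adjust to the MongoDB instance and
# database that seml is configured to write to.
client = MongoClient("mongodb://localhost:27017")
collection = client["seml"]["lincs_rdkit_hparam"]

# Each run is stored as one document; list the configs of finished runs.
for doc in collection.find({"status": "COMPLETED"}, {"config": 1}):
    print(doc["_id"], doc.get("config", {}))
```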

review-notebook-app[bot] commented 2 years ago

Check out this pull request on ReviewNB to see visual diffs & provide feedback on Jupyter Notebooks.

MxMstrmn commented 2 years ago

I really like this setup! Also the additional README.md is nice :)

I assume the experiments will be completed by tomorrow; at the moment the status is:

********** Report for database collection 'lincs_rdkit_hparam' **********
*     -   0 staged experiments
*     -   4 pending experiments
*     -  10 running experiments
*     -  11 completed experiments
*     -   0 interrupted experiments
*     -   0 failed experiments
*     -   0 killed experiments
*************************************************************************
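
The report above is seml's status summary; the same per-status counts can be reproduced directly from the collection with a small aggregation. A hedged sketch, with the same placeholder connection details as before:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder URI
collection = client["seml"]["lincs_rdkit_hparam"]  # placeholder database name "seml"

# Group the experiment documents by their status field and count them.
pipeline = [{"$group": {"_id": "$status", "count": {"$sum": 1}}}]
for row in collection.aggregate(pipeline):
    print(f"{row['count']:4d} {str(row['_id']).lower()} experiments")
```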

I think we should then add the corresponding results notebook, and the PR should be good to go?

MxMstrmn commented 2 years ago

Hi @siboehm, I analysed the sweep for lincs_rdkit_hparams; we have quite good models among them. I changed the default plotting function to violin plots, as they show both the distribution and the individual runs.
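
Not the notebook's exact code, but a minimal sketch of the kind of plot meant here: a seaborn violin for the distribution, with the individual runs overlaid as points (the `results` DataFrame and its values are made up):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sweep results: one row per run, one column per metric.
results = pd.DataFrame({
    "model": ["rdkit"] * 6,
    "test_mean": [0.81, 0.84, 0.79, 0.86, 0.82, 0.85],
})

# Violin shows the distribution; the strip plot overlays the individual runs.
ax = sns.violinplot(data=results, x="model", y="test_mean", inner=None, color="lightgray")
sns.stripplot(data=results, x="model", y="test_mean", color="black", size=4, ax=ax)
ax.set_title("Sweep results: distribution plus individual runs")
plt.show()
```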

Turns out that a single model performs best with respect to all our selection criteria: perturbation disentanglement, test_mean, and test_mean_de.
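
Roughly, this check amounts to asking whether one run is the argmin/argmax on every criterion. A hedged sketch with made-up values (assuming lower disentanglement is better, higher test_mean / test_mean_de is better):

```python
import pandas as pd

# Hypothetical per-run metrics pulled from the sweep results.
runs = pd.DataFrame({
    "run_id": [1, 2, 3],
    "disentanglement": [0.12, 0.25, 0.18],  # lower is better
    "test_mean": [0.86, 0.80, 0.84],        # higher is better
    "test_mean_de": [0.71, 0.63, 0.69],     # higher is better
})

best_dis = runs.loc[runs["disentanglement"].idxmin(), "run_id"]
best_r2 = runs.loc[runs["test_mean"].idxmax(), "run_id"]
best_r2_de = runs.loc[runs["test_mean_de"].idxmax(), "run_id"]

# True if one run is best with respect to all three selection criteria.
print(best_dis == best_r2 == best_r2_de)
```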

Am I correct that the disentanglement score is now non-linear? Scores have worsened quite a bit overall.

Do you think we should just take the hparams of this best-performing model and start the 2nd part of the experiment?

siboehm commented 2 years ago

Yes, the disentanglement score is now calculated using a non-linear model; this should make the scores worse (= higher) overall. Let's hope that the hparams have a large influence on the disentanglement score, rather than good scores being due to getting lucky during the adversarial training.
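
The repo's exact implementation isn't reproduced here, but the idea can be sketched as: train a probe on the latent basal representations to predict the drug label, and report its accuracy as the disentanglement score. A non-linear probe (here an illustrative scikit-learn MLP on random data) can recover more drug information than a linear one, which pushes the score up (= worse):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical latent basal states (n_cells x latent_dim) and drug labels.
rng = np.random.default_rng(0)
latents = rng.normal(size=(2000, 32))
drug_labels = rng.integers(0, 10, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    latents, drug_labels, test_size=0.25, random_state=0
)

# Non-linear probe: the better it predicts the drug from the basal latent,
# the less disentangled the representation (higher score = worse).
# With random data the accuracy will sit near chance; real latents may not.
probe = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
probe.fit(X_train, y_train)
print(f"perturbation disentanglement score (probe accuracy): {probe.score(X_test, y_test):.3f}")
```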

Yes, we should start the 2nd part; I'll write a new yaml. Surprisingly, the top configurations look quite dissimilar (the 1st model is comparatively small, whereas the 3rd model is pretty big). I'd just follow Niki's advice and pick the top-performing hparams without thinking about it too much. It even has latent size 32!

siboehm commented 2 years ago

I picked the best hparams for the autoencoder and the adversary. The parameters of the drug embedding and doser are being randomly swept over the same ranges as before. I'm also sweeping the step_size_lr again, since it applies to all optimizers (AE + Adv + drug embedder + doser).
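
A minimal sketch of what randomly sampling the remaining hparams could look like (parameter names and value ranges below are illustrative, not the actual sweep yaml):

```python
import random

# Hypothetical search space for the parts that are still being swept:
# drug embedder, doser, and the shared step_size_lr.
SEARCH_SPACE = {
    "embedding_encoder_width": [128, 256, 512],
    "embedding_encoder_depth": [2, 3, 4],
    "doser_width": [64, 128, 256],
    "doser_depth": [1, 2, 3],
    "step_size_lr": [25, 50, 100, 200],
}

def sample_config(seed: int) -> dict:
    """Draw one random configuration from the search space."""
    rng = random.Random(seed)
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

for i in range(3):
    print(sample_config(i))
```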

@MxMstrmn Can you have a brief look? Mainly at the list of embeddings.

Fun facts:

- I don't think the dropout hparam is actually used anywhere; I can't find any references to it in the code.

MxMstrmn commented 2 years ago

> Fun facts: I don't think the dropout hparam is actually used anywhere; I can't find any references to it in the code.

Yet another Code Gem 💎

MxMstrmn commented 2 years ago

@siboehm, I will edit the config and will most likely start the run tomorrow morning.

siboehm commented 2 years ago

We'd obviously still need to make the final plots for the paper, but I think in terms of results we have all we need for this experiment.

siboehm commented 2 years ago

We'll still have to update the notebook with the final results, but there are some code changes in this PR that should make it into main soon, so I'm merging this.