theislab / chemCPA

Code for "Predicting Cellular Responses to Novel Drug Perturbations at a Single-Cell Resolution", NeurIPS 2022.
https://arxiv.org/abs/2204.13545
MIT License

EXP: Drug embeddings show stronger disentanglement (`chemical_emb_disentangle`) #66

Closed MxMstrmn closed 2 years ago

MxMstrmn commented 2 years ago

Looking at the results in chemical_CPA/simon/plot_sweep_results.ipynb

Hypothesis: The drug disentanglement might be easier for NN-based embeddings as these are chemically motivated. Potential experiment: Make the classifier stronger by including more layers than just the linear one of the logistic regression.

siboehm commented 2 years ago

Potential experiment: Make classifier stringer including more layer than just the linear one for the logistic regression.

Stringer seems to be a typo? This is about adjusting the classifier during the evaluation, right? vs the discriminator that runs during training (which is already non-linear).

MxMstrmn commented 2 years ago

You are correct, I meant to include a non-linear classifier during the evaluation (going beyond the multi-class logistic regression)

I corrected the typo in the issue.
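
Such a non-linear probe could look like the following sketch (placeholder data and probe settings, not the chemCPA evaluation code): train both a multi-class logistic regression and a small MLP to predict the drug label from the latent vectors; lower probe accuracy indicates better disentanglement.

```python
# Illustrative sketch of a disentanglement probe; the latent vectors and
# drug labels here are random placeholders, not chemCPA outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
latents = rng.normal(size=(300, 32))        # placeholder basal latent vectors
drug_labels = rng.integers(0, 5, size=300)  # placeholder drug labels (5 drugs)

probes = {
    "linear": LogisticRegression(max_iter=1000),           # current probe
    "mlp": MLPClassifier(hidden_layer_sizes=(64,),         # non-linear probe
                         max_iter=1000, random_state=0),
}

results = {}
for name, probe in probes.items():
    # Mean cross-validated accuracy of predicting the drug from the latent;
    # accuracy near chance level = good disentanglement.
    results[name] = cross_val_score(probe, latents, drug_labels, cv=3).mean()
    print(f"{name} probe accuracy: {results[name]:.2f}")
```

A stronger (non-linear) probe can only find more drug information in the latent, so its accuracy upper-bounds the linear one; comparing the two shows how much the linear probe underestimates the leakage.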

siboehm commented 2 years ago

The experiment to be run here:

  1. Train models on LINCS, once with a chemically meaningful embedding (eg RDKit), once with Vanilla.
  2. Compare the disentanglement scores

This doesn't need to be a separate cluster sweep, we'll get these scores through #69.

Expected outcome: The Vanilla CPA has higher disentanglement scores (= bad). We would be able to use this to argue that the chemical embeddings contribute meaningfully to making the adversarial autoencoder easier to train.

siboehm commented 2 years ago

Without having done any further analysis, the initial results from #69 don't really support this hypothesis.

MxMstrmn commented 2 years ago

Not super related, but what configuration determines that we have at least 125 epochs?

(Screenshot from 2022-01-25, 16:34)
MxMstrmn commented 2 years ago

Or is this just a random artifact: models usually improve at first (~50 epochs), and then our settings with checkpoint_freq=25 + patience=3 lead to something like 125?

siboehm commented 2 years ago

I think it's because at the first evaluation we always improve (since it's the first), then we tolerate 3 non-improving evaluations (patience=3) and terminate at the 4th non-improving one, i.e. the 5th evaluation overall -> 5*25=125. So there is no way to finish earlier.
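
As a sanity check on that arithmetic, here is a minimal sketch (not the chemCPA code) assuming training stops once the number of consecutive non-improving evaluations exceeds the patience:

```python
# Minimal early-stopping arithmetic: evaluations happen every
# checkpoint_freq epochs, and training stops once the streak of
# non-improving evaluations exceeds `patience`.
def epochs_until_stop(improved_at_eval, checkpoint_freq=25, patience=3):
    """improved_at_eval: eval_index (1-based) -> bool (did the score improve?)."""
    bad_streak = 0
    n_evals = 0
    while True:
        n_evals += 1
        if improved_at_eval(n_evals):
            bad_streak = 0
        else:
            bad_streak += 1
        if bad_streak > patience:
            return n_evals * checkpoint_freq

# Worst case: only the very first evaluation improves (it always does,
# having no previous score), then 4 non-improving evaluations follow
# -> 5 evaluations * 25 epochs = 125 epochs minimum.
print(epochs_until_stop(lambda i: i == 1))  # -> 125
```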

MxMstrmn commented 2 years ago

Will close this, did not make it into the final paper version.