How to optimize CPA hyperparameters

theislab / cpa

The Compositional Perturbation Autoencoder (CPA) is a deep generative framework for learning the effects of perturbations at the single-cell level. CPA performs out-of-distribution (OOD) predictions of unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.

BSD 3-Clause "New" or "Revised" License


How to optimize CPA hyperparameters #37

Closed AxKo closed 6 months ago

AxKo commented 10 months ago

Unfortunately, the tutorial referenced on the GitHub page, https://cpa-tools.readthedocs.io/en/latest/tutorials/optimizing_hyperparameters.html, does not exist. Given that CPA has so many parameters, is a new tutorial on this topic available?

M0hammadL commented 10 months ago

Hi, we are aware of that and will update it soon, @Naghipourfar.

Are you looking for a specific set of parameters or a specific scenario?

AxKo commented 10 months ago

Well, there are so many that I don't really know where to start.

So it would already be helpful to know which parameters matter most for fitting. General advice would help too: for instance, can I test parameters on a smaller data set (for speed) and expect a parameter set that works there to also perform well on a larger data set? Or would it be better to use the full data set and train for only a few iterations to see the impact of a parameter change?

Also, several metrics are displayed during training (disnt_basal, disnt_after, val_r2_mean, val_r2_var), and I'm not sure which ones are the important ones :-(

M0hammadL commented 10 months ago

Could you elaborate on the use case first? Then I can guide you better. For the metrics, see the papers (trVAE, scGen, CPA, chemCPA). In short, the variance captures heterogeneity at the cell level (much harder to capture than the mean), while the mean tells you whether the model has captured the overall effect of the perturbation.
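To make the mean/variance distinction concrete, here is a minimal sketch. It assumes the two scores are R2 values computed across genes on the per-gene mean and the per-gene variance of real versus predicted cells; the actual CPA implementation may differ in details. The helper names and the toy numbers are invented for illustration only.

```python
from statistics import fmean, pvariance


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def per_gene(cells, stat):
    """Apply `stat` to every gene (column) of a cells x genes matrix."""
    return [stat(column) for column in zip(*cells)]


# Toy data (hypothetical): 4 cells x 3 genes.
real = [
    [1.0, 5.0, 0.0],
    [3.0, 5.0, 2.0],
    [1.0, 7.0, 4.0],
    [3.0, 7.0, 6.0],
]
# A prediction that gets every gene's mean right but has no
# cell-to-cell spread at all.
predicted = [[2.0, 6.0, 3.0]] * 4

r2_mean = r2_score(per_gene(real, fmean), per_gene(predicted, fmean))
r2_var = r2_score(per_gene(real, pvariance), per_gene(predicted, pvariance))

print(f"r2_mean={r2_mean:.3f}  r2_var={r2_var:.3f}")
```

In this toy case the mean score is perfect while the variance score is negative, which illustrates why a model can capture the overall perturbation effect and still miss cell-level heterogeneity.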

disnt_basal shows how well the model has disentangled perturbation and covariate effects. Lower is better: with 6 perturbations, for example, the ideal case is an accuracy of 1/6, i.e. chance level. disnt_after shows that, once the perturbation and covariate effects are recovered, accuracy should be high. Overall I would prioritize the R2 mean (val_r2_mean) and also make sure disnt_basal is not too bad (the perturbation should not be perfectly recoverable).
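The rule of thumb above could be sketched as a small selection helper: keep the runs whose disnt_basal is close to chance level, then pick the one with the highest val_r2_mean. This is only an illustration. The function names, the `tolerance` value, and the run metrics below are made up for the example and are not part of CPA or results from any real training run.

```python
def chance_accuracy(n_perturbations):
    """Accuracy of a classifier guessing uniformly among the perturbations."""
    return 1.0 / n_perturbations


def pick_run(runs, n_perturbations, tolerance=0.15):
    """Prefer runs whose disnt_basal is near chance, then maximize val_r2_mean.

    `tolerance` is an arbitrary illustrative margin above chance level.
    If no run is within the margin, fall back to comparing all runs.
    """
    threshold = chance_accuracy(n_perturbations) + tolerance
    disentangled = [r for r in runs if r["disnt_basal"] <= threshold]
    candidates = disentangled or runs
    return max(candidates, key=lambda r: r["val_r2_mean"])


# Hypothetical metrics from three hyperparameter settings.
runs = [
    {"name": "a", "disnt_basal": 0.90, "disnt_after": 0.95, "val_r2_mean": 0.93},
    {"name": "b", "disnt_basal": 0.20, "disnt_after": 0.92, "val_r2_mean": 0.88},
    {"name": "c", "disnt_basal": 0.25, "disnt_after": 0.90, "val_r2_mean": 0.80},
]

best = pick_run(runs, n_perturbations=6)
print(best["name"])
```

Run "a" has the highest val_r2_mean, but its disnt_basal of 0.90 means the perturbation is almost perfectly recoverable, so it is filtered out before the comparison.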

AxKo commented 10 months ago

Okay, so our use case is the following. We want to use CPA to transfer disease-induced changes in gene expression from one tissue to another. For instance, in the case of pancreatic cancer we might have transcriptomics data sets for 1) blood in healthy people, 2) pancreas in healthy people, and 3) blood in people with pancreatic cancer.

The idea is to interpret tissue type and disease state as 'perturbations' and then to transfer the 'cancer perturbation' from blood to pancreas. This would allow us to predict gene expression in the pancreas of people with pancreatic cancer. Could CPA be used for that? Does it make sense to treat tissue type and disease as perturbations, or would it be better to treat them as covariates?
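The setup described here can be written down as a small sketch of which tissue/disease combinations are observed and which one would have to be predicted. It only illustrates the data layout and does not settle whether tissue and disease state should be modeled as perturbations or as covariates; the variable names are hypothetical.

```python
from itertools import product

tissues = ["blood", "pancreas"]
states = ["healthy", "cancer"]

# Combinations for which transcriptomics data are available.
observed = {
    ("blood", "healthy"),
    ("pancreas", "healthy"),
    ("blood", "cancer"),
}

# Combinations that never appear in the training data and would
# therefore be an out-of-distribution prediction target.
unseen = [combo for combo in product(tissues, states) if combo not in observed]

print(unseen)
```

The single unseen combination, pancreas with cancer, is the out-of-distribution target in this framing.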

Any advice is welcome.

AxKo commented 9 months ago

So, any advice?

ArianAmani commented 6 months ago

Hyperparameter tuner added in #46: https://github.com/theislab/cpa/#how-to-optmize-cpa-hyperparamters-for-your-data