How to best evaluate results?

MxMstrmn commented 3 years ago

Current Stage

Grid searches run via seml
An example for a hparams search: ConfigFile
A finished run can be accessed from the database via seml.get_results('cpa_graphs_15', to_data_frame=True)
The train history is loaded into a pd.DataFrame
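As a rough sketch of a first evaluation step, the DataFrame of finished runs could be ranked by a final metric. The helper `top_runs` and the column names `config.lr` and `result.final_r2` are hypothetical; the real columns depend on what the experiment script returns. A small mock frame stands in for the database here.

```python
import pandas as pd


def top_runs(results: pd.DataFrame, metric: str, k: int = 3) -> pd.DataFrame:
    """Return the k runs with the highest value of `metric`.

    Runs where the metric is missing (e.g. failed runs) are dropped.
    """
    finished = results.dropna(subset=[metric])
    return finished.sort_values(metric, ascending=False).head(k)


# Mock stand-in for seml.get_results('cpa_graphs_15', to_data_frame=True);
# the column names below are assumptions, not the real schema.
mock_results = pd.DataFrame(
    {
        "config.lr": [1e-3, 1e-4, 1e-2, 1e-3],
        "result.final_r2": [0.71, 0.83, None, 0.64],
    }
)

best = top_runs(mock_results, metric="result.final_r2", k=2)
```

With the real results, `mock_results` would simply be replaced by the frame loaded from the database.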
What should the steps be to evaluate performance of the model?
How to best visualise the following?
- Disentanglement metrics (perturbation, covariate)
- Prediction accuracy metrics like R2 (all genes, DE genes, top-x DE genes)
- log2 fold change, i.e. (real - control) vs. (predicted - control)

Ideally we can standardise the evaluation → figure design. For this it would be great to get some scripts & examples!

M0hammadL commented 3 years ago

R2 is similar to the code we have in CPA. We have to add a function that measures the log-fold-change difference:

TODO: Add a function called LFC_dif which receives the adata and does:

ctrl_ct_x = average of ctrl cells in cell line x
real_drug_a_ct_x = average of drug a cells in cell line x
pred_drug_a_ct_x = average of predicted drug a cells in cell line x

pred = pred_drug_a_ct_x - ctrl_ct_x
real = real_drug_a_ct_x - ctrl_ct_x
LFC_R2 = R2(pred, real)

You can do this for all genes and for DE genes, and then compute R2 between pred and real.
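A minimal sketch of what such a function could look like, assuming the three groups of cells have already been pulled out of the adata as dense (cells x genes) arrays of log-transformed expression. The signature, the `gene_idx` argument and the small `r2` helper are assumptions for illustration, not the code from CPA:

```python
import numpy as np


def r2(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # Undefined (nan/inf) if y_true is constant, i.e. SS_tot == 0.
    return float(1.0 - ss_res / ss_tot)


def lfc_dif(ctrl, real, pred, gene_idx=None) -> float:
    """R2 between predicted and real shifts relative to control.

    ctrl, real, pred: (cells x genes) arrays for one drug in one cell line.
    gene_idx: optional indices of a gene subset (e.g. DE genes).
    """
    ctrl_mean = np.asarray(ctrl).mean(axis=0)
    real_lfc = np.asarray(real).mean(axis=0) - ctrl_mean
    pred_lfc = np.asarray(pred).mean(axis=0) - ctrl_mean
    if gene_idx is not None:
        # Restrict the comparison to the requested genes.
        real_lfc, pred_lfc = real_lfc[gene_idx], pred_lfc[gene_idx]
    return r2(real_lfc, pred_lfc)
```

Calling it once with `gene_idx=None` and once with the DE gene indices would give the two numbers per drug and cell line.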

How to visualize:
- Compute R2 on means and also on LFC, and plot boxplots for train/test/OOD on all genes and DEGs (see Oksana's fig and figure 3 in the paper).
- For individual drugs you can do a scatter plot with predicted LFC on x and real LFC on y (see fig 3 in the paper). You can use this vis and annotate the top DEGs:
https://github.com/theislab/chemical_CPA/blob/2c885affebe2df7d55024b546bc416a85b7e097e/compert/plotting.py#L1104

MxMstrmn commented 2 years ago

Hi @M0hammadL,

> plot boxplot for train/test/OOD

For these boxplots, we would have to return the whole list per category over which we sample, right?
Not sure if we should really store lists during training, maybe better to do after training?
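One way the "after training" option could look, as a sketch: compute one score per condition once the model is trained, then group the scores by split so that each split yields one list for a boxplot. The condition names, the split labels and the helper `scores_by_split` are made up for illustration.

```python
from collections import defaultdict


def scores_by_split(scores, split_of):
    """Group per-condition scores into one list per split.

    scores: dict mapping condition name -> score (e.g. R2).
    split_of: dict mapping condition name -> "train", "test" or "ood".
    """
    grouped = defaultdict(list)
    for condition, score in scores.items():
        grouped[split_of[condition]].append(score)
    return dict(grouped)


# Hypothetical per-condition R2 values, computed once after training.
scores = {"drugA_ct1": 0.9, "drugB_ct1": 0.8, "drugC_ct1": 0.5, "drugD_ct2": 0.4}
split_of = {
    "drugA_ct1": "train",
    "drugB_ct1": "train",
    "drugC_ct1": "test",
    "drugD_ct2": "ood",
}

grouped = scores_by_split(scores, split_of)
```

This keeps the training loop free of per-condition lists; only the trained model and the split assignment are needed at evaluation time.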

