Closed by jyaacoub 2 months ago
The figure shows good performance with GVPL, but not with GVPP.
Results after more extensive hyperparameter tuning, which was done because the high variance seemed odd:
# %%
import logging
from typing import OrderedDict
import seaborn as sns
from matplotlib import pyplot as plt
from statannotations.Annotator import Annotator
from src.analysis.figures import prepare_df, custom_fig, fig_combined
df = prepare_df()
sel_dataset = 'PDBbind'
exclude = []
sel_col = 'cindex'
# removing first instance of aflow, only keeping second more tuned model
# raw string with an escaped dot so that "0.00012" is matched literally
idx = df[df.run.str.contains(r'DGM_PDBbin.*?D_nomsaF_aflowE_128B_0\.00012')].index
df.drop(idx, inplace=True)
# %%
models = {
    'DG': ('nomsa', 'binary', 'original', 'binary'),
    'aflow': ('nomsa', 'aflow', 'original', 'binary'),
    'aflow_ring3': ('nomsa', 'aflow_ring3', 'original', 'binary'),
    # 'gvpP': ('gvp', 'binary', 'original', 'binary'),
    # 'gvpL': ('nomsa', 'binary', 'gvp', 'binary'),
    'gvpL_aflow': ('nomsa', 'aflow', 'gvp', 'binary'),
    'gvpL_aflow_rng3': ('nomsa', 'aflow_ring3', 'gvp', 'binary'),
}
# %%
fig, axes = fig_combined(df, datasets=['PDBbind'], fig_callable=custom_fig,
                         models=models, metrics=['pearson', 'cindex', 'mse', 'mae'],
                         fig_scale=(8, 5))
plt.xticks(rotation=45)
# %%
The GVPL_aflow result on Davis is due to the low ligand count in Davis, which makes it harder for the model to generalize well to unseen ligands.
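One way to check the ligand-count explanation is to compare how many unique ligands each dataset has relative to its number of samples. The snippet below is only a rough sketch: the column names `dataset` and `ligand_id`, the helper `ligand_stats`, and the toy rows are all made up for illustration and are not taken from the repo or from the real datasets.

```python
import pandas as pd


def ligand_stats(df, dataset_col="dataset", ligand_col="ligand_id"):
    """Count samples and unique ligands per dataset (hypothetical helper)."""
    stats = df.groupby(dataset_col)[ligand_col].agg(
        n_samples="size",            # rows (protein-ligand pairs) per dataset
        n_unique_ligands="nunique",  # distinct ligands per dataset
    )
    # Higher values mean the same few ligands are reused across many samples.
    stats["samples_per_ligand"] = stats["n_samples"] / stats["n_unique_ligands"]
    return stats


# Toy data only: one dataset reuses two ligands, the other has four distinct ones.
toy = pd.DataFrame({
    "dataset": ["few_ligands"] * 6 + ["many_ligands"] * 4,
    "ligand_id": ["l1", "l1", "l1", "l2", "l2", "l2", "l3", "l4", "l5", "l6"],
})

stats = ligand_stats(toy)
print(stats)
```

A dataset with a high `samples_per_ligand` ratio exposes the model to few distinct ligands during training, which would be consistent with weaker generalization to unseen ligands.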
Minimal code to get features for a sample protein (see commit 72c2855):