jyaacoub / MutDTA

Improving the precision oncology pipeline by providing binding affinity perturbation predictions on a priori identified cancer driver genes.

Implement and test GVP alternative #90

Closed jyaacoub closed 2 months ago

jyaacoub commented 6 months ago

Minimal code to get features for a sample protein (see commit 72c2855):

# %%
from prody import fetchPDB

fetchPDB('10gs', compressed=False)

# %%
from src.utils.residue import Chain
c = Chain('10gs.pdb', grep_atoms={'CA', 'N', 'C'})
# %%
import logging
logging.getLogger().setLevel(logging.DEBUG)

c.getCoords(get_all=True).shape # (N, 3)

# %%
from src.data_prep.feature_extraction.gvp import GVPFeatures

gvp_f = GVPFeatures()

# %%
f = gvp_f.featurize_as_graph('10gs', c.getCoords(get_all=True), c.sequence)
# %%
jyaacoub commented 5 months ago

Pasted image 20240412111444

The figure shows good performance with GVPL, but not with GVPP.

jyaacoub commented 5 months ago

Updated plot with new aflow results from https://github.com/jyaacoub/MutDTA/commit/f6cd150faeb5de578a838fe275b4def7a63e07ff

These results come from more extensive hyperparameter tuning, which I ran because the high variance of the earlier aflow results seemed odd.

image

Code to generate figure + remove old results:

# %%
import logging
from typing import OrderedDict

import seaborn as sns
from matplotlib import pyplot as plt
from statannotations.Annotator import Annotator

from src.analysis.figures import prepare_df, custom_fig, fig_combined

df = prepare_df()
sel_dataset = 'PDBbind'
exclude = []
sel_col = 'cindex'

# removing first instance of aflow, only keeping second more tuned model
# raw string with escaped dots so '.' in the learning rate matches literally
idx = df[df.run.str.contains(r'DGM_PDBbin.*?D_nomsaF_aflowE_128B_0\.00012')].index
df.drop(idx, inplace=True)
# %%
models = {
    'DG': ('nomsa', 'binary', 'original', 'binary'),
    'aflow': ('nomsa', 'aflow', 'original', 'binary'),
    'aflow_ring3': ('nomsa', 'aflow_ring3', 'original', 'binary'),
    # 'gvpP': ('gvp', 'binary', 'original', 'binary'),
    # 'gvpL': ('nomsa', 'binary', 'gvp', 'binary'),
    'gvpL_aflow': ('nomsa', 'aflow', 'gvp', 'binary'),
    'gvpL_aflow_rng3': ('nomsa', 'aflow_ring3', 'gvp', 'binary'),
}

# %%
fig, axes = fig_combined(df, datasets=['PDBbind'], fig_callable=custom_fig,
             models=models, metrics=['pearson', 'cindex', 'mse', 'mae'],
             fig_scale=(8,5))
plt.xticks(rotation=45)

# %%
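The filtering step above drops runs by matching a regular expression against the `run` column. As a rough, self-contained sketch of that idea, the snippet below applies the same kind of pattern to a toy DataFrame. The run names are made up for illustration and are not actual MutDTA runs, and the real `prepare_df()` output will have more columns.

```python
import pandas as pd

# Hypothetical run names, for illustration only (not real MutDTA results).
df = pd.DataFrame({
    "run": [
        "DGM_PDBbindD_nomsaF_aflowE_128B_0.00012LR",   # older aflow run, to drop
        "DGM_PDBbindD_nomsaF_aflowE_64B_0.0001LR",     # retuned aflow run, to keep
        "DGM_PDBbindD_nomsaF_binaryE_128B_0.00012LR",  # non-aflow run, to keep
    ]
})

# Raw string with escaped dots, so "0.00012" only matches a literal dot.
pattern = r"DGM_PDBbin.*?D_nomsaF_aflowE_128B_0\.00012"

# str.contains uses regex search semantics by default; the mask marks rows to drop.
mask = df["run"].str.contains(pattern, regex=True)

# Keep everything that did not match, leaving the original frame untouched.
filtered = df[~mask].reset_index(drop=True)
print(filtered["run"].tolist())
```

Building a boolean mask and selecting with `~mask` avoids mutating `df` in place, which makes it easier to rerun a notebook cell without silently dropping rows twice.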
jyaacoub commented 4 months ago

Davis performance

image

jyaacoub commented 3 months ago

Results for all datasets:

image

jyaacoub commented 2 months ago

Adding ESM to the Davis GVPL_aflow model doesn't help much.

image