Open jyaacoub opened 4 weeks ago
#### FULL TABLE COUNTS:
```
| Dataset   | Protein   | Compound   | Total Binding Entities  |
|-----------|-----------|------------|-------------------------|
| davis     | 442       | 68         | 30056                   |
| kiba      | 229       | 2111       | 118254                  |
| pdbbind   | 3889      | 12639      | 19443                   |
```
#### USED TABLE COUNTS:
Due to memory limitations, some records were excluded from our runs. These are the counts of the records that were actually used.
```
   Dataset  Protein  Compound  Total Binding Entities
0    davis      439        68                   29852
1     kiba      226      2111                  117590
2  pdbbind     3785     10950                   16265
```
#### non-overlayed or normalized plot
![image](https://github.com/user-attachments/assets/0ff6f518-1ab1-4bc2-9a96-f434c483e723)

#### normalized and overlayed plot
![image](https://github.com/user-attachments/assets/518b7abf-4a63-4f84-9277-46e31df39da2)
![image](https://github.com/user-attachments/assets/c5ed6522-2c5e-47d5-9950-9bd5c47c7ed9)
```
Unique protein sequence counts: 860
Unique protein IDs: 361
Unique ligand counts: 197
Total records: 1962
```
![image](https://github.com/user-attachments/assets/2af6d4bf-1bac-4238-8907-8d37136f6fd4)
![image](https://github.com/user-attachments/assets/e376d64d-7bb2-47a7-8ecc-41f0f368943e) ![image](https://github.com/user-attachments/assets/79c6796b-7711-4220-a8f7-dfcc70b202c0)
This plot shows how well the model predicts the absolute pKd given only the protein sequence and ligand SMILES.
Instead of looking at absolute predictive performance, this plot shows how well the model predicts the delta between a mutated and an unmutated sequence.
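
For reference, the delta comparison described above could be computed roughly as follows. This is only a minimal sketch with made-up toy values, not the code used to produce the plot: the helper names `deltas` and `pearson`, the sign convention (mutant minus wild type), and the choice of Pearson correlation as the summary statistic are all assumptions.

```python
from math import sqrt


def deltas(mutant, wild_type):
    """Per-pair difference (mutant minus wild type); sign convention is an assumption."""
    return [m - w for m, w in zip(mutant, wild_type)]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Toy pKd values (hypothetical, for illustration only).
true_wt = [7.0, 6.5, 8.0, 5.5]  # measured pKd, unmutated sequence
true_mt = [6.0, 6.8, 7.0, 5.9]  # measured pKd, mutated sequence
pred_wt = [6.8, 6.4, 7.5, 5.8]  # predicted pKd, unmutated sequence
pred_mt = [6.2, 6.6, 7.0, 6.0]  # predicted pKd, mutated sequence

true_delta = deltas(true_mt, true_wt)
pred_delta = deltas(pred_mt, pred_wt)

# Agreement between the predicted and measured change in pKd.
r = pearson(true_delta, pred_delta)
print(f"Pearson r on delta pKd: {r:.3f}")
```

A model can score well on absolute pKd while scoring poorly here, since errors shared by the mutated and unmutated predictions cancel out in the delta and only the predicted change is compared.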
![image](https://github.com/user-attachments/assets/6809f25b-8d21-448e-bbdc-43aac3527b4a)

```python
import logging
from matplotlib import pyplot as plt

from src.analysis.figures import prepare_df, fig_combined, custom_fig

dft = prepare_df('./results/model_media/model_stats.csv')
dfv = prepare_df('./results/model_media/model_stats_val.csv')

models = {
    'DG': ('nomsa', 'binary', 'original', 'binary'),
    'esm': ('ESM', 'binary', 'original', 'binary'),  # esm model
    'aflow': ('nomsa', 'aflow', 'original', 'binary'),
    # 'gvpP': ('gvp', 'binary', 'original', 'binary'),
    'gvpL': ('nomsa', 'binary', 'gvp', 'binary'),
    # 'aflow_ring3': ('nomsa', 'aflow_ring3', 'original', 'binary'),
    'gvpL_aflow': ('nomsa', 'aflow', 'gvp', 'binary'),
    # 'gvpl_esm':('ESM', 'binary', 'gvp', 'binary'),
    # 'gvpL_aflow_rng3': ('nomsa', 'aflow_ring3', 'gvp', 'binary'),
    #GVPL_ESMM_davis3D_nomsaF_aflowE_48B_0.00010636872718329864LR_0.23282479481785903D_2000E_gvpLF_binaryLE
    # 'gvpl_esm_aflow': ('ESM', 'aflow', 'gvp', 'binary'),
}

fig, axes = fig_combined(dft, datasets=['davis', 'kiba', 'PDBbind'], fig_callable=custom_fig,
                         models=models, metrics=['cindex', 'mse'],
                         fig_scale=(10,5), add_stats=True, title_postfix=" test set performance", box=True)
plt.xticks(rotation=45)

# fig, axes = fig_combined(dfv, datasets=['davis'], fig_callable=custom_fig,
#                          models=models, metrics=['cindex', 'mse'],
#                          fig_scale=(10,5), add_stats=True, title_postfix=" validation set performance", box=True, fold_labels=True)
# plt.xticks(rotation=45)
```
Final models - these are the ones we will show in the paper.