This repository contains detailed execution instructions for the notebooks used in our structure-informed machine learning experiments. We put special emphasis on the reproducibility of results by carefully specifying every aspect of the running environment.
_output/
relative to the running notebook. This includes the run notebook itself, with (updated) output cells.

Note: Each directory contains one more README.md you can check for additional details. The details of each run are encoded in a YAML file.
001-ligand-based: Baseline models for the rest of the studies. Models are trained exclusively with ligand information.
002-kinase-informed: Includes some kinase information, without structural details.

We split an experiment into four stages:
DatasetProvider adapter. This stage should not happen in a notebook, but as part of the KinoML library.
DatasetProvider objects are converted into tensorial representations exported as NumPy arrays. This process should also export the measurement type metadata: observation model mathematical expression, dynamic range, loss adapters.

Since featurized vectors can be reused across models, we do not implement a linear hierarchy that implies such a dependency. Instead, we use metadata to annotate each artifact and identify whether a certain stage is compatible with another, across experiments.
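The metadata-based compatibility idea can be illustrated with a minimal sketch. Everything in it is hypothetical: the `ArtifactMetadata` class, its `produces` and `requires` fields, and the tag names are assumptions made for illustration, not part of KinoML or of this repository's actual metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactMetadata:
    """Hypothetical metadata record attached to the artifact of one stage."""

    stage: str                          # e.g. "featurization" or "training"
    produces: frozenset = frozenset()   # tags describing what the artifact provides
    requires: frozenset = frozenset()   # tags the stage needs from its input


def is_compatible(upstream: ArtifactMetadata, downstream: ArtifactMetadata) -> bool:
    """Return True if the upstream artifact provides every tag the downstream stage requires."""
    return downstream.requires <= upstream.produces


# Illustrative tags only; the real annotations may look different.
features = ArtifactMetadata(
    stage="featurization",
    produces=frozenset({"morgan1024", "ligand-only"}),
)
ligand_model = ArtifactMetadata(stage="training", requires=frozenset({"morgan1024"}))
kinase_model = ArtifactMetadata(
    stage="training",
    requires=frozenset({"morgan1024", "kinase-sequence"}),
)

print(is_compatible(features, ligand_model))
print(is_compatible(features, kinase_model))
```

Because compatibility is decided from the annotations rather than from the directory layout, the same featurized artifact can feed several experiments without a fixed parent-child hierarchy.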
A functional environment needs a proper KinoML installation. We recommend an installation via conda/mamba, e.g.:
mamba create -n kinoml --no-default-packages
mamba env update -n kinoml -f https://raw.githubusercontent.com/openkinome/kinoml/master/devtools/conda-envs/test_env.yaml
conda activate kinoml
pip install https://github.com/openkinome/kinoml/archive/master.tar.gz
Clone this repository from GitHub:
git clone git@github.com:openkinome/experiments-binding-affinity.git
cd experiments-binding-affinity
After installation you can directly run one of the provided examples.
# run the featurization
python run_notebook.py features/featurize-template.ipynb features/ligand-only-morgan1024-EGFR-subsample.py --overwrite
# run the ML experiment
python run_notebook.py experiments/torch-train-test-debug-template.ipynb experiments/001_example-ligand-only-morgan1024-EGFR-subset.py --overwrite
Featurizations and ML experiments are performed via Jupyter notebooks, which provide an informative, automated record of each ML experiment. Each run produces a new notebook, stored in a subdirectory of the features or experiments directory. After running the above lines you should find the following two notebooks:
# featurization
features/ligand-only-morgan1024-EGFR-subsample/featurize.ipynb
# ML experiment
experiments/001_example-ligand-only-morgan1024-EGFR-subset/torch-train-test-debug.ipynb
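
The naming pattern in the two examples above can be summarized in a short sketch. The helper `expected_output` is hypothetical and the rule is only inferred from those two examples: the output directory is assumed to be the configuration path without its extension, and the notebook name the template name without its `-template` suffix. It is not taken from `run_notebook.py` itself.

```python
from pathlib import Path


def expected_output(template: str, config: str) -> Path:
    """Guess where a run's notebook lands (inferred from the examples, not from run_notebook.py)."""
    template_path = Path(template)
    config_path = Path(config)

    # Assumed output directory: the configuration path with its extension removed.
    out_dir = config_path.with_suffix("")

    # Assumed notebook name: the template name without the "-template" suffix.
    stem = template_path.stem
    suffix = "-template"
    if stem.endswith(suffix):
        stem = stem[: -len(suffix)]

    return out_dir / (stem + template_path.suffix)


print(expected_output(
    "features/featurize-template.ipynb",
    "features/ligand-only-morgan1024-EGFR-subsample.py",
).as_posix())
```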