KinoML notebooks

This repository contains detailed execution instructions for the notebooks used in our structure-informed machine learning experiments. We put special emphasis on the reproducibility of results by carefully specifying every aspect of the running environment.

Some history hygiene details

Included studies

Note: Each directory contains one more you can check for additional details. The details of each run are encoded in a YAML file.


We split an experiment into four stages:

  1. Data intake. This involves taking the raw data (e.g. as provided in a publication, dataset or through a collaborator) and creating a DatasetProvider adapter. This stage should not happen in a notebook, but as part of the KinoML library.
  2. Featurization. One or more DatasetProvider objects are converted into tensorial representations and exported as NumPy arrays. This process should also export the measurement type metadata: the mathematical expression of the observation model, the dynamic range, and the loss adapters.
  3. Training. Takes a collection of tensors and measurement types, one model and a set of hyperparameters, and produces a collection of k models (one per fold of a k-fold split). Contextual metadata includes loss data, validation scores and data splits. Only the training and validation sets are used here, but the indices of the test set are available for later stages if needed.
  4. Evaluation. Using the models from step 3, produce performance reports on a test set. The test set does not have to be part of the same collection used in training; it can be a different collection entirely, as long as the featurized tensors are compatible (e.g. test ChEMBL data on PKIS2). Outputs include test scores.
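
The split logic behind stages 3 and 4 can be sketched in a few lines. This is only an illustration under stated assumptions, not the KinoML implementation: the helper names (`split_indices`, `fit_mean_baseline`, `mse`) are hypothetical, the measurements are synthetic, and a mean predictor stands in for a real model. Plain Python lists are used for brevity where the real pipeline would use the exported NumPy arrays.

```python
import random
from statistics import mean


def split_indices(n_samples, k, test_fraction=0.2, seed=0):
    """Hold out a test set, then split the rest into k train/validation folds.

    Hypothetical helper: the real splitting strategy may differ.
    """
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    n_test = round(n_samples * test_fraction)
    test_idx, rest = order[:n_test], order[n_test:]
    folds = [rest[i::k] for i in range(k)]
    splits = []
    for i, val_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train_idx, val_idx))
    return splits, test_idx


def fit_mean_baseline(y_train):
    """Stand-in 'model' (an assumption): always predicts the training mean."""
    return mean(y_train)


def mse(y_true, prediction):
    """Mean squared error of a constant prediction."""
    return mean((y - prediction) ** 2 for y in y_true)


# Stand-in for stage 2 output: synthetic measurements, not real data.
y = [float(i % 7) for i in range(100)]

# Stage 3: one model per fold; the test indices are kept but not used.
splits, test_idx = split_indices(len(y), k=5)
models, val_scores = [], []
for train_idx, val_idx in splits:
    model = fit_mean_baseline([y[i] for i in train_idx])
    models.append(model)
    val_scores.append(mse([y[i] for i in val_idx], model))

# Stage 4: score every fold model on the held-out test set.
test_scores = [mse([y[i] for i in test_idx], m) for m in models]
```

Keeping the test indices alongside the fold splits is what lets a later evaluation run reuse exactly the same held-out samples.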

Since featurized vectors can be reused across models, we do not implement a linear hierarchy that implies such a dependency. Instead, we use metadata to annotate each artifact and identify whether a certain stage is compatible with another, across experiments.
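
One way to picture such a metadata-based compatibility check is the sketch below. Everything in it is an assumption made for illustration: the `ArtifactMetadata` class, its fields, the `is_compatible` rule and the example values are invented and do not reflect the metadata schema actually used in this repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArtifactMetadata:
    """Hypothetical annotation attached to each artifact."""
    stage: str                    # e.g. "featurization" or "training"
    feature_shape: tuple          # per-sample tensor shape
    measurement_types: frozenset  # labels of the measurement types involved


def is_compatible(features: ArtifactMetadata, model: ArtifactMetadata) -> bool:
    """Assumed rule: same per-sample shape, and no measurement type unknown to the model."""
    return (
        features.stage == "featurization"
        and model.stage == "training"
        and features.feature_shape == model.feature_shape
        and features.measurement_types <= model.measurement_types
    )


# Invented example values, for illustration only.
trained_model = ArtifactMetadata("training", (1024,), frozenset({"pIC50", "pKi"}))
same_featurizer = ArtifactMetadata("featurization", (1024,), frozenset({"pIC50"}))
other_featurizer = ArtifactMetadata("featurization", (2048,), frozenset({"pIC50"}))

print(is_compatible(same_featurizer, trained_model))
print(is_compatible(other_featurizer, trained_model))
```

Because the check only reads annotations, a featurized dataset from one experiment can be matched against a model from another without any directory hierarchy linking the two.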

Getting started

A functional environment needs a proper KinoML installation. We recommend an installation via conda/mamba, e.g.:

mamba create -n kinoml --no-default-packages
mamba env update -n kinoml -f
conda activate kinoml
pip install

Clone this repository from GitHub:

git clone
cd experiments-binding-affinity

After installation, you can directly run one of the provided examples:

# run the featurization
python features/featurize-template.ipynb features/ --overwrite

# run the ML experiment
python experiments/torch-train-test-debug-template.ipynb experiments/ --overwrite

Featurizations and ML experiments are run as Jupyter notebooks, which gives an informative, automated record of each ML experiment. Each run produces a new notebook, which is stored in the features or experiments directory. After running the commands above, you should find the following two notebooks:

# featurization

# ML experiment