Protein kinases play a crucial role in many cell signaling processes, making them one of the most important families of drug targets. Fragment-based drug design has proven useful as an approach to develop novel kinase inhibitors. However, fragment-based methods are usually limited to a knowledge-driven approach of optimizing a focused set of fragments. Here, we present a data-driven fragmentation and recombination approach instead. A novel computational fragmentation method was implemented, which splits known kinase inhibitors into fragments with respect to the subpockets that they occupy. Thereby, a fragment library with several pools, representing the subpockets, is created. This fragment library enables an in-depth analysis of the chemical space of known kinase inhibitors, and is used to recombine fragments in order to generate novel potential inhibitors.
For each input kinase-ligand complex, the kinase binding pocket is divided into six subpockets. The ligands are fragmented according to these subpockets, and a fragment library with several pools is created, where each pool corresponds to one subpocket and contains the fragments that were assigned to this subpocket.
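As a minimal illustration of the resulting data layout (not the project's implementation), the sketch below groups fragments that already carry a subpocket label into one pool per subpocket. The fragment IDs and the `build_pools` helper are hypothetical; the six subpocket labels are assumed to match the SD file names of the fragment library.

```python
from collections import defaultdict

# Assumed subpocket labels, matching the SD file names of the fragment library.
SUBPOCKETS = ("AP", "FP", "SE", "GA", "B1", "B2")


def build_pools(fragments):
    """Group (fragment_id, subpocket) pairs into one pool per subpocket."""
    pools = defaultdict(list)
    for fragment_id, subpocket in fragments:
        if subpocket not in SUBPOCKETS:
            raise ValueError(f"unknown subpocket: {subpocket}")
        pools[subpocket].append(fragment_id)
    return dict(pools)


# Hypothetical fragments from two ligands, each labeled with its subpocket.
example = [("lig1_f0", "AP"), ("lig1_f1", "SE"), ("lig2_f0", "AP"), ("lig2_f1", "GA")]
pools = build_pools(example)
print(pools)
```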
Every possible fragment recombination is enumerated in order to create a virtual combinatorial compound library. The fragments are reconnected only at the broken bonds, while preserving the original subpocket connection of each bond.
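The enumeration idea can be sketched on toy data as follows. This is a simplified illustration, not the code behind the recombination step: the `Fragment` class, its `ports` field (the subpockets that a fragment's broken bonds originally led to), and the rule that each subpocket is filled at most once are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    frag_id: str
    subpocket: str
    ports: tuple  # subpockets that the broken bonds originally led to


def recombine(pools, start, max_fragments):
    """Enumerate fragment combinations grown from the `start` subpocket.

    A fragment is attached only where an open broken bond of the growing
    molecule points to its subpocket and it has a broken bond pointing back.
    """
    results = set()

    def grow(members, open_ports):
        # open_ports holds (source_subpocket, target_subpocket) pairs.
        if len(members) > 1:
            results.add(frozenset(f.frag_id for f in members))
        if len(members) == max_fragments:
            return
        used = {f.subpocket for f in members}
        for i, (src, dst) in enumerate(open_ports):
            if dst in used:  # assumption: one fragment per subpocket
                continue
            for cand in pools.get(dst, []):
                if src not in cand.ports:
                    continue
                remaining = list(cand.ports)
                remaining.remove(src)  # consumed by the newly formed bond
                new_open = open_ports[:i] + open_ports[i + 1:]
                new_open += [(dst, p) for p in remaining]
                grow(members + [cand], new_open)

    for frag in pools.get(start, []):
        grow([frag], [(start, p) for p in frag.ports])
    return results


# Hypothetical pools; GA_2 can never attach because it expects B1, not AP.
pools = {
    "AP": [Fragment("AP_1", "AP", ("SE", "GA")), Fragment("AP_2", "AP", ("SE",))],
    "SE": [Fragment("SE_1", "SE", ("AP",))],
    "GA": [Fragment("GA_1", "GA", ("AP",)), Fragment("GA_2", "GA", ("B1",))],
}
combos = recombine(pools, start="AP", max_fragments=3)
for combo in sorted(sorted(c) for c in combos):
    print(" + ".join(combo))
```

The sketch also reports intermediate combinations that still have unused broken bonds; whether such molecules are kept is a design decision that this example does not settle.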
Clone KinaseFocusedFragmentLibrary:
git clone https://github.com/volkamerlab/KinaseFocusedFragmentLibrary.git
Create a conda environment containing all required packages:
conda env create -f devtools/conda-envs/environment.yml
# When using a MacBook with an M1 chip you may need:
CONDA_SUBDIR=osx-64 conda env create -f devtools/conda-envs/environment.yml
# Hint: if conda is too slow, consider using mamba instead:
mamba env create -f devtools/conda-envs/environment.yml
Hint: when using a MacBook with an M1 chip, you may need to install PyQt5 beforehand:
# with Homebrew
brew install pyqt5
# with pip3
pip3 install pyqt5
Activate the new environment:
conda activate kffl
Install the kinase_focused_fragment_library package:
cd ..
pip install -e KinaseFocusedFragmentLibrary
Kinase-ligand structures and two CSV files containing metadata are downloaded from the KLIFS database using the following search options (for options not shown, the defaults are used):
You will need to do two downloads at the end of the page:
1. The structures, which end up in the KLIFS_download folder.
2. The CSV file (KLIFS_export.csv) via the "DOWNLOAD CSV" button. Please place this file into the KLIFS_download folder from step 1.
The downloaded data should now have the following folder structure:
└── KLIFS_download
    ├── KLIFS_export.csv
    ├── overview.csv
    └── HUMAN                        # species name
        ├── AAK1                     # kinase name
        │   ├── 4wsq_altA_chainA     # PDB code, alternate model, chain
        │   │   ├── ligand.mol2
        │   │   ├── pocket.mol2
        │   │   └── ...
        │   └── ...
        └── ...
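Before running the workflow, it can help to check that the download matches this layout. The following sketch lists every structure folder that contains both ligand.mol2 and pocket.mol2; the `list_complexes` helper is hypothetical and not part of the package, and the demo builds a throwaway folder instead of reading a real download.

```python
import tempfile
from pathlib import Path


def list_complexes(klifs_download):
    """Yield (species, kinase, structure) for every complete structure folder."""
    root = Path(klifs_download)
    # Expected layout: <root>/<species>/<kinase>/<structure>/{ligand,pocket}.mol2
    for ligand_file in sorted(root.glob("*/*/*/ligand.mol2")):
        structure_dir = ligand_file.parent
        if (structure_dir / "pocket.mol2").is_file():
            kinase_dir = structure_dir.parent
            yield kinase_dir.parent.name, kinase_dir.name, structure_dir.name


# Demo on a temporary folder that mimics the structure shown above.
with tempfile.TemporaryDirectory() as tmp:
    structure = Path(tmp) / "KLIFS_download" / "HUMAN" / "AAK1" / "4wsq_altA_chainA"
    structure.mkdir(parents=True)
    (structure / "ligand.mol2").touch()
    (structure / "pocket.mol2").touch()
    found = list(list_complexes(Path(tmp) / "KLIFS_download"))

print(found)
```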
The full fragmentation and recombination workflow consists of the following steps:
Hint: /path/to/KLIFS_download means /path/to/folder/with/KLIFS_download/folder.
Preprocessing:
kffl-preprocessing -f /path/to/KLIFS_download -o /path/to/fragment_library
The file /path/to/KLIFS/data/KLIFS_download/filtered_ligands.csv contains metadata on all ligands that were chosen for the fragmentation. Ligands discarded in this step are recorded in /path/to/fragment_library/discarded_ligands/preprocessing.csv.
Fragmentation:
kffl-fragmentation -f /path/to/KLIFS_download -o /path/to/fragment_library
Images of the fragmented ligands are stored in /path/to/fragment_library/fragmented_molecules/, and ligands discarded in this step are recorded in /path/to/fragment_library/discarded_ligands/fragmentation.csv. The fragment library itself is written to /path/to/fragment_library. For each subpocket, one folder containing one SD file exists:
└── fragment_library
    ├── AP.sdf
    ├── B1.sdf
    ├── B2.sdf
    ├── FP.sdf
    ├── GA.sdf
    ├── SE.sdf
    ├── X.sdf
    ├── discarded_ligands
    │   ├── fragmentation.csv
    │   └── preprocessing.csv
    └── fragmented_molecules
        ├── 4wsq_altA_chainA.png
        └── ...
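To get a quick look at such a file without any cheminformatics toolkit, the data items of each record can be read with plain Python. In the SD format, each record ends with a line containing `$$$$`, and each data item starts with a header line of the form `> <name>`. The parser below is a rough sketch rather than a full SD reader, and the field name `example_field` is a placeholder, not one of the library's actual fields.

```python
def parse_sd_records(lines):
    """Return one dict of data items per record of an SD file."""
    records, data, field = [], {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "$$$$":  # record terminator
            records.append(data)
            data, field = {}, None
        elif line.startswith(">") and "<" in line and ">" in line[1:]:
            # Data header, e.g. "> <name>": take the text between < and >.
            field = line[line.index("<") + 1:line.rindex(">")]
            data[field] = []
        elif field is not None:
            if line.strip():
                data[field].append(line)
            else:
                field = None  # a blank line ends the current data item
    return [{key: "\n".join(val) for key, val in rec.items()} for rec in records]


# Two placeholder records with an empty atom block and one data item each.
EXAMPLE = """\
frag_1
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <example_field>
value_a

$$$$
frag_2
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <example_field>
value_b

$$$$
"""

records = parse_sd_records(EXAMPLE.splitlines())
print(len(records), "records:", records)
```

For a real pool, the same function could be applied to an open file handle, for example `parse_sd_records(open("/path/to/fragment_library/AP.sdf"))`, assuming the file sits at that location.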
In addition to the standard fields of the SDF format (3D coordinates of each atom and bonds between atoms), the files include the following associated data for each fragment:
This step is required before the recombination step, which is otherwise computationally too expensive. To reduce the number of fragments in the fragment library using Butina clustering, run the notebook
https://github.com/volkamerlab/KinFragLib/blob/master/notebooks/3_1_fragment_library_reduced.ipynb
This notebook generates a folder called fragment_library_reduced:
├── fragment_library
│   └── ...
└── fragment_library_reduced
    ├── AP.sdf
    ├── B1.sdf
    ├── B2.sdf
    ├── FP.sdf
    ├── GA.sdf
    ├── SE.sdf
    ├── X.sdf
    └── configuration.txt
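For readers unfamiliar with Butina clustering, the sketch below shows the basic idea in plain Python: items with many neighbors within a distance cutoff become cluster centroids, and their unassigned neighbors join that cluster. This is a simplified illustration and not the code used in the notebook; the fingerprints are hypothetical sets of feature IDs, the distance is 1 minus the Tanimoto similarity, and the cutoff of 0.5 is an arbitrary choice for the example.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def butina_clusters(fingerprints, distance_cutoff):
    """Cluster items; the first index of each cluster is its centroid."""
    n = len(fingerprints)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            distance = 1.0 - tanimoto(fingerprints[i], fingerprints[j])
            if distance <= distance_cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Visit items with the most neighbors first; they become centroids.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))
    assigned = [False] * n
    clusters = []
    for idx in order:
        if assigned[idx]:
            continue
        assigned[idx] = True
        cluster = [idx]
        for nbr in neighbors[idx]:
            if not assigned[nbr]:
                assigned[nbr] = True
                cluster.append(nbr)
        clusters.append(cluster)
    return clusters


# Hypothetical fingerprints: three similar fragments and two others.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {7, 8, 9}, {7, 8, 9, 10}]
clusters = butina_clusters(fps, distance_cutoff=0.5)
centroids = [cluster[0] for cluster in clusters]
print(clusters, centroids)
```

Keeping only one representative per cluster, such as the centroid, is one simple way to shrink a pool. How representatives are actually selected for fragment_library_reduced is defined in the notebook.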
The recombination step should be performed on a cluster:
kffl-recombination -f /path/to/fragment_library_reduced -o /path/to/combinatorial_library -s AP -d 4
The SD files in the input folder (-f, here /path/to/fragment_library_reduced) are used as input for the recombination; the folder structure and file names shown above are expected.
The -s option specifies one or multiple subpockets from which the recombination procedure will start, meaning that all resulting molecules will contain a fragment coming from this subpocket/these subpockets (default: all subpockets).
The -d option specifies the maximum number of fragments to combine (default: 6).
The results are written to files in /path/to/combinatorial_library/results/, which contain pickled objects representing the recombined molecules. For each molecule, this object contains the fragment IDs and the bonds (as tuples of atom IDs) between the fragments.
Download the file chembl_33_chemreps.txt.gz here: https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_33/ and unpack it.
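If no command-line tool for unpacking is at hand, the archive can also be decompressed with the Python standard library. The `gunzip` helper below is a hypothetical convenience function, and the demo runs on a small placeholder archive rather than the real ChEMBL file.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip(archive, target=None):
    """Decompress a .gz file and return the path of the unpacked file."""
    archive = Path(archive)
    # By default, drop the trailing ".gz": foo.txt.gz -> foo.txt
    target = Path(target) if target else archive.with_suffix("")
    with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Demo with a placeholder archive in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example_chemreps.txt.gz"
    with gzip.open(archive, "wt") as handle:
        handle.write("placeholder line\n")
    unpacked = gunzip(archive)
    unpacked_name = unpacked.name
    unpacked_text = unpacked.read_text()

print(unpacked_name)
```

For the real download, `gunzip("chembl_33_chemreps.txt.gz")` would write chembl_33_chemreps.txt next to the archive.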
Standardize ChEMBL data in this file using:
kffl-chembl -f chembl_33_chemreps.txt -o chembl_standardized_inchi.csv
The analysis step should be performed on a cluster:
kffl-ligand-analysis -f /path/to/fragment_library_reduced -klifs /path/to/KLIFS_download -chembl chembl_standardized_inchi.csv -o /path/to/combinatorial_library
Only in this step are the recombined molecules constructed as actual Molecule objects.
These molecules are then compared to the molecules given in chembl_standardized_inchi.csv
(which should contain one standardized InChI string per line) and
to the original KLIFS ligands from which the fragments were built.
For each molecule, an object is stored in the file /path/to/combinatorial_library/cominatorial_library.pickle
which includes the representation of the molecule as created in the recombination step, its number of heavy atoms,
as well as binary values describing whether the molecule
Jupyter notebooks for analyzing the combinatorial library are stored at https://github.com/volkamerlab/KinFragLib/blob/master/notebooks/ (4_[1-5]*.ipynb), including 4_1_combinatorial_library_data.ipynb, which introduces the basic steps for working with the combinatorial library data.
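Outside the notebooks, a pickle file that stores one object per molecule can be read with the standard library. The sketch below assumes the objects were dumped one after another into a single file. That layout is an assumption about the output format, and the dictionaries in the demo are placeholders, not the project's result objects. Unpickling the real file additionally requires the package that defines those objects to be installed, and pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path


def iter_pickled(path):
    """Yield every object stored back-to-back in one pickle file."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:  # end of file reached, no more objects
                break


# Demo with placeholder records written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo.pickle"
    with open(demo_file, "wb") as handle:
        for entry in ({"n_atoms": 21}, {"n_atoms": 30}):  # hypothetical records
            pickle.dump(entry, handle)
    loaded = list(iter_pickled(demo_file))

print(loaded)
```

Because `iter_pickled` is a generator, large result files can be processed one object at a time without loading everything into memory.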