volkamerlab / KinaseFocusedFragmentLibrary

Subpocket-based fragmentation and recombination of kinase inhibitors
MIT License

KinaseFocusedFragmentLibrary

Exploring the chemical space of kinase inhibitors: Subpocket-based fragmentation for data-driven recombination

Introduction

Protein kinases play a crucial role in many cell signaling processes, making them one of the most important families of drug targets. Fragment-based drug design has proven useful as an approach to develop novel kinase inhibitors. However, fragment-based methods are usually limited to a knowledge-driven approach of optimizing a focused set of fragments. Here, we present a data-driven fragmentation and recombination approach instead. A novel computational fragmentation method was implemented, which splits known kinase inhibitors into fragments with respect to the subpockets that they occupy. Thereby, a fragment library with several pools, representing the subpockets, is created. This fragment library enables an in-depth analysis of the chemical space of known kinase inhibitors, and is used to recombine fragments in order to generate novel potential inhibitors.

Fragmentation

For each input kinase-ligand complex, the kinase binding pocket is divided into six subpockets. The ligands are fragmented according to these subpockets, and a fragment library with several pools is created, where each pool corresponds to one subpocket and contains the fragments that were assigned to this subpocket.


Recombination

Every possible fragment recombination is enumerated in order to create a virtual combinatorial compound library. The fragments are reconnected only at the broken bonds, while preserving the original subpocket connection of each bond.
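
The reconnection rule above (rejoin only at broken bonds, and only across the subpocket pair that the bond originally connected) can be illustrated with a toy example. This is a minimal sketch, not the package's implementation: the Fragment class, its fields, and the fragment names are hypothetical, fragments are reduced to labels instead of molecules, and only pairwise matches are shown rather than the full enumeration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Fragment:
    """Hypothetical stand-in for a library fragment."""
    name: str
    subpocket: str      # pool the fragment belongs to, e.g. "AP"
    open_bonds: tuple   # subpockets its broken bonds originally led to


def compatible_pairs(pools):
    """Return fragment pairs that may be rejoined at a broken bond.

    Two fragments match only if each has a broken bond that originally
    pointed into the other's subpocket.
    """
    fragments = [f for pool in pools.values() for f in pool]
    pairs = []
    for a, b in product(fragments, repeat=2):
        if (a.subpocket < b.subpocket          # report each unordered pair once
                and b.subpocket in a.open_bonds
                and a.subpocket in b.open_bonds):
            pairs.append((a.name, b.name))
    return pairs


# Made-up pools for illustration only.
pools = {
    "AP": [Fragment("ap1", "AP", ("FP", "SE")), Fragment("ap2", "AP", ("GA",))],
    "FP": [Fragment("fp1", "FP", ("AP",))],
    "SE": [Fragment("se1", "SE", ("AP",))],
    "GA": [Fragment("ga1", "GA", ("B1",))],
}
print(compatible_pairs(pools))
```

In this toy data, ap2 and ga1 are not paired: ap2 has a broken bond towards GA, but ga1's only broken bond originally led to B1, so the original subpocket connection would not be preserved.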

Usage

Clone KinaseFocusedFragmentLibrary:

git clone https://github.com/volkamerlab/KinaseFocusedFragmentLibrary.git

Dependencies

Create a conda environment containing all required packages:

conda env create -f devtools/conda-envs/environment.yml
# When using a MacBook with an M1 chip you may need:
CONDA_SUBDIR=osx-64 conda env create -f devtools/conda-envs/environment.yml

# Hint: if conda is too slow, consider using mamba instead:
mamba env create -f devtools/conda-envs/environment.yml

Hint: on a MacBook with an M1 chip, you may need to install PyQt5 beforehand:

# with Homebrew
brew install pyqt5
# with pip3
pip3 install pyqt5

Activate the new environment:

conda activate kffl

Install the kinase_focused_fragment_library package:

cd ..
pip install -e KinaseFocusedFragmentLibrary

Input

Kinase-ligand structures and two CSV files containing metadata are downloaded from the KLIFS database using the following search options (for options not shown, the defaults are used):




You will need to do two downloads at the end of the page:

  1. Download the structural data via the "DOWNLOAD STRUCTURES" button (choose the mol2 zip file option). If you are downloading more than 50 structures, you will get an email with a download link. Unpack the downloaded archive; you now have a KLIFS_download folder.
  2. Download the metadata file (KLIFS_export.csv) via the "DOWNLOAD CSV" button. Place this file into the KLIFS_download folder from step 1.

The downloaded data should now have the following folder structure:

└── KLIFS_download
    ├── KLIFS_export.csv
    ├── overview.csv
    └── HUMAN                      # species name  
        ├── AAK1                   # kinase name
        │   ├── 4wsq_altA_chainA   # PDB code, alternate model, chain
        │   │   ├── ligand.mol2
        │   │   ├── pocket.mol2
        │   │   └── ...
        │   └── ...
        └── ...
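
Before running the workflow, it can help to check that the download matches this layout. The following is a minimal sketch that relies only on the folder structure shown above; the function names are hypothetical and not part of the package.

```python
from pathlib import Path


def find_complexes(klifs_download):
    """Return structure folders that contain both ligand.mol2 and pocket.mol2.

    Expects the layout KLIFS_download/<species>/<kinase>/<structure>/.
    """
    root = Path(klifs_download)
    complexes = []
    for structure_dir in sorted(root.glob("*/*/*")):
        if not structure_dir.is_dir():
            continue
        has_ligand = (structure_dir / "ligand.mol2").is_file()
        has_pocket = (structure_dir / "pocket.mol2").is_file()
        if has_ligand and has_pocket:
            complexes.append(structure_dir)
    return complexes


def missing_metadata(klifs_download):
    """Return the names of expected metadata CSV files that are absent."""
    root = Path(klifs_download)
    return [name for name in ("KLIFS_export.csv", "overview.csv")
            if not (root / name).is_file()]
```

For example, `len(find_complexes("/path/to/KLIFS_download"))` gives the number of complete structure folders, and a non-empty result from `missing_metadata` points to a CSV file that still needs to be placed in the folder.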

Code

The full fragmentation and recombination workflow consists of the following steps:

  1. Preprocessing
  2. Fragmentation
  3. Optional but highly advised: Fragment library reduction
  4. Recombination
  5. Recombined molecule analysis

Hint: /path/to/KLIFS_download means /path/to/folder/with/KLIFS_download/folder.

1. Preprocessing

kffl-preprocessing -f /path/to/KLIFS_download -o /path/to/fragment_library

2. Fragmentation

kffl-fragmentation -f /path/to/KLIFS_download -o /path/to/fragment_library

For each subpocket, one SD file is written to the output folder:

└── fragment_library
    ├── AP.sdf
    ├── B1.sdf
    ├── B2.sdf
    ├── FP.sdf
    ├── GA.sdf
    ├── SE.sdf
    ├── X.sdf
    ├── discarded_ligands
    │   ├── fragmentation.csv
    │   └── preprocessing.csv
    └── fragmented_molecules
        ├── 4wsq_altA_chainA.png
        └── ...
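
To get a quick overview of the resulting pools, the SD files can be inspected without any chemistry toolkit. The sketch below relies only on two general properties of the SD format: each record ends with a `$$$$` line, and each data item starts with a `> <name>` header line. The function names are hypothetical and not part of the package.

```python
import re
from pathlib import Path

# Data item header, e.g. ">  <some_field>" (optionally followed by a counter).
DATA_HEADER = re.compile(r"^>.*<([^<>]+)>")


def summarize_sdf(path):
    """Return (number of records, sorted data-field names) for one SD file."""
    n_records = 0
    fields = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("$$$$"):      # record terminator
                n_records += 1
                continue
            match = DATA_HEADER.match(line)
            if match:
                fields.add(match.group(1))
    return n_records, sorted(fields)


def summarize_library(library_dir):
    """Map each pool name (file stem, e.g. 'AP') to its summary."""
    return {p.stem: summarize_sdf(p)
            for p in sorted(Path(library_dir).glob("*.sdf"))}
```

Calling `summarize_library("/path/to/fragment_library")` then returns, per subpocket pool, the number of fragments and the names of the data fields stored with them.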

In addition to the standard fields of the SDF format (3D coordinates of each atom and bonds between atoms), the files include the following associated data for each fragment:

3. Optional but highly advised: Fragment library reduction

In practice, this step is needed before recombination, which is otherwise computationally too expensive. To reduce the number of fragments in the fragment library using Butina clustering, run the notebook:

https://github.com/volkamerlab/KinFragLib/blob/master/notebooks/3_1_fragment_library_reduced.ipynb

This notebook generates a folder called fragment_library_reduced:

└── fragment_library
    └── ...
└── fragment_library_reduced
    ├── AP.sdf
    ├── B1.sdf
    ├── B2.sdf
    ├── FP.sdf
    ├── GA.sdf
    ├── SE.sdf
    ├── X.sdf    
    └── configuration.txt
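
For readers unfamiliar with Butina clustering, the idea can be sketched in plain Python over a precomputed distance matrix. This is an illustration of the general algorithm only: the function name, the distances, and the cutoff are made up, and the notebook's own settings and its rule for picking fragments from each cluster are not reproduced here. RDKit provides an implementation as `rdkit.ML.Cluster.Butina.ClusterData`.

```python
def butina_clusters(distances, cutoff):
    """Cluster items given a square distance matrix.

    Each item's neighbors are all other items within `cutoff`. Items are
    visited from most to fewest neighbors; an unassigned item becomes a
    cluster centroid and takes its still-unassigned neighbors with it.
    The first entry of each returned cluster is its centroid.
    """
    n = len(distances)
    neighbors = [
        {j for j in range(n) if j != i and distances[i][j] <= cutoff}
        for i in range(n)
    ]
    # Most-connected items first; ties broken by index for reproducibility.
    order = sorted(range(n), key=lambda i: (-len(neighbors[i]), i))

    assigned = set()
    clusters = []
    for i in order:
        if i in assigned:
            continue
        members = [i] + sorted(neighbors[i] - assigned)
        assigned.update(members)
        clusters.append(members)
    return clusters


# Toy example: five points on a line, distance = absolute difference.
points = [0.0, 0.1, 0.2, 0.9, 1.0]
toy_distances = [[abs(a - b) for b in points] for a in points]
print(butina_clusters(toy_distances, cutoff=0.15))
```

A smaller cutoff yields more, tighter clusters and therefore a larger reduced library; a larger cutoff reduces the library more aggressively.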

4. Recombination

The recombination step should be performed on a cluster:

kffl-recombination -f /path/to/fragment_library_reduced -o /path/to/combinatorial_library -s AP -d 4

5. Recombined molecule analysis

Download the file chembl_33_chemreps.txt.gz from https://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_33/ and unpack it.

Standardize ChEMBL data in this file using:

kffl-chembl \
-f chembl_33_chemreps.txt \
-o chembl_standardized_inchi.csv
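
Since the standardization runs over the full ChEMBL file, a quick look at the unpacked input beforehand can catch a truncated or wrongly unpacked download. The sketch below assumes that chembl_33_chemreps.txt is a tab-separated text file with a header row; check the first line of your download to confirm. The function name is hypothetical, and this is not what kffl-chembl does internally.

```python
import csv
from itertools import islice


def peek_table(path, n=3, delimiter="\t"):
    """Return (column names, first n rows as dicts) of a delimited text file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter,
                                quoting=csv.QUOTE_NONE)
        rows = list(islice(reader, n))   # reads only the first n data rows
        return reader.fieldnames, rows
```

For example, `columns, rows = peek_table("chembl_33_chemreps.txt")` shows which columns are present without loading the whole file into memory.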

The analysis step should be performed on a cluster:

kffl-ligand-analysis \
-f /path/to/fragment_library_reduced \
-klifs /path/to/KLIFS_download \
-chembl chembl_standardized_inchi.csv \
-o /path/to/combinatorial_library