The protein is represented with a multiple sequence alignment and the ligand as a SMILES string, allowing for unconstrained flexibility in the protein-ligand interface. There are two versions of Umol: one that uses protein pocket information (recommended) and one that does not. Please see the runscript (predict.sh) for more information.
Umol is available under the Apache License, Version 2.0. The Umol parameters are made available under the terms of the CC BY 4.0 license.
The entire installation takes less than 1 hour on a standard computer. We assume you have CUDA 12; for CUDA 11, you will have to adjust the installation of some packages. The runtime depends on the GPU you have available and the size of the protein-ligand complex you are predicting. On an NVIDIA A100 GPU, a prediction takes a few minutes on average.
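To check the CUDA assumption before installing, one option is to read the version that `nvidia-smi` prints in its header. The sketch below is not part of Umol, and `cuda_major_from_smi` is a hypothetical helper name. Note that `nvidia-smi` reports the highest CUDA version the driver supports, which may differ from any locally installed toolkit.

```shell
# Hypothetical helper, not part of Umol: read `nvidia-smi` output on stdin and
# print the major CUDA version found on the "CUDA Version: X.Y" header line.
cuda_major_from_smi() {
    sed -n 's/.*CUDA Version: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query the GPU when the tool is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    major=$(nvidia-smi | cuda_major_from_smi)
    if [ "$major" = "12" ]; then
        echo "Driver reports CUDA 12"
    else
        echo "Driver reports CUDA ${major:-unknown}; the installation may need adjusting"
    fi
fi
```

If the command is missing or prints no CUDA version, the helper prints nothing, so treat an empty result as "unknown" rather than as a failure.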
First, install Miniconda; see https://docs.conda.io/projects/miniconda/en/latest/miniconda-install.html or https://docs.conda.io/projects/miniconda/en/latest/miniconda-other-installer-links.html
bash install_dependencies.sh
conda activate umol
bash predict.sh
PDB_FILE=./data/test_case/7NB4/7NB4.pdb1
PROTEIN_CHAIN='A'
LIGAND_NAME='U6Q'
OUTDIR=./data/test_case/7NB4/
python3 ./src/parse_pocket.py --pdb_file "$PDB_FILE" \
  --protein_chain "$PROTEIN_CHAIN" \
  --ligand_name "$LIGAND_NAME" \
  --outdir "$OUTDIR"
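Before running the pocket parsing step, it can help to confirm that the PDB file contains both the requested protein chain and the ligand. The sketch below is a pre-flight check, not part of the Umol repository, and `check_pocket_inputs` is a hypothetical name. It relies only on the fixed-width PDB columns (record name in columns 1-6, residue name in 18-20, chain ID in 22) and assumes a three-character ligand name such as `U6Q`.

```shell
# Hypothetical pre-flight check; not part of the Umol repository.
# Usage: check_pocket_inputs <pdb_file> <protein_chain> <ligand_name>
# Returns 0 if both are found, 1 if the file is missing,
# 2 if the chain has no ATOM records, 3 if the ligand has no HETATM records.
check_pocket_inputs() {
    local pdb_file="$1" chain="$2" ligand="$3"

    if [ ! -f "$pdb_file" ]; then
        echo "missing PDB file: $pdb_file" >&2
        return 1
    fi

    # PDB is fixed-width: record name 1-6, residue name 18-20, chain ID 22.
    awk -v chain="$chain" -v lig="$ligand" '
        substr($0, 1, 4) == "ATOM"   && substr($0, 22, 1) == chain { has_chain = 1 }
        substr($0, 1, 6) == "HETATM" && substr($0, 18, 3) == lig   { has_lig = 1 }
        END {
            if (!has_chain) exit 2
            if (!has_lig)   exit 3
            exit 0
        }
    ' "$pdb_file"
}
```

With the variables defined above, a call such as `check_pocket_inputs "$PDB_FILE" "$PROTEIN_CHAIN" "$LIGAND_NAME"` can guard the `parse_pocket.py` invocation. A non-zero status indicates which input to recheck.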
Bryant, P., Kelkar, A., Guljas, A., Clementi, C. and Noé, F. Structure prediction of protein-ligand complexes from sequence information with Umol. Nat Commun 15, 4536 (2024). https://doi.org/10.1038/s41467-024-48837-6