rdk / p2rank

P2Rank: Protein-ligand binding site prediction tool based on machine learning. Stand-alone command line program / Java library for predicting ligand binding pockets from protein structure.
https://rdk.github.io/p2rank/
MIT License

P2Rank

Ligand-binding site prediction based on machine learning.

Version 2.4.2, MIT License

Description

P2Rank is a stand-alone command line program that predicts ligand-binding pockets from a protein structure. It achieves high prediction success rates without relying on external software to compute complex features or on a database of known protein-ligand templates.

What's new?

Requirements

P2Rank is tested on Linux, macOS, and Windows.

Setup

P2Rank requires no installation. Binary packages are available as GitHub Releases.

Usage

prank predict -f test_data/1fbl.pdb         # predict pockets on a single pdb file 

See more usage examples below...

Algorithm

P2Rank makes predictions by scoring and clustering points on the protein's solvent-accessible surface. The ligandability score of each point is predicted by a machine learning model (random forest) trained on a dataset of known protein-ligand complexes. For more details, see the slides and publications.

Presentation slides introducing the original version of the algorithm: Slides (pdf)
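The "score points, then cluster the ligandable ones" idea can be illustrated with a toy shell/awk sketch. This is not P2Rank's implementation: the input format (one `x y z score` point per line), the score threshold, the single-linkage distance cutoff, the function name `cluster_points` and the pocket score (sum of squared point scores) are all assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Toy illustration of "score points, then cluster the ligandable ones".
# NOT P2Rank's code: input format, threshold, cutoff and pocket score are assumptions.
# Input (stdin): one surface point per line as "x y z score".
# Output: one line per cluster, "pocket_score n_points", best pocket first.
cluster_points() {
  local threshold="${1:-0.5}"  # hypothetical minimum ligandability score
  local cutoff="${2:-3.0}"     # hypothetical linkage distance (Angstrom)
  awk -v t="$threshold" -v c="$cutoff" '
    # union-find: follow parent links to the cluster root
    function find(k) { while (parent[k] != k) k = parent[k]; return k }
    # keep only complete lines whose score reaches the threshold
    NF >= 4 && $4 + 0 >= t + 0 { n++; x[n] = $1; y[n] = $2; z[n] = $3; s[n] = $4; parent[n] = n }
    END {
      # single-linkage clustering: join any two kept points within the cutoff
      for (i = 1; i <= n; i++)
        for (j = i + 1; j <= n; j++) {
          d2 = (x[i] - x[j])^2 + (y[i] - y[j])^2 + (z[i] - z[j])^2
          if (d2 <= c * c) { a = find(i); b = find(j); if (a != b) parent[a] = b }
        }
      # pocket score: sum of squared point scores (assumed aggregation)
      for (i = 1; i <= n; i++) { r = find(i); score[r] += s[i]^2; size[r]++ }
      for (r in score) printf "%.4f %d\n", score[r], size[r]
    }' | sort -k1,1nr
}
```

For example, `printf '0 0 0 0.9\n1 0 0 0.8\n10 0 0 0.7\n2 0 0 0.1\n' | cluster_points 0.5 3.0` drops the low-scoring point, merges the two nearby points and prints `1.4500 2` followed by `0.4900 1`.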

Publications

If you use P2Rank, please cite relevant papers:

Usage Examples

The following commands can be run from the installation directory.

Print help

prank help

Predict ligand binding sites (P2Rank algorithm)

prank predict test.ds                    # run on a dataset file containing a list of PDB/mmCIF files

prank predict -f test_data/1fbl.pdb      # run on a single PDB file
prank predict -f test_data/1fbl.cif      # run on a single mmCIF file
prank predict -f test_data/1fbl.bcif     # run on a single BinaryCIF file
prank predict -f test_data/1fbl.pdb.gz   # run on a single gzipped PDB file (other formats can be compressed too)
prank predict -f test_data/1fbl.cif.zst  # run on a single mmCIF file compressed with Zstandard

prank predict -threads 8     test.ds     # set the number of worker threads for parallel dataset processing
prank predict -o output_here test.ds     # explicitly specify the output directory

prank predict -c alphafold   test.ds     # use the alphafold config and model (config/alphafold.groovy)
                                         # this profile is recommended for AlphaFold models, NMR and cryo-EM
                                         # structures since it doesn't depend on B-factor as a feature
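A dataset file for `prank predict` can be generated from a directory of structures with a small shell function. This sketch assumes the simple layout of one structure path per line, with paths relative to the directory that contains the dataset file; `make_dataset` is a hypothetical helper, so compare its output with test_data/test.ds from the distribution before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the structure files in a directory as a P2Rank dataset.
# Assumption: the dataset format is one path per line, relative to the dataset file.
# Usage: make_dataset <structure_dir> <output.ds>
make_dataset() {
  local dir="$1" out="$2"
  (
    cd "$dir" || exit 1
    # plain and compressed PDB/mmCIF/BinaryCIF files (extend the patterns as needed)
    find . -maxdepth 1 -type f \
      \( -name '*.pdb' -o -name '*.pdb.gz' -o -name '*.cif' \
         -o -name '*.cif.gz' -o -name '*.cif.zst' -o -name '*.bcif' \) \
      | sed 's|^\./||' | LC_ALL=C sort   # strip "./" and sort for a stable order
  ) > "$out"
}
```

For example, `make_dataset my_structures my_structures/my.ds` followed by `prank predict "$PWD/my_structures/my.ds"`. The absolute path avoids depending on how relative dataset paths are resolved against `dataset_base_dir`.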

Prediction output

For each structure file <struct_file> in the dataset P2Rank produces several output files:

Configuration

You can override the default parameter values in a custom config file:

prank predict -c config/example.groovy  test.ds
prank predict -c example                test.ds # same effect: config/ is the default location and .groovy the implicit extension

Parameters can also be overridden on the command line by prefixing their full name with a single dash (-, not --).

prank predict                   -visualizations 0 -threads 8  test.ds   #  turn off visualizations and set the number of threads
prank predict -c example.groovy -visualizations 0 -threads 8  test.ds   #  overrides defaults as well as values from example.groovy

P2Rank has numerous configurable parameters. For the list of standard parameters, see config/default.groovy and the other example config files in the config/ directory. For the complete, commented list of all parameters (including undocumented ones), see Params.groovy in the source code.
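A custom config file can be written in the same form as the bundled ones. The sketch below assumes the layout we believe config/example.groovy uses (a `Params` import and a `(params as Params).with { ... }` block); check that file for the exact form. It only sets `threads` and `visualizations`, the two parameters overridden on the command line above, and the file name `my_fast.groovy` is hypothetical.

```bash
#!/usr/bin/env bash
# Write a hypothetical custom config, config/my_fast.groovy.
# Assumption: same structure as the bundled config files (see config/example.groovy).
mkdir -p config
cat > config/my_fast.groovy <<'EOF'
import cz.siret.prank.program.params.Params

(params as Params).with {
    threads = 8              // worker threads for parallel dataset processing
    visualizations = false   // do not generate visualizations
}
EOF
# then, from the installation directory:
#   prank predict -c my_fast test.ds
```

Keeping such settings in a file rather than on the command line makes a run easier to reproduce; command-line overrides still take precedence over values from the file.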

Evaluate prediction model

Evaluate on a structure file or a dataset with known ligands:

prank eval-predict -f test_data/1fbl.pdb
prank eval-predict test.ds
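To compare config profiles (for example the default one and the alphafold profile) on the same dataset with known ligands, the evaluation can be run once per profile. This sketch assumes that `eval-predict` accepts `-c` and `-o` the same way `predict` does; `compare_profiles` and the `eval_<profile>` output directory names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: evaluate one dataset under several config profiles,
# writing each run into its own output directory (eval_<profile>).
# Assumption: eval-predict accepts -c and -o like predict does.
# Usage: compare_profiles <dataset.ds> <profile>...
compare_profiles() {
  local dataset="$1"; shift
  local profile
  for profile in "$@"; do
    # stop at the first failing run so partial results are not mistaken for complete ones
    prank eval-predict -c "$profile" -o "eval_${profile}" "$dataset" || return 1
  done
}
```

For example, `compare_profiles test.ds default alphafold` writes the two evaluations to `eval_default` and `eval_alphafold`.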

Rescoring (PRANK algorithm)

In addition to predicting new ligand binding sites, P2Rank is also able to rescore pockets predicted by other methods (Fpocket, ConCavity, SiteHound, MetaPocket2, LISE, DeepSite, and PUResNetV2.0 are supported at the moment).

prank rescore test_data/fpocket.ds
prank rescore fpocket.ds                 # test_data/ is default 'dataset_base_dir'
prank rescore fpocket.ds -o output_dir   # test_output/ is default 'output_base_dir'       
prank eval-rescore fpocket.ds            # evaluate rescoring model

Note: for rescoring, the dataset file must use a specific two-column format. See examples in test_data/fpocket.ds.
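Fpocket writes its results for `<name>.pdb` into `<name>_out/<name>_out.pdb`, so a rescoring dataset can be assembled by pairing each protein with that file. In this sketch the `HEADER: prediction protein` line and the column order (prediction first, protein second) are assumptions based on our reading of test_data/fpocket.ds; copy the header block from that file (it may contain further lines) before using the output. `make_fpocket_dataset` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a two-column rescoring dataset from Fpocket results.
# Assumptions: header line and column order mimic test_data/fpocket.ds;
# Fpocket output for <name>.pdb lives in <name>_out/<name>_out.pdb.
# Usage: make_fpocket_dataset <dir> > <dir>/my_fpocket.ds
make_fpocket_dataset() {
  local dir="$1" pdb name
  echo "HEADER: prediction protein"
  for pdb in "$dir"/*.pdb; do
    [ -e "$pdb" ] || continue   # glob matched nothing: no .pdb files in dir
    name="$(basename "$pdb" .pdb)"
    # skip proteins for which Fpocket produced no output
    if [ -f "$dir/${name}_out/${name}_out.pdb" ]; then
      echo "${name}_out/${name}_out.pdb  ${name}.pdb"
    fi
  done
}
```

The paths are written relative to `<dir>`, so the resulting dataset file should be saved in that directory.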

Build from sources

This project uses the Gradle build system via the included Gradle wrapper. On Windows, use bash to run the build commands (bash is installed as part of Git for Windows).

git clone https://github.com/rdk/p2rank.git && cd p2rank
./make.sh

./unit-tests.sh    # optional: run unit tests to check that everything works on your machine
./tests.sh quick   # run further tests

Now you can run the program via:

distro/prank       # standard mode that is run in production
./prank.sh         # development/training mode 

To use ./prank.sh (development/training mode), first copy misc/local-env.sh into the repository root directory and edit it (see the training tutorial).

Comparison with Fpocket

Fpocket is a widely used open source ligand binding site prediction program. It is fast, easy to use and well documented. As such, it was a great inspiration for this project. Fpocket is written in C, and it is based on a different geometric algorithm.

Some practical differences:

Both Fpocket and P2Rank have many configurable parameters that influence the behaviour of the algorithm and can be tweaked to achieve better results for particular requirements.

Thanks

This program builds upon software written by other people, either through library dependencies or through code included in its source tree (where no library builds were available). Notably:

Contributing

We welcome any bug reports, enhancement requests, and other contributions. To submit a bug report or enhancement request, please use the GitHub issues tracker. For more substantial contributions, please fork this repo, push your changes to your fork, and submit a pull request with a good commit message.