SURFMAP

Aims

SURFMAP is a free standalone and easy-to-use command-line interface (CLI) software that enables the fast and automated 2D projection of either predefined features of protein surface (electrostatic potential, Kyte-Doolittle hydrophobicity, Wimley-White hydrophobicity, stickiness and surface relief) or any descriptor encoded in the temperature factor column of a PDB file. The 2D maps computed by SURFMAP can be used to analyze and/or compare protein surface properties.

Installation

Requirements

SURFMAP is a CLI tool that requires a UNIX-based operating system. It is written in Python (version 3.7) and R (version 3.6). It relies on the MSMS software (1), which is already included, and optionally requires APBS (2) if the user wants to perform electrostatics calculations.

All these requirements (including APBS) are met in a predefined Docker image, which we recommend using.

For a usage of the Docker image:

- a UNIX-based OS (any Linux distribution, macOS or [WSL2](https://learn.microsoft.com/fr-fr/windows/wsl/install) on Windows)
- [Python >= 3.7](https://www.python.org/downloads)
- [Docker](https://docs.docker.com/get-docker/)

For a usage on your local OS:

- a UNIX-based OS (any Linux distribution, macOS or [WSL2](https://learn.microsoft.com/fr-fr/windows/wsl/install) on Windows)
- [Python >= 3.7](https://www.python.org/downloads)
- [R >= 3.6](https://cran.r-project.org/)
- [APBS](https://github.com/Electrostatics/apbs/releases) (optional - only if you want to compute electrostatic potential)
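
The requirements above can be checked quickly before installing. The snippet below is a minimal sketch, not part of SURFMAP: the executable names `Rscript`, `docker` and `apbs` are assumptions about how these tools are exposed on your `PATH` and may need adapting.

```python
import shutil
import sys

# Executable names are assumptions for illustration; adapt them to your system.
OPTIONAL_TOOLS = {"R": "Rscript", "Docker": "docker", "APBS": "apbs"}


def check_requirements(min_python=(3, 7)):
    """Return a dict mapping each requirement to True (met) or False."""
    label = "Python >= %d.%d" % min_python
    report = {label: sys.version_info[:2] >= min_python}
    for name, executable in OPTIONAL_TOOLS.items():
        # shutil.which returns None when the executable is not on the PATH
        report[name] = shutil.which(executable) is not None
    return report


for requirement, found in check_requirements().items():
    print(f"{requirement}: {'found' if found else 'missing'}")
```

Only Python and either Docker or R (plus APBS for electrostatics) are needed, so a "missing" entry is not necessarily a problem.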


:bell: Please note that whether or not you use the Docker image of SURFMAP, you still need to install the SURFMAP package. The package contains internal features that make the use of the Docker image transparent: you do not have to enter 'complex' commands to set up the required mount points. In fact, the SURFMAP commands are almost exactly the same with or without the Docker image (see the section "Calling SURFMAP with Docker or not").

Recommendation

We strongly recommend that you install the SURFMAP package and its Python dependencies in an isolated environment. The section below gives a short illustration of why and how to use an isolated environment.

How to use an isolated environment (recommended)

By using an isolated environment you will avoid potential version conflicts between Python libraries when working on different projects. Some of the most popular tools to work with isolated Python environments are [virtualenv](https://pypi.org/project/virtualenv/), [pyenv](https://pypi.org/project/pyenv/) and [pipenv](https://pypi.org/project/pipenv/).

Below is an example of how to use [virtualenv](https://pypi.org/project/virtualenv/).

#### 1. Install virtualenv

```bash
# upgrade pip to its latest version
python3 -m pip install --upgrade pip

# install virtualenv
python3 -m pip install virtualenv
```

#### 2. Create and activate an isolated environment

```bash
# create an isolated environment named 'myenv' (to adapt)
virtualenv myenv

# activate your isolated environment
source myenv/bin/activate
```

Once activated, any Python library you install using pip will be installed in this isolated environment, and Python will only have access to these packages. Once you're done working on your project, simply type `deactivate` to exit the environment.
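
To confirm that the environment is active before installing SURFMAP, a small check like the one below can help. It is a sketch that assumes a recent virtualenv (version 20 or later) or the standard `venv` module, both of which set `sys.base_prefix`; the function name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when Python runs inside an isolated environment."""
    # Inside an environment, sys.prefix points to the environment directory,
    # while sys.base_prefix still points to the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("isolated environment active:", in_virtualenv())
```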

How to install SURFMAP

First, make sure you meet the system requirements outlined earlier and consider the recommendation. Then, follow instructions described in option 1 or 2 if you're not interested in accessing/modifying the source code, otherwise prefer option 3.

Option 1: from the archive (git not required)

First download an archive of our latest release here.

```bash
# upgrade pip to its latest version
python3 -m pip install --upgrade pip

# install SURFMAP vx.x.x
python3 -m pip install SURFMAP-x.x.x.zip # (or .tar.gz)
```

Option 2: from the version control systems

```bash
# upgrade pip to its latest version
python3 -m pip install --upgrade pip

# install SURFMAP v2.1.0 (adapt the tag to the version you want)
python3 -m pip install -e git+https://github.com/i2bc/SURFMAP.git@v2.1.0#egg=surfmap
```

Option 3: from this project repository

```bash
# clone SURFMAP on your machine
git clone https://github.com/i2bc/SURFMAP.git

# go in the SURFMAP/ directory
cd SURFMAP

# upgrade pip to its latest version
python3 -m pip install --upgrade pip

# install SURFMAP
python3 -m pip install -e .
```

How it works

SURFMAP workflow: inputs/outputs

The figure above represents the main steps of the SURFMAP workflow to compute the projection on a 2D map of a protein surface feature. More details about each step can be found in our article: see the [published version](https://pubs.acs.org/doi/10.1021/acs.jcim.1c01269) or its [free version](https://www.biorxiv.org/content/10.1101/2021.10.15.464543v1).
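
To give an intuition of the projection step, here is a minimal sketch of the sinusoidal (Sanson-Flamsteed) projection, which is the standard definition behind the name of the default `flamsteed` option. It is not SURFMAP's code: the axis convention, the angle ranges and the function names are assumptions made for illustration, and the article describes the conventions actually used.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian coordinates, taken relative to the protein centre,
    to a (longitude, latitude) pair in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0:
        raise ValueError("the point coincides with the centre")
    longitude = math.atan2(y, x)  # in [-pi, pi]
    latitude = math.asin(z / r)   # in [-pi/2, pi/2]
    return longitude, latitude


def sinusoidal(longitude, latitude):
    """Sinusoidal projection: horizontal distances shrink with cos(latitude),
    which keeps areas proportional on the 2D map."""
    return longitude * math.cos(latitude), latitude


# a surface point on the equator, a quarter turn away from the x axis
print(sinusoidal(*to_spherical(0.0, 1.0, 0.0)))
```

The distance `r` is dropped during the projection, so only the direction of each surface point relative to the centre determines its position on the map.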


SURFMAP accepts as input either a PDB file or a text file in a SURFMAP-specific matrix format.

Using a PDB file as input is the most common usage of SURFMAP. In this case, two outputs are generated:

- a 2D map of the projected feature (in PDF format by default)
- a matrix text file

The matrix text file contains all information about each projected surface residue and their associated feature value. As the above figure shows, this text file is the direct input for the last step of the SURFMAP workflow as it is read to generate the 2D map projection.

Using a text file in a SURFMAP-specific matrix format as input is a special case that can be useful if the user wants to generate a 2D map from an internally pre-processed matrix, for example one that has been normalized or averaged with other matrices.

Example of a SURFMAP-specific matrix format (.txt)
absc    ord     svalue  residues
5       5       Inf     NA
5       10      Inf     NA
5       15      Inf     NA
...
5       80      Inf     GLU_120_A
5       85      Inf     GLU_120_A, GLN_301_A
5       90      Inf     GLN_301_A
5       95      Inf     GLN_301_A
5       100     Inf     GLN_301_A
5       105     Inf     GLN_301_A
...
360     175     Inf     NA
360     180     Inf     NA
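
For pre-processing, a matrix in this format can be read with a few lines of Python. The sketch below assumes whitespace-separated columns, a numeric or `Inf` value in `svalue`, and a comma-separated residue list with `NA` for empty cells, as in the example above. The function names and the `0.25` value in the sample are illustrative only.

```python
from collections import defaultdict

# Illustrative sample; the 0.25 value is made up for the example.
SAMPLE = (
    "absc\tord\tsvalue\tresidues\n"
    "5\t80\tInf\tGLU_120_A\n"
    "5\t85\tInf\tGLU_120_A, GLN_301_A\n"
    "5\t90\t0.25\tGLN_301_A\n"
    "360\t180\tInf\tNA\n"
)


def parse_matrix(text):
    """Parse a SURFMAP-style matrix into a list of row dictionaries."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        if not line.strip() or line.strip() == "...":
            continue
        # at most 3 splits: the residue list may itself contain spaces
        absc, ord_, svalue, residues = line.split(None, 3)
        names = [] if residues.strip() == "NA" else [
            name.strip() for name in residues.split(",")
        ]
        rows.append({
            "absc": float(absc),
            "ord": float(ord_),
            "svalue": float(svalue),  # float() understands "Inf"
            "residues": names,
        })
    return rows


def cells_by_residue(rows):
    """Map each residue label to the grid cells it is projected on."""
    cells = defaultdict(list)
    for row in rows:
        for name in row["residues"]:
            cells[name].append((row["absc"], row["ord"]))
    return dict(cells)


print(cells_by_residue(parse_matrix(SAMPLE)))
```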

Calling SURFMAP with Docker or not

Whether or not you use SURFMAP through Docker, the commands are almost exactly the same. To use the Docker image of SURFMAP, simply add the CLI option --docker. To use SURFMAP from an installation on your local OS, simply remove this option. For example:

# a command that will run on a Docker container
surfmap -pdb foo.pdb -tomap stickiness --docker

# the same command that will run on your local OS
surfmap -pdb foo.pdb -tomap stickiness

If the Docker image of SURFMAP is missing from your system, it will be automatically downloaded the first time you run a SURFMAP command with the --docker option.

:bell: The version of the SURFMAP Docker image used is the same as the version of SURFMAP you have installed. You can check your current version with the command surfmap -v. If you want to use another version of the SURFMAP Docker image, set the SURFMAP_DOCKER_VERSION environment variable to an available tag version (e.g. export SURFMAP_DOCKER_VERSION=2.1.0).
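
When SURFMAP is driven from a script, the same logic can be kept in one place. The helper below is a hypothetical sketch (the function names are not part of SURFMAP): it only assembles the documented `-pdb`, `-tomap` and `--docker` options and the `SURFMAP_DOCKER_VERSION` variable, without running anything.

```python
import os


def surfmap_command(pdb, tomap, docker=False):
    """Build the argument list of a surfmap call, with or without Docker."""
    command = ["surfmap", "-pdb", pdb, "-tomap", tomap]
    if docker:
        command.append("--docker")
    return command


def docker_environment(version=None):
    """Copy the current environment, optionally pinning the image version."""
    env = dict(os.environ)
    if version is not None:
        env["SURFMAP_DOCKER_VERSION"] = version
    return env


print(" ".join(surfmap_command("foo.pdb", "stickiness", docker=True)))
# prints: surfmap -pdb foo.pdb -tomap stickiness --docker
```

The resulting list and environment can then be passed to `subprocess.run(command, env=env, check=True)`.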

Usage of SURFMAP

Once you have installed the SURFMAP package, you should be ready to use SURFMAP.

The example directory

To guide the user in the usage of SURFMAP, we will make use of files that you can find in the example/ directory of SURFMAP. You can see where this directory is located on your machine with the following command:

python3 -c "import surfmap; print(surfmap.PATH_TO_EXAMPLES)"

Please note that for all command examples illustrated below, we will make use of the Docker image of SURFMAP.

SURFMAP options

List of all SURFMAP options
usage: surfmap [-h] (-pdb PDB | -mat MAT | -v) -tomap TOMAP [-proj PROJ] [-res RES] [-rad RAD] [-d D] [-s S] [--nosmooth] [--png] [--keep]
               [--docker] [--pqr PQR] [-ff FF] [-verbose VERBOSE]

options:
  -h, --help        show this help message and exit
  -pdb PDB          Path to a PDB file
  -mat MAT          Input matrix. If the user gives an input matrix, SURFMAP will directly compute a map from it.
  -v, --version     Print the current version of SURFMAP.
  -tomap TOMAP      Specific key of the feature to map. One of the following: stickiness, kyte_doolittle, wimley_white, electrostatics,
                    circular_variance, bfactor, binding_sites, all.
  -proj PROJ        Choice of the projection. Argument must be one of the following: flamsteed, mollweide, lambert. Defaults to flamsteed.
  -res RES          File containing a list of residues to map on the projection. Expected format has the following space/tab separated column
                    values: chainid resid resname
  -rad RAD          Radius in Angstrom added to usual atomic radius (used for calculation solvent excluded surface). The higher the radius the
                    smoother the surface. Defaults to 3.0
  -d D              Output directory where all files will be written. Defaults to './output_SURFMAP_$pdb_$tomap' with $pdb and $tomap based on
                    -pdb and -tomap given values
  -s S              Value defining the size of a grid cell. The value must be a divisor of 180. Defaults to 5.0.
  --elec-max-value ELEC_MAX_VALUE
                    Maximum value to be used for the electrostatics color scale. The value will be converted to an absolute value to make the scale symmetric around 0. For
                    instance, a value of 5.63 will scale the electrostatics color values from -5.63 to 5.63.
  --bfactor-min-value BFACTOR_MIN_VALUE
                    Minimum value to be used for the bfactor color scale.
  --bfactor-max-value BFACTOR_MAX_VALUE
                    Maximum value to be used for the bfactor color scale.
  --nosmooth        If chosen, the resulted maps are not smoothed (careful: this option should be used only for discrete values!)
  --png             If chosen, a map in png format is computed (default: only pdf format is generated)
  --keep            If chosen, all intermediary files are kept in the output (default: only final text matrix and pdf map are kept)
  --docker          If chosen, SURFMAP will be run on a docker container (requires docker installed).
  --pqr PQR         Path to a PQR file used for electrostatics calculation. Option only available if '-tomap electrostatics' is requested.
                    Defaults to None.
  -ff FF            Force-field used by pdb2pqr for electrostatics calculation. One of the following: AMBER, CHARMM, PARSE, TYL06, PEOEPB,
                    SWANSON. Defaults to CHARMM.
  -verbose VERBOSE  Verbose level of the console log. 0 for silence, 1 for info level, 2 for debug level. Defaults to 1.
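
The effect of the `-s` option on the size of the map can be estimated with a short calculation. The sketch below assumes that the map spans 360 by 180 degrees, as suggested by the matrix example where `absc` runs up to 360 and `ord` up to 180 in steps of 5, and that the cell size must therefore divide 180 evenly. The function name `grid_shape` is hypothetical.

```python
def grid_shape(cell_size=5.0):
    """Return (number of columns, number of rows) for a given cell size,
    assuming a 360 x 180 degree map."""
    if cell_size <= 0:
        raise ValueError("the cell size must be positive")
    n_rows = 180 / cell_size
    if not float(n_rows).is_integer():
        raise ValueError("the cell size must divide 180 evenly")
    return int(360 / cell_size), int(n_rows)


print(grid_shape(5.0))  # prints: (72, 36)
```

With the default value of 5.0, the matrix therefore contains 72 x 36 = 2592 cells. Smaller cells give a finer map at the cost of a larger matrix.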

Projection of a protein surface feature on a 2D map

In order to generate a 2D map projection of a protein surface feature, two inputs are required:

- a PDB file (option -pdb)
- a valid key specifying the feature to map (option -tomap)

| Valid feature key | Feature details |
|---|---|
| kyte_doolittle | Residue hydrophobicity directly derived from the Kyte-Doolittle scale (3) |
| wimley_white | Residue hydrophobicity directly derived from the Wimley-White scale (4) |
| stickiness | Propensity of each amino acid to be involved in protein-protein interfaces (5) |
| circular_variance | Descriptor of the local geometry (residue scale) of a surface region: low values reflect protruding residues, while high values indicate residues located in cavities (6) |
| circular_variance_atom | Descriptor of the local geometry (atomic scale) of a surface region: low values reflect protruding atoms, while high values indicate atoms located in cavities (6) |
| electrostatics | Electrostatic potential of the protein surface (atomic scale). Requires the APBS software (2) |
| bfactor | Any feature stored in the temperature factor column of the input PDB file |
| all | Compute sequentially the following features: kyte_doolittle, wimley_white, stickiness and circular_variance |

From a PDB structure

# example - command to map the stickiness values for residues at the surface of the chain A of 1g3n.pdb
surfmap -pdb 1g3n_A.pdb -tomap stickiness --docker

The output has the following structure and content:

output_SURFMAP_1g3n_A_stickiness/
├── maps
│   └── 1g3n_A_stickiness_map.pdf
├── parameters.log
├── surfmap.log
└── smoothed_matrices
    └── 1g3n_A_stickiness_smoothed_matrix.txt

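The maps/ directory contains the 2D map in PDF format, and smoothed_matrices/ contains the matrix text file used to generate it.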

Note on electrostatics calculations
The electrostatic potential is calculated with APBS and is initially based on the generation of a PQR file, which contains the charge and radius of each atom in the input PDB file. In SURFMAP, this PQR file is generated with [pdb2pqr](https://pdb2pqr.readthedocs.io/en/latest/), which reads atomic parameters from a force field shipped with its package. While the CHARMM force field is used by default in SURFMAP, all force fields available in [pdb2pqr](https://pdb2pqr.readthedocs.io/en/latest/) (AMBER, CHARMM, PARSE, TYL06, PEOEPB, SWANSON) can be used in SURFMAP with the `-ff` option. For example:

```bash
# will use the CHARMM force-field (default)
surfmap -pdb 1g3n_A.pdb -tomap electrostatics --docker

# will use the AMBER force-field
surfmap -pdb 1g3n_A.pdb -tomap electrostatics -ff AMBER --docker
```

For the particular case where a user would like to compute the electrostatic potential with any other force field (e.g. for a coarse-grained PDB file), SURFMAP can be used with the additional option `--pqr`, which must be followed by a PQR file generated by the user. For example:

```bash
# will read atomic parameters from the PQR given as input
surfmap -pdb structure.pdb -tomap electrostatics --pqr structure.pqr --docker
```
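
Before passing your own PQR file to SURFMAP, it can be useful to inspect the charges and radii it contains. The sketch below assumes the usual whitespace-delimited PQR layout, in which the last two fields of each ATOM/HETATM record are the charge and the radius. The sample values are illustrative and not taken from a real structure, and the function name is hypothetical.

```python
# Illustrative records only; values are not taken from a real structure.
SAMPLE_PQR = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614 -0.3000 1.8500
ATOM      2  HT1 MET A   1      26.266  25.413   2.842  0.3300 0.2245
"""


def read_pqr_charges(text):
    """Collect the charge and radius of every atom record in a PQR file."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip REMARK, TER, END and other records
        fields = line.split()
        atoms.append({
            "atom": fields[2],
            "residue": fields[3],
            "charge": float(fields[-2]),
            "radius": float(fields[-1]),
        })
    return atoms


atoms = read_pqr_charges(SAMPLE_PQR)
print("atoms:", len(atoms))
print("total charge:", round(sum(atom["charge"] for atom in atoms), 4))
```

A total charge far from an integer value is often a sign that some atoms are missing parameters in the force field used.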

From a SURFMAP matrix file

A matrix written in a SURFMAP-specific format can also be used as an input to generate a 2D map. The feature to map has to be the same as the one used to generate the matrix file. As a simple illustration, the command below reproduces the 2D map generated by the stickiness command shown in the section "From a PDB structure":

# example - command to create a map from a SURFMAP matrix file generated with stickiness values
surfmap -mat output_SURFMAP_1g3n_A_stickiness/smoothed_matrices/1g3n_A_stickiness_smoothed_matrix.txt -tomap stickiness --docker

A more realistic usage of this option would be to compute maps from your internally pre-processed matrices. For example, you may have generated 2D maps of the same protein in different conformational states and then want to compute an averaged matrix file (please note that we don't provide script utilities for this).
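
Since no such utility is provided, here is a minimal sketch of how matrices could be averaged cell by cell. It assumes that all matrices share the same grid, that `svalue` is numeric or `Inf`, and that keeping the residue labels of the first matrix is acceptable. `Inf` values propagate to the average through ordinary float arithmetic. The function names are hypothetical, and the exact output formatting expected by SURFMAP should be checked against a matrix it generated.

```python
import math


def read_cells(text):
    """Read a matrix into {(absc, ord): (svalue, residues)}, keeping row order."""
    cells = {}
    for line in text.strip().splitlines()[1:]:  # skip the header line
        absc, ord_, svalue, residues = line.split(None, 3)
        cells[(absc, ord_)] = (float(svalue), residues.strip())
    return cells


def average_matrices(texts):
    """Average svalue cell by cell over several matrices of the same grid."""
    matrices = [read_cells(text) for text in texts]
    keys = list(matrices[0])
    if any(set(matrix) != set(keys) for matrix in matrices[1:]):
        raise ValueError("all matrices must be defined on the same grid")
    lines = ["absc\tord\tsvalue\tresidues"]
    for key in keys:
        mean = sum(matrix[key][0] for matrix in matrices) / len(matrices)
        value = "Inf" if math.isinf(mean) else f"{mean:g}"
        # residue labels are taken from the first matrix (an assumption)
        lines.append("\t".join([key[0], key[1], value, matrices[0][key][1]]))
    return "\n".join(lines) + "\n"


state_1 = "absc\tord\tsvalue\tresidues\n5\t5\t1.0\tGLU_120_A\n5\t10\tInf\tNA\n"
state_2 = "absc\tord\tsvalue\tresidues\n5\t5\t3.0\tGLU_120_A\n5\t10\tInf\tNA\n"
print(average_matrices([state_1, state_2]))
```

The returned text can be written to a file and given to `surfmap` with the `-mat` option, together with the `-tomap` key used to generate the original matrices.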

Projection of interface residues on a 2D map

Instead of projecting a protein surface feature on a 2D map, you may be interested in the projection of interface residues. This is possible with the option -tomap binding_sites of SURFMAP.

With the -tomap binding_sites option, a discrete color scale is used to associate one color with each distinct value found in the b-factor column. To use this option, your input PDB file must therefore contain discrete values in the b-factor column for each atom, the value depending on whether or not the atom belongs to an interface.

We provide two utility commands, extract_interface and write_pdb_bs, to help users generate a PDB file that can be used with the -tomap binding_sites option of SURFMAP.

Usage of extract_interface

From a multi-chain PDB file, the command extract_interface finds the interface residues between a given chain (or set of chains) and all the other chains of the input PDB structure. It then outputs a new PDB file of the given chain(s) in the format expected by the -tomap binding_sites option.

The command below illustrates the usage of extract_interface with the PDB file 1g3n_ABC.pdb in the example directory.

# generate a PDB file of chain A in which the b-factor column contains a discrete value marking the interface residues found between chains A and B, and between chains A and C
extract_interface -pdb 1g3n_ABC.pdb -chains A

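It will generate two output files:

- 1g3n_ABC_chain-A_bs.pdb: the PDB file of chain A, ready for use with the -tomap binding_sites option
- 1g3n_ABC_chain-A_interface.txt: a text file listing the interface residues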

So now, we can map interface residues of the chain A of 1G3N:

# Use the PDB file generated with the command above to project labelled residues on a 2D map 
surfmap -pdb 1g3n_ABC_chain-A_bs.pdb -tomap binding_sites --docker

Usage of write_pdb_bs

The command write_pdb_bs avoids the manual editing of the b-factor column of a PDB file that you would like to use with the -tomap binding_sites option. The command takes as inputs:

- a PDB file (option -pdb)
- a text file listing interface residues (option -res)

The text file listing interface residues must be formatted as follows:

Example of a text file listing interface residues
A   14  GLU 1
A   15  CYS 1
A   16  VAL 1
...
A   155 SER 1
A   156 SER 1
A   47  VAL 2
A   49  THR 2
A   50  GLY 2
...
A   139 HIS 2
A   140 ARG 2
A   292 TYR 2

As a simple illustration, the command below will reproduce the PDB file 1g3n_ABC_chain-A_bs.pdb ready for use by surfmap with the option -tomap binding_sites:

write_pdb_bs -pdb 1g3n_ABC_chain-A_bs.pdb -res 1g3n_ABC_chain-A_interface.txt

The output file will have the basename of the PDB file given as input, followed by the suffix _bs.pdb.
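
To make the role of the b-factor column concrete, the sketch below shows how such labels could be written by hand. It is not the implementation of write_pdb_bs: it relies only on the standard fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26, temperature factor in columns 61-66), and the value 0 given to atoms outside any interface is an assumption. The function names are hypothetical.

```python
def read_interface_labels(text):
    """Read 'chain resid resname label' lines into {(chain, resid): label}."""
    labels = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank lines and "..." placeholders
        chain, resid, _resname, label = fields
        labels[(chain, int(resid))] = int(label)
    return labels


def label_bfactors(pdb_lines, labels, default=0):
    """Overwrite the b-factor column of atom records with interface labels."""
    labelled = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            key = (line[21], int(line[22:26]))  # chain ID, residue number
            value = labels.get(key, default)    # default label is an assumption
            line = line[:60].ljust(60) + f"{value:6.2f}" + line[66:]
        labelled.append(line)
    return labelled


labels = read_interface_labels("A   14  GLU 1\nA   47  VAL 2\n")
atom = "ATOM      1  N   GLU A  14      11.104   6.134  -6.504  1.00 20.00"
print(label_bfactors([atom], labels)[0])
```

Lines are expected without their trailing newline, for example as returned by `str.splitlines()`.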

Supporting the project

Contacts

If you have any question regarding SURFMAP, you can contact us:

Licence

This project is under the MIT License terms. Please have a look at the LICENSE file for more details.

How to cite SURFMAP

If SURFMAP has been useful to your research, please cite us as well as the original MSMS paper:

Hugo Schweke, Marie-Hélène Mucchielli, Nicolas Chevrollier, Simon Gosset, Anne Lopes. SURFMAP: a software for mapping in two dimensions protein surface features. J. Chem. Inf. Model. 2022. https://pubs.acs.org/doi/10.1021/acs.jcim.1c01269

Sanner, M. F., Olson A.J. & Spehner, J.-C. (1996). Reduced Surface: An Efficient Way to Compute Molecular Surfaces. Biopolymers 38:305-320.

Moreover, if you use APBS in your research, please cite one or more of the papers listed on the Supporting APBS documentation page.

References

(1) Michel Sanner, Arthur J. Olson, Jean Claude Spehner (1996). Reduced Surface: an Efficient Way to Compute Molecular Surfaces. Biopolymers, Vol 38, (3), 305-320.

(2) Jurrus E, Engel D, Star K, Monson K, Brandi J, Felberg LE, Brookes DH, Wilson L, Chen J, Liles K, Chun M, Li P, Gohara DW, Dolinsky T, Konecny R, Koes DR, Nielsen JE, Head-Gordon T, Geng W, Krasny R, Wei GW, Holst MJ, McCammon JA, Baker NA. Improvements to the APBS biomolecular solvation software suite. Protein Science, 27, 112-128, 2018.

(3) Kyte, J.; Doolittle, R. F. A Simple Method for Displaying the Hydropathic Character of a Protein. J. Mol. Biol. 1982, 157 (1), 105−132.

(4) Wimley, W. C.; White, S. H. Experimentally Determined Hydrophobicity Scale for Proteins at Membrane Interfaces. Nat. Struct. Biol. 1996, 3 (10), 842−848.

(5) Levy, E. D. A Simple Definition of Structural Regions in Proteins and Its Use in Analyzing Interface Evolution. J. Mol. Biol. 2010, 403 (4), 660−670.

(6) Mezei, M. A New Method for Mapping Macromolecular Topography. J. Mol. Graph. Model. 2003, 21 (5), 463−472.