SURFMAP is a CLI tool that requires a UNIX-based operating system. It is written in Python (version 3.7) and R (version 3.6). It relies on the bundled MSMS software (1) and optionally requires APBS (2) if the user wants to perform electrostatics calculations.
All these requirements (including APBS) are met in a predefined Docker image, which we recommend using.
:bell: Please note that whether or not you use the Docker image of SURFMAP, you still need to install the SURFMAP package. The package contains internal features that make the use of the Docker image transparent: you will not have to enter 'complex' commands to set up the required mount points. In fact, the SURFMAP commands are almost exactly the same with or without the Docker image (see here).
We strongly recommend that you install the SURFMAP package and its Python dependencies in an isolated environment. The section below gives a short illustration of why and how to use one.
By using an isolated environment you will avoid potential version conflicts between Python libraries when working on different projects. Some of the most popular tools for working with isolated Python environments are [virtualenv](https://pypi.org/project/virtualenv/), [pyenv](https://pypi.org/project/pyenv/) and [pipenv](https://pypi.org/project/pipenv/).
Below is an example on how to use [virtualenv](https://pypi.org/project/virtualenv/).

#### 1. Install virtualenv

```bash
# upgrade pip to its latest version
python3 -m pip install --upgrade pip

# install virtualenv
python3 -m pip install virtualenv
```

#### 2. Create and activate an isolated environment

```bash
# create an isolated environment named 'myenv' (to adapt)
virtualenv myenv

# activate your isolated environment
source myenv/bin/activate
```

Once activated, any python library you'll install using pip will be installed in this isolated environment, and python will only have access to these packages.

Once you're done working on your project, simply type `deactivate` to exit the environment.

First, make sure you meet the system requirements outlined earlier and consider the recommendation. Then, follow instructions described in option 1 or 2 if you're not interested in accessing/modifying the source code, otherwise prefer option 3.
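Before installing, it can help to confirm that the recommendation above is actually in effect. The snippet below is a minimal sketch, not part of SURFMAP: the function names are hypothetical, and it treats the Python version stated in the requirements (3.7) as a minimum, which is an assumption.

```python
import sys

# Assumption: the version quoted in the requirements is treated as a minimum.
MIN_PYTHON = (3, 7)


def in_isolated_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def python_is_recent_enough(version=None) -> bool:
    """Return True when the given (or running) interpreter is >= MIN_PYTHON."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    print("isolated environment:", in_isolated_environment())
    print("python version ok:", python_is_recent_enough())
```

Run it with the same `python3` you intend to use for the installation, after activating your environment.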
SURFMAP accepts as input either a PDB file or a text file in a SURFMAP-specific matrix format.
Using a PDB file as input is the most classic usage of SURFMAP. In this case, two outputs are generated:
The matrix text file contains all information about each projected surface residue and their associated feature value. As the above figure shows, this text file is the direct input for the last step of the SURFMAP workflow as it is read to generate the 2D map projection.
Using a text file in a SURFMAP-specific matrix format as input is a special case. It can be useful if the user wants to generate a 2D map from an internally pre-processed matrix, for instance one that has been normalized or averaged with other matrices.
absc	ord	svalue	residues
5	5	Inf	NA
5	10	Inf	NA
5	15	Inf	NA
...
5	80	Inf	GLU_120_A
5	85	Inf	GLU_120_A, GLN_301_A
5	90	Inf	GLN_301_A
5	95	Inf	GLN_301_A
5	100	Inf	GLN_301_A
5	105	Inf	GLN_301_A
...
360	175	Inf	NA
360	180	Inf	NA
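If you plan to pre-process such matrices yourself, a small reader may be a useful starting point. This is a minimal sketch based only on the example above: it assumes the first three columns are whitespace-separated, that the rest of the line is a comma-separated residue list, and that `NA` means no residue. The function name `parse_matrix_lines` is hypothetical and is not provided by SURFMAP.

```python
def parse_matrix_lines(lines):
    """Parse lines of a SURFMAP-style matrix into a list of dictionaries."""
    rows = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, the header line and "..." elisions.
        if not line or line.startswith("absc") or line == "...":
            continue
        # Split off the first three columns; keep the residue list whole.
        parts = line.split(None, 3)
        if len(parts) < 3:
            continue
        residues = parts[3] if len(parts) == 4 else "NA"
        names = [] if residues == "NA" else [r.strip() for r in residues.split(",")]
        rows.append(
            {
                "absc": float(parts[0]),
                "ord": float(parts[1]),
                "svalue": float(parts[2]),  # float("Inf") parses to infinity
                "residues": names,
            }
        )
    return rows
```

To read a file, pass the open file object directly, for example `parse_matrix_lines(open(path))`.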
Whether or not you use SURFMAP through Docker, the commands are almost exactly the same. To use the Docker image of SURFMAP, you just have to add the CLI option `--docker`. If you want to use SURFMAP from an installation on your local OS, simply remove this option. For example:
# a command that will run on a Docker container
surfmap -pdb foo.pdb -tomap stickiness --docker
# the same command that will run on your local OS
surfmap -pdb foo.pdb -tomap stickiness
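When SURFMAP is driven from a script, the same toggle can be expressed as a flag on a small helper. The sketch below only assembles the two commands shown above; `build_surfmap_command` is a hypothetical name, not part of the SURFMAP package.

```python
def build_surfmap_command(pdb, tomap, use_docker=False):
    """Return the surfmap command as an argument list."""
    cmd = ["surfmap", "-pdb", pdb, "-tomap", tomap]
    if use_docker:
        # The only difference between the two modes is this option.
        cmd.append("--docker")
    return cmd


if __name__ == "__main__":
    # Print both variants; pass the list to subprocess.run(cmd, check=True)
    # to actually execute it.
    print(" ".join(build_surfmap_command("foo.pdb", "stickiness", use_docker=True)))
    print(" ".join(build_surfmap_command("foo.pdb", "stickiness")))
```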
If the Docker image of SURFMAP is missing from your system, it will be automatically downloaded the first time you run a SURFMAP command with this option.
:bell: The version of the SURFMAP Docker image used is the same as the version of SURFMAP you have installed. You can check your current version with the command `surfmap -v`. If you want to use another version of the SURFMAP Docker image, set a `SURFMAP_DOCKER_VERSION` environment variable to a value corresponding to an available tag version (e.g. `export SURFMAP_DOCKER_VERSION=2.1.0`).
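If you launch SURFMAP from Python rather than from a shell, the variable can be set for a single call without touching your global environment. This is a minimal sketch; `docker_env` is a hypothetical helper, and the tag `2.1.0` is simply the example value quoted above.

```python
import os


def docker_env(version, base_env=None):
    """Return a copy of the environment with SURFMAP_DOCKER_VERSION set."""
    env = dict(os.environ if base_env is None else base_env)
    env["SURFMAP_DOCKER_VERSION"] = version
    return env


if __name__ == "__main__":
    env = docker_env("2.1.0")
    print(env["SURFMAP_DOCKER_VERSION"])
    # The copy can then be handed to a child process, for example:
    # subprocess.run(["surfmap", "-pdb", "foo.pdb", "-tomap", "stickiness",
    #                 "--docker"], env=env, check=True)
```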
Once you have installed the SURFMAP package, you should be ready to use SURFMAP.
To guide you through the usage of SURFMAP, we will use files that you can find in the `example/` directory of SURFMAP. You can see where this directory is located on your machine with the following command:
python3 -c "import surfmap; print(surfmap.PATH_TO_EXAMPLES)"
Please note that for all command examples illustrated below, we will make use of the Docker image of SURFMAP.
usage: surfmap [-h] (-pdb PDB | -mat MAT | -v) -tomap TOMAP [-proj PROJ] [-res RES] [-rad RAD] [-d D] [-s S] [--nosmooth] [--png] [--keep] [--docker] [--pqr PQR] [-ff FF] [-verbose VERBOSE]

options:
  -h, --help            show this help message and exit
  -pdb PDB              Path to a PDB file
  -mat MAT              Input matrix. If the user gives an imput matrix, SURFMAP will directly compute a map from it.
  -v, --version         Print the current version of SURFMAP.
  -tomap TOMAP          Specific key of the feature to map. One of the following: stickiness, kyte_doolittle, wimley_white, electrostatics, circular_variance, bfactor, binding_sites, all.
  -proj PROJ            Choice of the projection. Argument must be one of the following: flamsteed, mollweide, lambert. Defaults to flamsteed.
  -res RES              File containing a list of residues to map on the projection. Expected format has the following space/tab separated column values: chainid resid resname
  -rad RAD              Radius in Angstrom added to usual atomic radius (used for calculation solvent excluded surface). The higher the radius the smoother the surface. Defaults to 3.0
  -d D                  Output directory where all files will be written. Defaults to './output_SURFMAP_$pdb_$tomap' with $pdb and $tomap based on -pdb and -tomap given values
  -s S                  Value defining the size of a grid cell. The value must be a multiple of 180. Defaults to 5.0.
  --elec-max-value ELEC_MAX_VALUE
                        Maximum value to be used for the electrostatics color scale. The value will be converted as an absolute value to make the scale symetric around 0. For instance, a value of 5.63 will scale the electrosctatics color values from -5.63 to 5.63.
  --bfactor-min-value BFACTOR_MIN_VALUE
                        Minimum value to be used for the bfactor color scale.
  --bfactor-max-value BFACTOR_MAX_VALUE
                        Maximum value to be used for the bfactor color scale.
  --nosmooth            If chosen, the resulted maps are not smoothed (careful: this option should be used only for discrete values!)
  --png                 If chosen, a map in png format is computed (default: only pdf format is generated)
  --keep                If chosen, all intermediary files are kept in the output (default: only final text matrix and pdf map are kept)
  --docker              If chosen, SURFMAP will be run on a docker container (requires docker installed).
  --pqr PQR             Path to a PQR file used for electrostatics calculation. Option only available if '-tomap electrosatics' is requested. Defaults to None.
  -ff FF                Force-field used by pdb2pqr for electrostatics calculation. One of the following: AMBER, CHARMM, PARSE, TYL06, PEOEPB, SWANSON. Defaults to CHARMM.
  -verbose VERBOSE      Verbose level of the console log. 0 for silence, 1 for info level, 2 for debug level. Defaults to 1.
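To get a feel for how the `-s` option relates to the size of the resulting matrix, the grid can be sketched as below. This is an illustration under stated assumptions, not SURFMAP's implementation: it assumes that cells are labelled by their upper bound (as the example matrix suggests, with `absc` running up to 360 and `ord` up to 180), and it reads the constraint on `-s` as meaning that 180 must be evenly divisible by the cell size. The function name is hypothetical.

```python
def grid_coordinates(cell_size=5.0):
    """Return the (absc, ord) cell labels for a given grid cell size."""
    if cell_size <= 0:
        raise ValueError("cell size must be positive")
    n_ord = 180 / cell_size
    # Assumption: the cell size has to divide 180 evenly.
    if abs(n_ord - round(n_ord)) > 1e-9:
        raise ValueError("180 must be evenly divisible by the cell size")
    n_ord = int(round(n_ord))
    # Assumption: cells are labelled by their upper bound (5, 10, ..., 360).
    absc = [cell_size * (i + 1) for i in range(2 * n_ord)]
    ord_ = [cell_size * (j + 1) for j in range(n_ord)]
    return absc, ord_


if __name__ == "__main__":
    absc, ord_ = grid_coordinates(5.0)
    print(len(absc), "x", len(ord_), "cells")
```

Under these assumptions, the default cell size of 5.0 gives a 72 x 36 grid, so halving the cell size quadruples the number of cells.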
In order to generate a 2D map projection of a protein surface feature, two inputs are required:
- either a PDB file (`-pdb` option) OR a matrix text file written in a SURFMAP-specific format (`-mat` option)
- a valid feature key (`-tomap` option)

| Valid feature key | Feature details |
|---|---|
| `kyte_doolittle` | Residue hydrophobicity directly derived from the Kyte-Doolittle scale (3) |
| `wimley_white` | Residue hydrophobicity directly derived from the Wimley-White scale (4) |
| `stickiness` | Propensity of each amino acid to be involved in protein−protein interfaces (5) |
| `circular_variance` | Descriptor of the local geometry (residue scale) of a surface region: low values reflect protruding residues, while high values indicate residues located in cavities (6) |
| `circular_variance_atom` | Descriptor of the local geometry (atomic scale) of a surface region: low values reflect protruding atoms, while high values indicate atoms located in cavities (6) |
| `electrostatics` | Electrostatic potential of the protein surface (atomic scale). Requires the APBS software (2) |
| `bfactor` | Any feature stored in the temperature factor column of the input PDB file |
| `all` | Compute sequentially the following features: `kyte_doolittle`, `wimley_white`, `stickiness` and `circular_variance` |
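If you script several runs, it can be handy to know in advance which features the `all` key stands for. The sketch below only encodes what the table states; `expand_feature_key` is a hypothetical helper and not part of SURFMAP.

```python
# Features computed by the "all" key, in the order given in the table above.
ALL_FEATURES = ("kyte_doolittle", "wimley_white", "stickiness", "circular_variance")


def expand_feature_key(key):
    """Return the list of features a given -tomap key stands for."""
    return list(ALL_FEATURES) if key == "all" else [key]


if __name__ == "__main__":
    for feature in expand_feature_key("all"):
        print(feature)
```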
# example - command to map the stickiness values for residues at the surface of the chain A of 1g3n.pdb
surfmap -pdb 1g3n_A.pdb -tomap stickiness --docker
The output has the following structure and content:
output_SURFMAP_1g3n_A_stickiness/
├── maps
│   └── 1g3n_A_stickiness_map.pdf
├── parameters.log
├── surfmap.log
└── smoothed_matrices
    └── 1g3n_A_stickiness_smoothed_matrix.txt
with:
- `parameters.log`: a summary of the parameters used to compute the map
- `surfmap.log`: a log file of each step of the SURFMAP workflow
- `1g3n_A_stickiness_map.pdf`: the generated 2D map in PDF format
- `1g3n_A_stickiness_smoothed_matrix.txt`: the computed smoothed matrix file (txt file) used to generate the 2D map. This matrix has the format expected of a matrix file that can be used as direct input to SURFMAP through the `-mat` argument.

A matrix written in a SURFMAP-specific format can also be used as an input to generate a 2D map. The feature to map has to be the same as the one used to generate the matrix file. As a simple usage example, the command below will reproduce the 2D map generated from the command above:
# example - command to create a map from a SURFMAP matrix file generated with stickiness values
surfmap -mat output_SURFMAP_1g3n_A_stickiness/smoothed_matrices/1g3n_A_stickiness_smoothed_matrix.txt -tomap stickiness --docker
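The path given to `-mat` above follows the output layout shown earlier, so it can be derived from the PDB file name and the feature key. The sketch below assumes the default output directory (no `-d` option) and the default smoothed output; the function name is hypothetical.

```python
from pathlib import Path


def default_output_paths(pdb_path, tomap):
    """Return the default output locations for a given PDB file and feature."""
    stem = Path(pdb_path).stem  # "1g3n_A.pdb" -> "1g3n_A"
    outdir = Path(f"output_SURFMAP_{stem}_{tomap}")
    return {
        "map": outdir / "maps" / f"{stem}_{tomap}_map.pdf",
        "matrix": outdir / "smoothed_matrices" / f"{stem}_{tomap}_smoothed_matrix.txt",
        "parameters": outdir / "parameters.log",
        "log": outdir / "surfmap.log",
    }


if __name__ == "__main__":
    for name, path in default_output_paths("1g3n_A.pdb", "stickiness").items():
        print(name, path)
```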
A more realistic usage of this option would be to compute maps from your own pre-processed matrices. For example, you may have generated 2D maps of the same protein in different conformational states and then want to compute an averaged matrix file (please note that we don't provide script utilities for this).
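Since no such utility is provided, here is one way the averaging step could be sketched. It is an illustration only: it assumes every matrix was computed on the same grid and has already been loaded into a dictionary mapping `(absc, ord)` to `svalue`. The function name is hypothetical, and writing the result back in the SURFMAP matrix format is left out.

```python
def average_svalues(matrices):
    """Average the svalue of each grid cell across several matrices.

    Each matrix is a dict mapping (absc, ord) -> svalue.
    """
    if not matrices:
        raise ValueError("at least one matrix is required")
    cells = set(matrices[0])
    for matrix in matrices[1:]:
        # Averaging only makes sense cell by cell on identical grids.
        if set(matrix) != cells:
            raise ValueError("all matrices must share the same grid cells")
    # Cells that are Inf in every matrix stay Inf under plain float arithmetic.
    return {
        cell: sum(matrix[cell] for matrix in matrices) / len(matrices)
        for cell in cells
    }
```

How cells marked `Inf` in only some of the matrices should be treated is a modelling choice that this sketch does not settle.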
Instead of projecting a protein surface feature on a 2D map, you may be interested in the projection of interface residues. This is possible with the option `-tomap binding_sites` of SURFMAP.
With the `-tomap binding_sites` option, a discrete color scale is used to associate one color to each different value found in the b-factor column. So in order to use this option, your input PDB file must contain discrete values in the b-factor column for each atom, the value depending on whether the atom belongs to an interface or not. For example:

- `0` for atoms that are not part of any binding site
- `1` for atoms that are part of one known binding site
- `2` for atoms that are part of a second binding site (if any)
- ...
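Before running SURFMAP with this option, you may want to check which discrete values your PDB file actually contains. The sketch below relies on the standard PDB fixed-column layout, where the temperature factor occupies columns 61 to 66 of `ATOM` and `HETATM` records; the function name is hypothetical.

```python
def bfactor_values(pdb_lines):
    """Return the sorted distinct b-factor values found in ATOM/HETATM records."""
    values = set()
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Columns 61-66 of the record (0-based slice 60:66) hold the b-factor.
        field = line[60:66].strip()
        if field:
            values.add(float(field))
    return sorted(values)
```

For a file prepared as described above, `bfactor_values(open(path))` should return a short list such as one value per binding site plus `0.0`.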
We provide two utility scripts to help users generate a PDB file that can be used with the `-tomap binding_sites` option of SURFMAP:

- `extract_interface`
- `write_pdb_bs`
From a multi-chain PDB file, the command `extract_interface` finds the interface residues between a given chain (or set of chains) and all the other chains of the input PDB structure. It then outputs a new PDB file of the given chain(s) in the format expected by the `-tomap binding_sites` option.
The command below illustrates the usage of `extract_interface` with the PDB file `1g3n_ABC.pdb` in the example directory.
# generate a PDB file of the chain A in which the b-factor column will contain a discrete value for each different interface residues that will be found between chains A and B, and chains A and C
extract_interface -pdb 1g3n_ABC.pdb -chains A
It will generate two output files:

- `1g3n_ABC_chain-A_bs.pdb`: a PDB file ready for use by the command `surfmap` with the option `-tomap binding_sites`.
- `1g3n_ABC_chain-A_interface.txt`: a text file containing information about the identified interface residues. This file can be edited and used as input for the command `write_pdb_bs` described below.

So now, we can map the interface residues of chain A of 1G3N:
# Use the PDB file generated with the command above to project labelled residues on a 2D map
surfmap -pdb 1g3n_ABC_chain-A_bs.pdb -tomap binding_sites --docker
The command `write_pdb_bs` avoids the manual editing of the b-factor column of a PDB file that you would like to use with the `-tomap binding_sites` option. The command takes as inputs:
The text file listing interface residues must be formatted as follows:
A	14	GLU	1
A	15	CYS	1
A	16	VAL	1
...
A	155	SER	1
A	156	SER	1
A	47	VAL	2
A	49	THR	2
A	50	GLY	2
...
A	139	HIS	2
A	140	ARG	2
A	292	TYR	2
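If you edit this file by hand, a quick way to review it is to group the residues by binding-site label. The sketch below assumes four whitespace-separated columns (chain, residue number, residue name, label), as in the example above; the function name is hypothetical and not part of SURFMAP.

```python
from collections import defaultdict


def parse_interface_lines(lines):
    """Group (chain, resid, resname) tuples by binding-site label."""
    sites = defaultdict(list)
    for raw in lines:
        fields = raw.split()
        if len(fields) != 4:
            continue  # skip blank lines and "..." elisions
        chain, resid, resname, label = fields
        # The residue number is kept as text so insertion codes would survive.
        sites[label].append((chain, resid, resname))
    return dict(sites)
```

Calling `parse_interface_lines(open(path))` on the example would give one entry per label, `"1"` and `"2"`.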
As a simple example, the command below will reproduce the PDB file `1g3n_ABC_chain-A_bs.pdb`, ready for use by `surfmap` with the option `-tomap binding_sites`:
write_pdb_bs -pdb 1g3n_ABC_chain-A_bs.pdb -res 1g3n_ABC_chain-A_interface.txt
The output file will have the basename of the PDB file given as input, with the suffix `_bs.pdb`.
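For intuition, the manual b-factor editing that this command is meant to spare you can be sketched as follows. This is not the implementation of `write_pdb_bs`: it is an illustration based on the standard PDB column layout (chain in column 22, residue number in columns 23 to 26, b-factor in columns 61 to 66), with a hypothetical function name, and it assumes residues absent from the label mapping get the value 0.

```python
def label_bfactors(pdb_lines, labels):
    """Write binding-site labels into the b-factor column of ATOM/HETATM records.

    pdb_lines: PDB lines without trailing newlines.
    labels: dict mapping (chain, resid) -> numeric label, resid given as text.
    """
    labelled = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 54:
            chain = line[21]
            resid = line[22:26].strip()
            # Assumption: residues that are not listed belong to no binding site.
            value = labels.get((chain, resid), 0)
            prefix = line[:60].ljust(60)  # pad lines lacking an occupancy field
            line = prefix + f"{value:6.2f}" + line[66:]
        labelled.append(line)
    return labelled
```

Lines other than `ATOM` and `HETATM` records are passed through unchanged.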
If you have any questions regarding SURFMAP, you can contact us:
This project is under the MIT License terms. Please have a look at the LICENSE file for more details.
If SURFMAP has been useful to your research, please cite us as well as the original MSMS paper:
Hugo Schweke, Marie-Hélène Mucchielli, Nicolas Chevrollier, Simon Gosset, Anne Lopes. SURFMAP: a software for mapping in two dimensions protein surface features. J. Chem. Inf. Model. 2022. Link
Sanner, M. F., Olson A.J. & Spehner, J.-C. (1996). Reduced Surface: An Efficient Way to Compute Molecular Surfaces. Biopolymers 38:305-320. Link
Moreover, if you use APBS in your research, please cite one or more of the papers listed on the Supporting APBS documentation page.
(1) Michel Sanner, Arthur J. Olson, Jean Claude Spehner (1996). Reduced Surface: an Efficient Way to Compute Molecular Surfaces. Biopolymers, Vol 38, (3), 305-320.
(2) Jurrus E, Engel D, Star K, Monson K, Brandi J, Felberg LE, Brookes DH, Wilson L, Chen J, Liles K, Chun M, Li P, Gohara DW, Dolinsky T, Konecny R, Koes DR, Nielsen JE, Head-Gordon T, Geng W, Krasny R, Wei GW, Holst MJ, McCammon JA, Baker NA. Improvements to the APBS biomolecular solvation software suite. Protein Science, 27, 112-128, 2018.
(3) Kyte, J.; Doolittle, R. F. A Simple Method for Displaying the Hydropathic Character of a Protein. J. Mol. Biol. 1982, 157 (1), 105−132.
(4) Wimley, W. C.; White, S. H. Experimentally Determined Hydrophobicity Scale for Proteins at Membrane Interfaces. Nat. Struct. Biol. 1996, 3 (10), 842−848.
(5) Levy, E. D. A Simple Definition of Structural Regions in Proteins and Its Use in Analyzing Interface Evolution. J. Mol. Biol. 2010, 403 (4), 660−670.
(6) Mezei, M. A New Method for Mapping Macromolecular Topography. J. Mol. Graph. Model 2003, 21 (5), 463−472.