EM-GAN is a computational tool that enables protein structure information to be captured from cryo-EM maps more effectively than from the raw maps. It is based on 3D deep learning and is intended to help protein structure modeling from cryo-EM maps.
Copyright (C) 2021 Sai Raghavendra Maddhuri Venkata Subramaniya, Genki Terashi, Daisuke Kihara, and Purdue University.
License: GPL v3 for academic use. (For commercial use, please contact us for different licensing)
Contact: Daisuke Kihara (dkihara@purdue.edu)
Cite : Sai Raghavendra Maddhuri Venkata Subramaniya, Genki Terashi & Daisuke Kihara. Improved Protein Structure Modeling Using Enhanced Cryo-EM Maps With 3D Deep Generative Networks. Bioinformatics, in press (2023).
Google Colab: https://tinyurl.com/3ccxpttx
An increasing number of structures of biological macromolecules have been solved with cryo-electron microscopy (cryo-EM). Over the past few years, the resolution of density maps determined by cryo-EM has generally improved. However, there are still many cases where the resolution is not high enough to model molecular structures with standard computational tools. If the resolution obtained is near the empirical borderline (3-4 Å), improving the map quality facilitates structure modeling. Here, we report that protein structure modeling can often be substantially improved by using a novel deep learning-based method that prepares an input cryo-EM map for modeling. The method uses a three-dimensional generative adversarial network, which learns density patterns of high- and low-resolution density maps.
The GAN architecture of EM-GAN is shown below.
Python 3 : https://www.python.org/downloads/
pytorch : pip/conda install pytorch
mrcfile==1.2.0
numpy>=1.19.4
numba>=0.52.0
torch>=1.6.0
scipy>=1.6.0
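The version constraints above can be checked from Python before running the pipeline. The following is a minimal sketch and not part of EM-GAN: the helper names (`parse_version`, `satisfies`, `check_requirements`) are hypothetical, and it assumes the packages were installed under the distribution names listed above.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Constraints copied from the dependency list above.
# mrcfile is pinned exactly; the others are lower bounds.
REQUIREMENTS = {
    "mrcfile": ("==", "1.2.0"),
    "numpy": (">=", "1.19.4"),
    "numba": (">=", "0.52.0"),
    "torch": (">=", "1.6.0"),
    "scipy": (">=", "1.6.0"),
}


def parse_version(text):
    """Turn '1.19.4' or '2.1.0+cu118' into a tuple of ints."""
    numbers = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def satisfies(installed, op, required):
    """Compare two version strings with '==' or '>='."""
    have, want = parse_version(installed), parse_version(required)
    # Pad with zeros so that '1.6' and '1.6.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have == want if op == "==" else have >= want


def check_requirements(requirements=REQUIREMENTS):
    """Return a {package: status} report for the given constraints."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if satisfies(installed, op, required):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, need {op}{required}"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Running the file directly prints one status line per package.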
This software is free to use under GPL v3 for academic use. For commercial use, please contact us for different licensing.
Please allow about 30 minutes on average for the output, since 3D input processing and inference take some time. Running time depends directly on the size of the structure. For example, a 260 x 260 x 260 map can take 2 hours to finish.
OS: Any (e.g., CentOS, Linux, Windows, Mac).
Necessary libraries: Please refer to the dependencies above and make sure that they are installed.
GPU: Optional (Any GPU with >4GB RAM should enable faster computation).
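To see whether a GPU meeting this guideline is visible to PyTorch, a small check such as the one below can help. It is an illustrative sketch only: `pick_device` is a hypothetical helper, the 4 GB threshold is read as 4 GiB, and how `test.py` itself selects a device is not described here.

```python
def pick_device(min_gpu_bytes=4 * 1024**3):
    """Return 'cuda' if a GPU with more than min_gpu_bytes is visible, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so no GPU can be used from Python.
        return "cpu"
    if not torch.cuda.is_available():
        return "cpu"
    # total_memory is reported in bytes for the first visible GPU.
    total = torch.cuda.get_device_properties(0).total_memory
    return "cuda" if total > min_gpu_bytes else "cpu"


if __name__ == "__main__":
    print("Suggested device:", pick_device())
```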
data_prep/HLmapData -a [sample_mrc] -b [sample_mrc] [options] > [output_trimmap_filename]
INPUTS:
HLmapData_new expects sample_mrc to be a valid filename. Supported file formats are Situs, CCP4, and MRC2000. Format is deduced from FILE's extension.
OPTIONS:
-a [mrc] Input map file of the experimental map.
-b [mrc] Input map file of the experimental map (same map as above). If you have a simulated map available and are validating, specify that instead.
-A [float] The level of isosurface to generate density values for the first map (map specified with option -a). You can use the author recommended contour level for experimental EM maps. default=0.0
-B [float] The level of isosurface to generate density values for the second map (map specified with option -b). You can use the author recommended contour level for experimental EM maps. If the input is a simulated map, specify 0.0. default=0.0
-w [integer] Sets the dimensions of the sliding cube used for input data generation. The edge length of the cube is calculated as 2*w+1. We recommend a value of 12, which generates input cubes of size 25x25x25. Be careful when increasing this value, as it increases the portion of an EM map that a single cube covers. Increasing this value also increases running time. default=5 (->11x11x11)
-s [integer] Specifies the stride value to be used while generating input cubes. We recommend a value of 4. Increasing this value also increases running time. default=1
-h, --help, -?, /? Displays the list of above options.
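As a rough illustration of how -w and -s interact, the sketch below computes the cube edge from the 2*w+1 rule and estimates how many cube positions fit along one axis. The position count assumes that cubes must lie fully inside the map and that the stride is applied uniformly along each axis. HLmapData may handle borders and low-density regions differently, so treat the numbers as a rough estimate rather than the tool's actual behavior. The function names are hypothetical.

```python
def cube_edge(w):
    """Edge length of the sliding cube for a given -w value (2*w + 1)."""
    return 2 * w + 1


def positions_along_axis(axis_len, w, stride):
    """Estimate the number of cube positions along one axis.

    Assumption: a cube must fit entirely inside the map, and consecutive
    cubes are `stride` voxels apart.
    """
    edge = cube_edge(w)
    if axis_len < edge:
        return 0
    return (axis_len - edge) // stride + 1


def estimated_cube_count(shape, w, stride):
    """Multiply the per-axis estimates for a 3D map shape."""
    total = 1
    for axis_len in shape:
        total *= positions_along_axis(axis_len, w, stride)
    return total


if __name__ == "__main__":
    print("cube edge for -w 12:", cube_edge(12))
    print("estimated cubes for a 260^3 map with -w 12 -s 4:",
          estimated_cube_count((260, 260, 260), w=12, stride=4))
```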
USAGE: data_prep/HLmapData -a protein.mrc -b protein.mrc -A [contour_level] -B [contour_level] -w 12 -s 4 > protein_trimmap
python data_prep/generate_input.py [sample_trimmap] [ID_data] [dataset_folder]
INPUTS: The inputs to this script are the trimmap generated in the previous step, an ID that uniquely identifies the map (such as the EMID), and dataset_folder, the folder to which the dataset files are written.
USAGE: python data_prep/generate_input.py protein_trimmap 1_data ./data_dir/
python test.py --dir_path=INPUT_DATA_DIR --res_blocks=5 --batch_size=128 --in_channels=32 --G_path=GENERATOR_MODEL_PATH --D_path=DISCRIMINATOR_MODEL_PATH
INPUT:
--dir_path Path to the data directory created in the last step.
--G_path Path of the Generator model.
--D_path Path of the Discriminator model.
OUTPUT: This program writes the modified EM map cubes to the same directory as the input.
USAGE: python test.py --res_blocks=5 --batch_size=128 --in_channels=32 --G_path=model/G_model --D_path=model/D_model --dir_path=data_dir/
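The three steps can also be driven from a single Python script. The sketch below only assembles the command lines documented above and, optionally, runs them with `subprocess`. `build_commands` and `run_pipeline` are hypothetical helper names, and the default model paths are taken from the usage line above. It assumes the script is started from the repository root so that the relative paths resolve.

```python
import shlex
import subprocess


def build_commands(map_path, map_id, contour, data_dir,
                   g_path="model/G_model", d_path="model/D_model",
                   w=12, stride=4):
    """Return the trimmap filename and the argument lists for the three steps."""
    trimmap = f"{map_id}_trimmap"
    # Step 1: the same experimental map is passed as both -a and -b.
    step1 = ["data_prep/HLmapData", "-a", map_path, "-b", map_path,
             "-A", str(contour), "-B", str(contour),
             "-w", str(w), "-s", str(stride)]
    # Step 2: convert the trimmap into dataset files.
    step2 = ["python", "data_prep/generate_input.py",
             trimmap, f"{map_id}_data", data_dir]
    # Step 3: run the trained models on the generated dataset.
    step3 = ["python", "test.py", "--res_blocks=5", "--batch_size=128",
             "--in_channels=32", f"--G_path={g_path}",
             f"--D_path={d_path}", f"--dir_path={data_dir}"]
    return trimmap, [step1, step2, step3]


def run_pipeline(map_path, map_id, contour, data_dir):
    """Run the three steps in order, stopping at the first failure."""
    trimmap, (step1, step2, step3) = build_commands(
        map_path, map_id, contour, data_dir)
    # Step 1 prints the trimmap to stdout, so redirect it to a file.
    with open(trimmap, "wb") as out:
        subprocess.run(step1, stdout=out, check=True)
    subprocess.run(step2, check=True)
    subprocess.run(step3, check=True)


if __name__ == "__main__":
    # Print the commands without running them.
    _, commands = build_commands("protein.mrc", "protein", 0.0, "./data_dir/")
    for command in commands:
        print(shlex.join(command))
```

Printing the commands first makes it easy to confirm the arguments before committing to a long run.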
data_prep/HLmapData -a 2788.mrc -b 2788.mrc -A 0.16 -B 0.16 -w 12 -s 4 > 2788_trimmap
python data_prep/generate_input.py 2788_trimmap 2788_data ./data_dir
Density Map, 2788.mrc
GAN-modified Map, 2788_SR.mrc