This project was designed to benchmark the results of CE-Symm.
Please see the rcsb/symmetry project (https://github.com/rcsb/symmetry) for more information about the CE-Symm tool.
Users may be interested in this project to:
CE-Symm
.CE-Symm
paper.CE-Symm
.
Version 1.0:
Myers-Turnbull, D., Bliven, S. E., Rose, P. W., Aziz, Z. K., Youkharibache, P., Bourne, P. E., & Prlić, A. (2014). Systematic Detection of Internal Symmetry in Proteins Using CE-Symm. Journal of Molecular Biology, 426(11), 2255–2268.
Version 2.0:
Bliven, S. E., Lafita, A., Rose, P. W., Capitani, G., Prlić, A., & Bourne, P. E. (2018). Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. Submitted. bioRxiv preprint: https://doi.org/10.1101/297960
The CE-Symm algorithm was originally benchmarked on a novel, manually curated set of 1007 proteins. These are annotated with various types of structural repeats, primarily internal symmetry.
All files are contained within the domain_symm_benchmark directory.
The Myers-Turnbull benchmark is contained in the file domain_symm_benchmark.tsv. The original file can be accessed from the symmetry-benchmark-1.0.0 tag and, together with the CE-Symm 1.0.0 release, can be used to exactly duplicate the results from the paper. Later releases reflect changes in the PDB (e.g. obsolete entries) or the discovery of mistakes in the manual curation (e.g. overlooked translational repeats). The structures of all domains in PDB format are found in domain_symm_benchmark.tgz.
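Because the benchmark file pairs a SCOPe domain identifier with a symmetry annotation on each line, it can be loaded with a few lines of Python. The following is a minimal sketch, not part of this project: it assumes the two fields are tab-separated and that blank lines or lines starting with "#" can be skipped. The function name parse_benchmark and the sample identifiers and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def parse_benchmark(lines: Iterable[str]) -> Dict[str, str]:
    """Map each domain identifier to its symmetry annotation."""
    annotations = {}
    for raw in lines:
        line = raw.strip()
        # Assumption: blank lines and '#' comments carry no data.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two tab-separated fields: {raw!r}")
        annotations[fields[0]] = fields[1]
    return annotations


# Hypothetical sample rows; these identifiers and labels are made up
# and are not taken from the real benchmark file.
sample = ["# domain\tannotation", "d0aaaa_\tC2", "d0bbbb1\tC1", "d0cccc_\tC2"]
parsed = parse_benchmark(sample)

# Tally how many domains carry each annotation.
counts = Counter(parsed.values())
print(counts.most_common())
```

To use it on the real data, an open file handle can be passed in place of the sample list, since the function accepts any iterable of lines.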
Each line of the file consists of a SCOPe domain identifier (v. 2.01) and an annotation of the symmetry. Abbreviations used:
Many of the cases in the benchmark are difficult to classify or fall near the border of two categories. The following guidelines were used for difficult cases:
See the repeatsdb-lite directory for more information.
Guerler, A., Wang, C., & Knapp, E. W. (2009). Symmetric structures in the universe of protein folds. Journal of Chemical Information and Modeling, 49(9), 2147–2151.
The GANGSTA+ algorithm for detecting internal symmetry was run on ASTRAL40 (SCOP v. 1.73) and identified a number of families with significant internal symmetry. Both SymD and CE-Symm also used the dataset, making it a useful tool for comparing algorithms.
Kim et al. provide a list of SCOP domains for each symmetric fold in their Supplemental Material 3:
Kim, C., Basner, J., & Lee, B. (2010). Detecting internally symmetric protein structures. BMC Bioinformatics, 11, 303.
This data has been reformatted into a more machine-readable format in the Guerler_folds/ directory.
We were unable to reconcile some differences in
the number of domains with those given in Kim Table 2. The data here can be
used to reproduce the CE-Symm results from Myers-Turnbull Table 1, but they may
not reproduce exactly the previously reported SymD and GANGSTA+ results.
Format:
Guerler_folds/expected_groups.tsv contains a list of SCOP folds and families, annotated with the expected type of symmetry.
Each Guerler_folds/*.list file contains a list of SCOP domains belonging to the fold given in the filename.
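As an illustration of the layout described above, the per-fold .list files could be gathered into a single dictionary keyed by fold. This is a sketch under assumptions: it assumes each .list file holds one domain identifier per line, and it takes the fold name from the filename with the .list suffix removed. The function name load_fold_lists and the demo fold names and domains are hypothetical; the demo runs on a temporary directory rather than the real Guerler_folds/ data.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def load_fold_lists(directory: Path) -> Dict[str, List[str]]:
    """Return {fold name taken from the filename: [domain identifiers]}."""
    folds = {}
    for path in sorted(directory.glob("*.list")):
        # Assumption: one domain identifier per non-empty line.
        domains = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]
        # Path.stem drops only the final ".list" suffix, so "a.1.list" -> "a.1".
        folds[path.stem] = domains
    return folds


# Demo on a throwaway directory with made-up fold names and domains.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.1.list").write_text("d0aaaa_\nd0bbbb1\n")
    (root / "b.2.list").write_text("d0cccc_\n\n")
    demo = load_fold_lists(root)

# Number of domains found for each fold.
print({fold: len(domains) for fold, domains in demo.items()})
```

Calling load_fold_lists(Path("Guerler_folds")) from the repository root would apply the same logic to the real files, provided the one-identifier-per-line assumption holds.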
Fischer, D., Elofsson, A., Rice, D., & Eisenberg, D. (1996). Assessing the performance of fold recognition methods by means of a comprehensive benchmark. Pacific Symposium on Biocomputing, 300–318.
The Fischer benchmark consists of 68 pairs of proteins from related folds. While not directly related to symmetry, the list of proteins is provided here in machine-readable format for benchmarking structural comparison algorithms.
Brunette, T. J., Parmeggiani, F., Huang, P.-S., Bhabha, G., Ekiert, D. C., Tsutakawa, S. E., Hura, G. L., Tainer, J. A., & Baker, D. (2015). Exploring the repeat protein universe through computational protein design. Nature, 528(7583), 580–584.
A set of 15 designed solenoid proteins with varying curvature and twist.