meetU-MasterStudents / 2019---2020-partage

For exchanging material and documentation



Bibliography (Biblio)

For profile-profile alignments and (remote) homology detection

ORION: fold recognition method based on profile-profile alignments

scoringProfiles: methods to score profile-profile alignments

HHsearch: remote homology detection using profiles (defined from hidden Markov models, not from PSSM!)

CLADE: remote homology detection using a multi-profile strategy

For evaluating the quality of a 3D model

DOPE: "historical" statistical potential for evaluating the quality of 3D models (does not perform very well but is a good start)

Rosetta: a reference function for estimating the energy of a 3D protein conformation (be aware it's all-atom!)

SBROD: coarse-grained statistical potential to evaluate the quality of 3D models (can be applied to Calpha-only or backbone-only structures, recently performed well in CASP13, can be completely re-trained!)

For structural annotations (secondary structure, solvent accessibility, contacts... predicted from sequence)

ReviewSA: book chapter reviewing methods to predict secondary structure, solvent accessibility, torsion angles and contact maps from sequence information

CCMPRED: method to predict residue-residue contacts within a protein by extracting coevolution signals (co-occurring patterns of mutations across sequences)

For structural refinement

DaReUS-Loop: web server for accurate modeling of loops in homology models


Helper data and tools (Codes)

Params: directory containing values for the DOPE statistical potential.

Tools: directory containing a PDB parser (in Python) and a script to weight sequences based on their similarity (in Perl).
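
To illustrate what similarity-based sequence weighting can look like, here is a minimal Python sketch. It is not a port of the Perl script in Tools, whose exact scheme is not described here. It assumes one common scheme: each aligned sequence gets the weight 1 / n, where n is the number of sequences (itself included) that share at least a given fraction of identical positions with it. The function names and the 0.8 threshold are illustrative choices.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned sequences,
    ignoring columns where both sequences have a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def similarity_weights(aligned, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences, itself included,
    whose identity to it is >= threshold). Assumed scheme, see lead-in."""
    weights = []
    for a in aligned:
        neighbours = sum(identity(a, b) >= threshold for b in aligned)
        weights.append(1.0 / neighbours)
    return weights
```

With this scheme, two identical sequences each receive a weight of 0.5, so a cluster of near-duplicates contributes roughly as much to a profile as a single isolated sequence.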


Data for training and testing (Data)

1009 directories corresponding to 1009 families from HOMSTRAD.

For each family:

Please note that the master sequence is named by its PDB code in the MAP file. It may not be the first sequence appearing in the file!

Please note that a family may have multiple SCOP identifiers. For your evaluation, when a family spans multiple folds or superfamilies, your answer counts as valid if it matches at least one of them. Please see http://scop.mrc-lmb.cam.ac.uk/scop/parse/dir.cla.scop.txt_1.75 for the complete mapping between PDBs and SCOP ids.
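
The "at least one match" rule can be sketched in Python as follows. This is only an illustration, not part of the repository. It assumes the usual layout of the SCOP classification file: comment lines starting with "#", then whitespace-separated columns where the second column is the PDB code and the fourth is the sccs string (class.fold.superfamily.family, for example a.1.1.1). The function names are hypothetical.

```python
from collections import defaultdict


def load_scop_classification(lines):
    """Map each PDB code to the set of sccs strings listed for it."""
    sccs_by_pdb = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split()
        # Assumed columns: sid, PDB code, chain/region, sccs, sunid, keys
        pdb_code, sccs = fields[1].lower(), fields[3]
        sccs_by_pdb[pdb_code].add(sccs)
    return sccs_by_pdb


def truncate(sccs, level):
    """Return the fold ('a.1') or superfamily ('a.1.1') prefix of an sccs."""
    depth = {"fold": 2, "superfamily": 3}[level]
    return ".".join(sccs.split(".")[:depth])


def is_valid(predicted_sccs, family_sccs, level="fold"):
    """True if the prediction matches at least one of the family's
    SCOP ids at the requested level."""
    target = truncate(predicted_sccs, level)
    return any(truncate(s, level) == target for s in family_sccs)
```

A typical use would be to load the file once, collect the sccs strings of all PDB entries in a family, and call is_valid once per query at the fold level and once at the superfamily level.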

398 test sequences ("queries") to validate your program.

They are contained in the file queries398.multifasta. The name of each query sequence is as follows:

gluts | Q09596.1 | NP_001254267.1 | 98.0%
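
A minimal Python sketch for reading the queries and splitting each name into its "|"-separated fields is given below. It assumes that queries398.multifasta follows the standard FASTA convention (header lines starting with ">", sequence possibly wrapped over several lines). The function name read_multifasta is hypothetical, and the fields are returned as a plain list without interpreting them.

```python
def read_multifasta(lines):
    """Yield (header_fields, sequence) pairs from multi-FASTA lines.
    header_fields is the header split on '|', with whitespace stripped."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header = [field.strip() for field in line[1:].split("|")]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Since the function accepts any iterable of lines, an open file handle on queries398.multifasta can be passed directly.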