meetU-MasterStudents / 2019---2020-partage

For exchanging material and doc

Problem with the scop_id values in HOMSTRAD #24

Open thms3 opened 4 years ago

thms3 commented 4 years ago

I am part of team 2 (downstream) and we have run into a problem that we cannot explain. I have attached an image of a dictionary I built from the HOMSTRAD folder, and it turns out that a single protein can have several scop_id values. This seems to contradict SCOP's hierarchical classification, given that these are not merely interchangeable sub-nodes (prediction/PDB limitations...) but differ at the root of the classification tree.

So the question is: which scop_id should we use?

[Attached image: screenshot of the dictionary, 2019-11-27]

elolaine commented 4 years ago

Hello,

This is entirely normal. See issue #15, which I am reposting here. I will also add this information to the README of the Partage folder.


I just updated the SCOP ids; the previous ones were wrong! Please use the new ones. You will notice that 46 families have more than one SCOP id. This can happen for three reasons:

(1) The reference PDB contains several domains which have different SCOP ids (example: 5_3 endonuclease a.60.7.1, c.120.1.2),

(2) The PDBs associated with the family (reference PDB + other PDBs whose codes are indicated in the MAP file) have different SCOP ids, although each one of them covers the whole query (example: hexapep b.81.1.5, b.81.1.1, b.81.1.2),

(3) The PDBs associated with the family (reference PDB + other PDBs whose codes are indicated in the MAP file) have different SCOP ids, and they do not cover the same parts of the query (example: fer4 d.58.1.4, i.4.1.1, d.58.1.5, d.58.1.1, d.58.1.2, d.58.1.3).

In case (2), the different SCOP_ids actually correspond to very similar structures. In cases (1) and (3), they can correspond to different structures. This is clearly shown by the examples given in parentheses.

To evaluate your results, when you place yourself at the level of the families, you can simply rely on the family names and not consider the SCOP ids. You want to see at which rank the real HOMSTRAD family of the query is. When you consider higher levels like SCOP superfamilies and folds, you can consider that you found a valid HOMSTRAD family when it shares at least one SCOP superfamily/fold with the real HOMSTRAD family of the query.
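The "at least one shared superfamily/fold" rule can be sketched as follows. This is only an illustration, assuming SCOP ids are dot-separated strings of the form class.fold.superfamily.family (as in the examples above); the function names `scop_prefix` and `shares_level` are hypothetical and not part of any provided script.

```python
# Hypothetical helpers: compare two HOMSTRAD families through their SCOP ids.
# Assumes ids look like "b.81.1.5" (class.fold.superfamily.family).

LEVEL_DEPTH = {"class": 1, "fold": 2, "superfamily": 3, "family": 4}


def scop_prefix(scop_id, level):
    """Truncate a SCOP id to the given level, e.g. 'b.81.1.5' -> 'b.81.1'."""
    depth = LEVEL_DEPTH[level]
    return ".".join(scop_id.strip().split(".")[:depth])


def shares_level(predicted_ids, true_ids, level):
    """Return True if the two id lists share at least one prefix at `level`."""
    predicted = {scop_prefix(s, level) for s in predicted_ids}
    true = {scop_prefix(s, level) for s in true_ids}
    return bool(predicted & true)


# Ids taken from the examples above.
hexapep = ["b.81.1.5", "b.81.1.1", "b.81.1.2"]
endonuclease = ["a.60.7.1", "c.120.1.2"]
fer4 = ["d.58.1.4", "i.4.1.1", "d.58.1.5", "d.58.1.1", "d.58.1.2", "d.58.1.3"]
```

With these lists, `shares_level(hexapep, ["b.81.1.1"], "superfamily")` is true, while `shares_level(endonuclease, fer4, "fold")` is false because no fold prefix is common to both families.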

For those of you who are interested in knowing exactly which of the reference+other PDBs have which SCOP_id, here's a file containing such information: http://scop.mrc-lmb.cam.ac.uk/scop/parse/dir.cla.scop.txt_1.75.
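If you want to look up the SCOP ids of individual PDB codes in that file, a minimal parsing sketch is given below. It assumes the file is tab-separated with `#` comment lines, with the PDB code in the second column and the SCOP id (sccs) in the fourth; please check this against the actual file before relying on it. The function name `parse_scop_cla` is hypothetical.

```python
from collections import defaultdict


def parse_scop_cla(lines):
    """Map each PDB code to the set of SCOP ids found for its domains.

    Assumed layout (to be verified on the real file): tab-separated columns,
    comment lines starting with '#', PDB code in column 2, SCOP id in column 4.
    """
    pdb_to_scop = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip lines that do not match the assumed layout
        pdb_code, scop_id = fields[1].lower(), fields[3]
        pdb_to_scop[pdb_code].add(scop_id)
    return dict(pdb_to_scop)
```

The function accepts any iterable of lines, so an open file handle on a local copy of `dir.cla.scop.txt_1.75` can be passed directly. A PDB code that maps to more than one SCOP id would correspond to case (1) above.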