meetU-MasterStudents / 2019---2020-partage

For exchanging material and doc
Data Uploaded #12

Open cgpapado opened 4 years ago

cgpapado commented 4 years ago

Hello everyone, I just uploaded your benchmarks so that you can start testing the performance of your methods. As a reminder:

Upstream inputs: 399 amino acid sequences, more or less homologous to your 1010 HOMSTRAD families.

Downstream inputs: 399 foldrec files generated by the ORION software from the 399 queries given to the Downstream teams.

The name of each file has the following format:
{sequence UNIPROT code} UNDERSCORE fam UNDERSCORE {HOMSTRAD family name}.fasta
{sequence UNIPROT code} UNDERSCORE fam UNDERSCORE {HOMSTRAD family name}.foldrec
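If it helps, the naming convention above can be split back into its parts with a few lines of Python. This is only a minimal sketch: it assumes that UNDERSCORE stands for a literal `_` character, so that the separator is `_fam_`, and the function name `parse_benchmark_name` and the example file names are hypothetical, not taken from the dataset.

```python
from pathlib import PurePath
from typing import NamedTuple


class BenchmarkFile(NamedTuple):
    uniprot_code: str  # part before "_fam_"
    family: str        # HOMSTRAD family name, part after "_fam_"
    kind: str          # "fasta" or "foldrec"


def parse_benchmark_name(filename: str) -> BenchmarkFile:
    """Split '{code}_fam_{family}.fasta|.foldrec' into its components.

    Assumption: the separator is the literal string '_fam_'.
    """
    path = PurePath(filename)
    kind = path.suffix.lstrip(".")
    if kind not in ("fasta", "foldrec"):
        raise ValueError(f"unexpected extension in {filename!r}")
    # Split on the first '_fam_' only, so underscores inside the
    # family name are kept intact.
    code, sep, family = path.stem.partition("_fam_")
    if not sep or not code or not family:
        raise ValueError(f"{filename!r} does not match the expected pattern")
    return BenchmarkFile(code, family, kind)


# Hypothetical file name, for illustration only.
info = parse_benchmark_name("P12345_fam_some_family.foldrec")
print(info.uniprot_code, info.family, info.kind)
```

Splitting on the first `_fam_` rather than on every underscore is a deliberate choice, since family names may themselves contain underscores.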

In the dataset you will find two types of sequences:
(*) Sequences homologous to the MASTER sequence of the HOMSTRAD family.
(**) Sequences homologous to another sequence (NOT THE MASTER) of the HOMSTRAD family.

The (*) cases are easier than the (**) cases because they are closer to the master.
So we expect fewer gaps in the alignments of the (*) sequences with the master than in those of the (**) sequences.
This is important for the Downstream teams, which have to deal with the gaps in their 3D models.
You will find this information in the last column of the title line of your FASTA files.
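To read that last column programmatically, something like the following could work. It is a rough sketch under assumptions: the message does not say how the columns of the title line are separated, so the separator is a parameter (whitespace by default), and the function name `last_title_column` and the sample record are hypothetical. Please check one of your own files before relying on it.

```python
from typing import Optional


def last_title_column(fasta_text: str, sep: Optional[str] = None) -> str:
    """Return the last column of the first title line ('>' line).

    Assumption: columns are separated by `sep`; with sep=None the
    line is split on runs of whitespace.
    """
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split(sep)
            if not fields:
                raise ValueError("empty title line")
            return fields[-1].strip()
    raise ValueError("no title line found")


# Made-up record, only to show the call; real titles may look different.
record = ">col1 col2 col3\nMKVLAAGIVG\n"
print(last_title_column(record))
```

If the titles turn out to use another separator, for example `|`, pass it explicitly: `last_title_column(record, sep="|")`.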

Note that the identity percentage for the (**) cases is the identity with their respective homologous sequence and NOT with the master of the HOMSTRAD family.
So, for example, a (**) sequence with 95% identity could have 75% identity with the master of the family. Do not be surprised if this sequence has a lot of gaps!

Have a nice weekend. Chris