soedinglab / CCMpred

Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
http://www.ncbi.nlm.nih.gov/pubmed/25064567
GNU Affero General Public License v3.0

reading output #13

FloraMika closed this issue 5 years ago

FloraMika commented 5 years ago

Hi, CCMpred is running in my case, but we have an issue: we noticed that the output matrix has the dimension of the studied sequence + gaps. How is it possible that a score is calculated between a gap and a residue? From which residue can we consider the first amino acid?

I am waiting for your answer,

Best regards,

Flora

croth1 commented 5 years ago

Hi @FloraMika,

We noticed that the output matrix has the dimension of the studied sequence + gaps.

The output matrix is a square L x L matrix, where L is the number of columns in the alignment. Each value M(i,j) is the coupling score for the pair of positions i and j in the alignment.
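To make the L x L shape concrete, here is a minimal sketch that reads such a matrix, assuming the output file is plain text with one matrix row per line and whitespace-separated values. `parse_ccmpred_matrix` is a hypothetical helper, not part of CCMpred.

```python
from __future__ import annotations


def parse_ccmpred_matrix(text: str) -> list[list[float]]:
    """Parse a whitespace-separated L x L score matrix into a list of rows.

    Assumption: one matrix row per line, values separated by spaces or tabs.
    """
    rows = [[float(v) for v in line.split()] for line in text.splitlines() if line.strip()]
    n = len(rows)
    # The matrix must be square: L rows with L values each (L = alignment columns).
    for row in rows:
        if len(row) != n:
            raise ValueError(f"expected {n} values per row, got {len(row)}")
    return rows


example = "0.0\t0.5\t0.1\n0.5\t0.0\t0.2\n0.1\t0.2\t0.0\n"
matrix = parse_ccmpred_matrix(example)
print(len(matrix))   # 3 alignment columns
print(matrix[0][1])  # score for columns 0 and 1 (0-based indices)
```

Row and column indices here are alignment columns, not residue numbers of the studied sequence.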

How is it possible that a score is calculated between a gap and a residue?

The scores M(i,j) in the matrix are coupling scores for the two alignment columns i and j. Sequences that have a gap at position i or j do not contribute to the coupling score M(i,j).

That being said, internally gaps are not special. Gaps can be treated as a 21st amino acid when calculating the coupling parameters of the model.
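The two statements above can be illustrated with a small sketch: alignment characters are encoded as 21 states (20 amino acids plus the gap), and pairs from columns i and j are counted either skipping or keeping gapped sequences. The alphabet ordering, `encode`, and `pair_counts` are assumptions for illustration only; this just counts pairs and does not fit CCMpred's model parameters.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical 21-letter alphabet: 20 amino acids plus '-' as the 21st state.
ALPHABET = "ARNDCQEGHILKMFPSTWYV-"
GAP = ALPHABET.index("-")  # state 20


def encode(seq: str) -> list[int]:
    """Map each alignment character to an integer state 0..20.

    Characters outside ALPHABET (e.g. 'X') raise ValueError.
    """
    return [ALPHABET.index(c) for c in seq.upper()]


def pair_counts(msa: list[str], i: int, j: int, skip_gaps: bool = True) -> Counter:
    """Count state pairs (a, b) observed in columns i and j.

    skip_gaps=True: sequences with a gap at i or j are left out.
    skip_gaps=False: the gap is counted like any other (21st) state.
    """
    counts: Counter = Counter()
    for seq in msa:
        states = encode(seq)
        a, b = states[i], states[j]
        if skip_gaps and (a == GAP or b == GAP):
            continue
        counts[(a, b)] += 1
    return counts


msa = ["AC-D", "AC-D", "-CKD"]
print(pair_counts(msa, 0, 1))                   # third sequence skipped
print(pair_counts(msa, 0, 1, skip_gaps=False))  # gap counted as state 20
```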

From which residue can we consider the first amino acid?

Can you help me understand the question by giving a bit more context here?
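If the question is which alignment column holds a given residue of the studied sequence, one way to answer it is to build a column-to-residue map. A minimal sketch, assuming the studied (query) sequence is one row of the alignment and '-' marks gaps; `column_to_residue` is a hypothetical helper.

```python
from __future__ import annotations


def column_to_residue(query_row: str, gap: str = "-") -> dict[int, int]:
    """Map 0-based alignment columns to 1-based residue numbers of the query.

    Columns where the query has a gap have no residue and are omitted.
    """
    mapping: dict[int, int] = {}
    residue = 0
    for col, char in enumerate(query_row):
        if char != gap:
            residue += 1
            mapping[col] = residue
    return mapping


print(column_to_residue("--MKV-L"))  # {2: 1, 3: 2, 4: 3, 6: 4}
```

If the query row contains no gaps, column i simply corresponds to residue i + 1.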

Best, Christian

FloraMika commented 5 years ago

Dear @croth1, thank you very much for your answer; I managed to parse the output. I am trying to compare the top-scoring residue pairs with a distance matrix of the amino acids to see whether the top couplings correspond to distances < 8 Å. I am still not sure how to obtain a final score (i.e., how to weight correctly predicted correlations). Best regards,
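The comparison described above is often summarized as the precision of the top-k predicted pairs: the fraction of the k highest-scoring pairs whose distance is below the cutoff. A minimal sketch; `top_k_precision`, the default minimum sequence separation `min_sep=6`, and the choice of k are assumptions, not values prescribed by CCMpred. Both matrices must be indexed by the same alignment columns.

```python
def top_k_precision(scores, distances, k, min_sep=6, cutoff=8.0):
    """Fraction of the k highest-scoring pairs with distance below cutoff.

    scores, distances: L x L nested lists with the same indexing.
    Only pairs with i < j and j - i >= min_sep are ranked.
    """
    n = len(scores)
    pairs = [(scores[i][j], i, j) for i in range(n) for j in range(i + min_sep, n)]
    pairs.sort(reverse=True)  # highest score first
    top = pairs[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, i, j in top if distances[i][j] < cutoff)
    return hits / len(top)


scores = [[0, 0.9, 0.1, 0.3], [0.9, 0, 0.2, 0.8], [0.1, 0.2, 0, 0.4], [0.3, 0.8, 0.4, 0]]
distances = [[0, 5, 12, 7], [5, 0, 9, 15], [12, 9, 0, 6], [7, 15, 6, 0]]
print(top_k_precision(scores, distances, k=2, min_sep=1))  # 0.5
```

In practice k is often tied to the sequence length (for example L/5 or L), but which k to report is a choice to take from the evaluation you want to compare against.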

Flora

croth1 commented 5 years ago

I am still not sure how to obtain a final score

The output file contains the matrix M(i,j), which is corrected with APC (see ccmpred -h). The values in M are your ranking scores. The literature uses different ways to evaluate the quality of predictions; if you are interested, it is best to refer directly to the relevant papers.
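For reference, the average product correction (APC) subtracts from each raw score the product of its row and column means divided by the overall mean. A minimal sketch of that standard formula on a symmetric matrix; `apc_correct` is a hypothetical name, and whether CCMpred includes the diagonal when computing the means is not stated here, so this is illustrative and not a re-implementation of CCMpred's output.

```python
def apc_correct(raw):
    """Apply the average product correction to a symmetric L x L matrix.

    M_apc(i, j) = M(i, j) - mean_i * mean_j / mean_all,
    where mean_i is the mean of row i (equal to the column mean for a
    symmetric matrix) and mean_all is the mean over all entries.
    """
    n = len(raw)
    row_means = [sum(row) / n for row in raw]
    overall = sum(row_means) / n
    if overall == 0:
        raise ValueError("overall mean is zero; APC is undefined")
    return [
        [raw[i][j] - row_means[i] * row_means[j] / overall for j in range(n)]
        for i in range(n)
    ]


print(apc_correct([[0, 2], [2, 0]]))  # [[-1.0, 1.0], [1.0, -1.0]]
```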

Best, Christian