soedinglab / CCMpred

Protein Residue-Residue Contacts from Correlated Mutations predicted quickly and accurately.
http://www.ncbi.nlm.nih.gov/pubmed/25064567
GNU Affero General Public License v3.0

contact output parse #18

Closed 595693085 closed 4 years ago

595693085 commented 5 years ago

Hello, the output I want is a protein contact map. In more detail, for a protein of sequence length L and a distance threshold (for example, 8 Å), I want an L x L matrix M whose entries M(i,j) are 0 or 1, indicating whether residue i and residue j are in contact according to the threshold. Is there a way to convert the output score matrix to such a contact map? Thanks.

croth1 commented 4 years ago

Hi,

we do not have such a script available yet. It would not be a lot of work to do that in Python, though. All you need to do is create an LxL distance matrix from the structure file (see e.g. here for code we use for extracting pairwise distances from PDB files).
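As a rough illustration of the distance-matrix step, here is a minimal sketch that reads Cα coordinates from the fixed-column ATOM records of a PDB file and computes all pairwise distances. The function names `ca_coords_from_pdb_lines` and `pairwise_distances` are hypothetical and this is not the code referred to above. It assumes a single chain, uses Cα atoms for simplicity (Cβ is another common choice), and indexes the matrix by the residues present in the structure, so residues missing from the file would shift indices relative to the sequence.

```python
import numpy as np

def ca_coords_from_pdb_lines(lines, chain="A"):
    """Collect C-alpha coordinates for one chain from PDB ATOM records (hypothetical helper)."""
    coords = []
    seen = set()
    for line in lines:
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name, column 22 the chain identifier.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        res_id = line[22:27]  # residue number plus insertion code
        if res_id in seen:    # keep only the first record per residue (e.g. alternate locations)
            continue
        seen.add(res_id)
        # Columns 31-38, 39-46 and 47-54 hold x, y and z in Angstrom.
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return np.array(coords, dtype=float).reshape(-1, 3)

def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between coordinate rows."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))
```

With a file on disk, something like `pairwise_distances(ca_coords_from_pdb_lines(open(path)))` would give a matrix that can serve as `distance_matrix` in the thresholding step.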

At this point all you have to do is:

import numpy as np
# 1 where the pairwise distance is below the threshold (e.g. 8 Å), 0 elsewhere
real_contacts = np.zeros(distance_matrix.shape)
real_contacts[distance_matrix < dist_threshold] = 1

595693085 commented 4 years ago

Thanks for the reply. I'm still confused. Does contact map prediction require a PDB structure file? Can CCMpred predict a contact map from a protein sequence alone? That's exactly what I want.

croth1 commented 4 years ago

Does contact map prediction require a PDB structure file?

No, contact prediction requires only a large sequence alignment of evolutionarily related sequences. The output is an LxL matrix of contact scores. The higher the score, the more likely a pair (i,j) is to be in contact.
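Since the scores only rank pairs, one way to get a 0/1 map from them is to mark the highest-scoring pairs as predicted contacts. The sketch below does that under stated assumptions: the score matrix is already loaded as a symmetric NumPy array (for a plain-text matrix file, `np.loadtxt` would be one way), and the function name `top_scoring_contacts`, the cutoff `n_top`, and the minimum sequence separation `min_sep` are illustrative choices, not something CCMpred prescribes.

```python
import numpy as np

def top_scoring_contacts(scores, n_top, min_sep=6):
    """Binary L x L map marking the n_top highest-scoring pairs with |i - j| >= min_sep."""
    L = scores.shape[0]
    # Upper-triangle index pairs (i < j) that are at least min_sep apart in sequence.
    i, j = np.triu_indices(L, k=min_sep)
    # Positions of the n_top largest scores among those pairs.
    order = np.argsort(scores[i, j])[::-1][:n_top]
    contact_map = np.zeros((L, L), dtype=int)
    contact_map[i[order], j[order]] = 1
    contact_map[j[order], i[order]] = 1  # keep the map symmetric
    return contact_map
```

The number of pairs to keep is a judgment call; it is often tied to the sequence length, for example `n_top = L` or a fraction of it.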

You need the PDB structure if you want to validate whether the pairs with high scores are indeed closer than 8 Å. Maybe I misunderstood what you are trying to achieve. CCMpred does not do distance prediction. You will have to look around for deep neural networks that use CCMpred output as a feature and predict distances.
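The validation described here can be sketched as the fraction of top-scoring pairs whose distance in the structure falls below the threshold. This is only an illustration: `precision_of_top_pairs`, `n_top`, and `min_sep` are hypothetical names and defaults, and the sketch assumes that `scores` and `distance_matrix` are NumPy arrays covering the same residues in the same order.

```python
import numpy as np

def precision_of_top_pairs(scores, distance_matrix, n_top, dist_threshold=8.0, min_sep=6):
    """Fraction of the n_top highest-scoring pairs that are closer than dist_threshold."""
    L = scores.shape[0]
    # Consider each pair once (i < j), skipping pairs that are near in sequence.
    i, j = np.triu_indices(L, k=min_sep)
    order = np.argsort(scores[i, j])[::-1][:n_top]
    hits = distance_matrix[i[order], j[order]] < dist_threshold
    return float(hits.mean())
```

If no pair satisfies the separation filter, the mean of an empty selection is NaN, so a real script would want to guard against that case.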

Can CCMpred predict a contact map from a protein sequence alone?

You need a large sequence alignment, as CCMpred uses only co-evolution information.

595693085 commented 4 years ago

Okay, I see. Thank you.