jonathanking / sidechainnet

An all-atom protein structure dataset for machine learning.
BSD 3-Clause "New" or "Revised" License

Secondary Structure DSSP #31

Closed OsamaGhandour closed 3 years ago

OsamaGhandour commented 3 years ago

I see at the DSSP link that it uses the codes (H, B, E, G, I, T, S), plus a blank for loops or irregular regions.

But when I print the secondary structure of a protein sample from the data, I find that you use the letters (H, B, E, G, L, T, S) and a blank. What do L and the blank stand for in your data?

jonathanking commented 3 years ago

I acquired the secondary structure information from ProteinNet https://github.com/aqlaboratory/proteinnet, so I cannot give you a definitive answer. I believe the L is written in place of the I (capital i), and both refer to the 5-helix class. The blank simply corresponds to a missing residue (which can occur for several preprocessing-related reasons), and therefore a missing secondary structure label.
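For anyone handling these labels downstream, here is a minimal sketch of the interpretation described above. It assumes that `L` stands in for DSSP's `I` (which is only a belief, not something confirmed by ProteinNet) and that a blank marks a missing residue. The function name `normalize_secondary_structure` and the `remap_l_to_i` flag are hypothetical and are not part of SidechainNet's API; the input is just a plain per-residue label string.

```python
# Hypothetical helper, not part of SidechainNet or ProteinNet.
# Assumption: "L" in the data corresponds to DSSP's "I" (5-helix),
# and a blank (" ") marks a missing residue with no label.
ALLOWED_LABELS = set("HBEGITSL")


def normalize_secondary_structure(ss, remap_l_to_i=True):
    """Return (labels, mask) for a per-residue secondary structure string.

    labels[i] is a one-letter code, or None where the residue is missing.
    mask[i] is True where a label is present and False where it is blank.
    """
    labels, mask = [], []
    for ch in ss:
        if ch == " ":
            # Blank: missing residue, so there is no label to report.
            labels.append(None)
            mask.append(False)
            continue
        if ch not in ALLOWED_LABELS:
            raise ValueError(f"unexpected secondary structure label: {ch!r}")
        if remap_l_to_i and ch == "L":
            ch = "I"  # assumed equivalence, see the note above
        labels.append(ch)
        mask.append(True)
    return labels, mask


labels, mask = normalize_secondary_structure("HHLL  EE")
```

The mask can then be used to exclude missing residues from a loss or an accuracy computation, rather than treating the blank as a loop class. Pass `remap_l_to_i=False` to keep `L` as it appears in the data.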

OsamaGhandour commented 3 years ago

I thought that the blank referred to loops or irregular regions, the same as in DSSP, and forgot about missing residues. I can work with it anyway. Thanks for clarifying.

jonathanking commented 3 years ago

You're welcome. I haven't heard that about blanks before. The blanks were something I assigned in order to reconcile my data with AlQuraishi's and I can tell you that for Sidechainnet, they do refer to missing portions of data.

Good luck with your project!

OsamaGhandour commented 3 years ago

I found that information about blanks at the DSSP URL that you used in the Sidechainnet summary.