songlab-cal / tape

Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology.
https://www.biorxiv.org/content/10.1101/676825v1
BSD 3-Clause "New" or "Revised" License
662 stars, 129 forks

secondary structure dictionary #61

Open jcdelpiano opened 4 years ago

jcdelpiano commented 4 years ago

Hello,

I'm training a model for three-class secondary structure prediction, but I'm not sure how those classes are encoded in the data (ss3). Could you please help me with this?

Many thanks!

Javier.

rmrao commented 4 years ago

@nickbhat I believe you did the encoding of SS3 data - can you answer this?

thomas-a-neil commented 4 years ago

Hi Javier,

You can find the mapping from secondary structure index to class in the deprecated repo here: https://github.com/songlab-cal/tape-neurips2019/blob/master/tape/data_utils/vocabs.py#L147

Is this what you were looking for?

Perhaps we should merge this vocabs file in from the deprecated TensorFlow repo.
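
Once the mapping from the linked file is in hand, decoding ss3 labels is a dictionary lookup. Below is a minimal sketch, not code from either TAPE repo. The names `HYPOTHETICAL_SS3_VOCAB` and `decode_ss3` are invented for illustration, and the index-to-letter assignment is a placeholder that should be checked against vocabs.py before use. Treating -1 as a padding label is also an assumption.

```python
from typing import Dict, Sequence

# ASSUMPTION: placeholder mapping for illustration only. Confirm the real
# index-to-class assignment against the linked vocabs.py before relying on it.
HYPOTHETICAL_SS3_VOCAB: Dict[int, str] = {0: "H", 1: "E", 2: "C"}


def decode_ss3(
    labels: Sequence[int],
    vocab: Dict[int, str],
    ignore_index: int = -1,  # ASSUMPTION: value used for padded positions
    pad_char: str = "-",
) -> str:
    """Turn a sequence of integer ss3 labels into a string of class letters."""
    chars = []
    for label in labels:
        if label == ignore_index:
            # Padded or masked position: emit a placeholder character.
            chars.append(pad_char)
        elif label in vocab:
            chars.append(vocab[label])
        else:
            # Fail loudly rather than guess at an unknown class index.
            raise ValueError(f"Unknown ss3 label: {label}")
    return "".join(chars)


def encode_ss3(structure: str, vocab: Dict[int, str]) -> list:
    """Inverse lookup: class letters back to integer labels."""
    inverse = {letter: index for index, letter in vocab.items()}
    return [inverse[letter] for letter in structure]
```

Passing the vocabulary in as an argument keeps the placeholder mapping out of the logic, so it can be swapped for the verified one without touching the functions.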

jcdelpiano commented 4 years ago

Hi,

Yes, that's exactly what I was looking for! Thank you for the help!

I think it would be useful for future users to merge the vocabs file, as you suggest :)

Thanks again!