tbepler / protein-sequence-embedding-iclr2019

Source code for "Learning protein sequence embeddings using information from structure" - ICLR 2019

The embedding sequence shape #24

SaidaSaad closed 3 years ago

SaidaSaad commented 3 years ago

Hello

Thank you for the code. I was able to get the embeddings using the pretrained model. I would like to ask what the size of the embedding is for a given sequence. For example, I have sequences of length 17. What is the shape or length of the embedding that I get? I would also like to ask how the embedding output differs between full_features=False and full_features=True. Which one has more information?

Thanks

tbepler commented 3 years ago

You will get one embedding vector per position, so a sequence of length 17 gives 17 vectors, unless you set the --pool flag, in which case the embeddings are pooled over the sequence according to the option selected.

full_features=True has more information and a higher dimension, because it includes both the LM and structure-based hidden layers. If you set full_features=False, you will only get the final 100-d projection layer output.
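
To make the shapes concrete, here is a rough sketch for the length-17 example with full_features=False. It does not call the repository code: the random array is only a stand-in for real model output, and the names `per_position`, `avg_pooled`, and `max_pooled` are made up for illustration. Average and max pooling are shown as assumed examples of what a pooling option might do, not as a statement of which options --pool accepts.

```python
import numpy as np

L = 17   # sequence length from the question
D = 100  # projection size with full_features=False, as stated above

# Stand-in for real model output: one D-dimensional vector per position.
rng = np.random.default_rng(0)
per_position = rng.standard_normal((L, D))

# Pooling collapses the position axis, leaving one vector per sequence.
# Average and max are assumed examples of pooling operations.
avg_pooled = per_position.mean(axis=0)
max_pooled = per_position.max(axis=0)

print(per_position.shape)  # (17, 100)
print(avg_pooled.shape)    # (100,)
print(max_pooled.shape)    # (100,)
```

With full_features=True the first dimension stays the same (one row per position) and only the second dimension grows, since the hidden layers are included alongside the projection.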