Closed datduong closed 5 years ago
There is not a single function to do what you ask. You need to first encode the amino acid sequence into bytes with alphabets.Uniprot21, convert this to a pytorch tensor, and then embed the sequence with the trained model.
I suggest taking a look at eval_secstr and/or eval_transmembrane for an idea of how this works (see specifically "encode_sequence" and "TorchModel.__call__" in eval_secstr).
With regards to models, "pfam_lm_lstm2x1024_tied_mb64.sav" is the bidirectional language model trained on Pfam. Of the structure-based embedding models, I would suggest using "ssa_L1_100d_lstm3x512_lm_i512_mb64_tau0.5_lambda0.1_p0.05_epoch100.sav" which is the SSA model trained with both structural similarity and contact prediction tasks on the full training set (SSA (full) model in the manuscript).
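The three steps above (encode with Uniprot21, convert to a tensor, run the trained model) can be sketched roughly as follows. This is a hypothetical sketch, not code from the repo: the stand-in `encode_sequence` and its alphabetical residue ordering are assumptions for illustration, and the real pipeline should use `alphabets.Uniprot21` and the saved PyTorch model as shown in the comments.

```python
# Hypothetical sketch of the workflow described above. The minimal
# alphabet below is a STAND-IN for alphabets.Uniprot21 (its ordering
# may differ from the repo's actual ordering); the PyTorch steps are
# shown in comments only.

# Step 1: encode the amino acid sequence into integer tokens.
ALPHABET = "ARNDCQEGHILKMFPSTWYVX"  # 20 residues + X wildcard (assumed)

def encode_sequence(seq):
    """Stand-in for alphabets.Uniprot21().encode(seq.encode())."""
    return [ALPHABET.index(aa) if aa in ALPHABET else ALPHABET.index("X")
            for aa in seq.upper()]

tokens = encode_sequence("MKVKK")
print(tokens)  # one integer per residue

# Step 2: in the real code, convert the encoded sequence to a tensor:
#   x = torch.from_numpy(alphabet.encode(b"MKVKK")).long().unsqueeze(0)

# Step 3: load the trained model and embed the sequence:
#   model = torch.load("pfam_lm_lstm2x1024_tied_mb64.sav")
#   model.eval()
#   with torch.no_grad():
#       z = model(x)  # embedding vectors, one per residue
```

Batching multiple sequences requires padding/packing, which is why looking at how eval_secstr prepares its inputs is the safest guide.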
Thanks for your help. I was able to figure out how to use the Uniprot21 conversion.
Would you be able to provide instructions for running the encoder? For example, is there some function like this
model.encode('MKVKK')
where MKVKK is some amino acid sequence? Which of the pre-trained models should I use?
Thanks.