mheinzinger / ProstT5

Bilingual Language Model for Protein Sequence and Structure
MIT License

Protein sequence length limit #12

Closed rakeshr10 closed 1 month ago

rakeshr10 commented 3 months ago

Hi @mheinzinger,

I was using ProstT5 on some protein sequences and noticed that it does not predict 3Di sequences for proteins longer than 1000 residues. What is the reason and rationale for this?

Are these limits due to the way the model was trained? If so, what are the other limitations?

Regards, Rakesh

mheinzinger commented 3 months ago

Which script did you use exactly? If you use this one, which only uses the ProstT5 encoder plus a CNN to predict 3Di, you should get predictions for all proteins up to the point where your GPU runs out of memory. In that case it should not quit silently but throw an exception, so you should see why it fails.
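
To check whether long sequences are skipped silently or fail with an error, one option is to run each sequence separately and record any exception together with the sequence length. The following is only a minimal sketch, not code from the ProstT5 repository: `predict_all` and `fake_predict` are hypothetical names, and `fake_predict` merely simulates a predictor that fails above 1000 residues. In practice you would pass your own function that wraps the model call. The sketch assumes that memory failures surface as `RuntimeError`, which is how PyTorch reports CUDA out-of-memory errors.

```python
from typing import Callable, Dict, List, Tuple


def predict_all(
    seqs: Dict[str, str],
    predict_fn: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, int, str]]]:
    """Run predict_fn on every sequence and report failures explicitly.

    Returns (results, failures), where failures holds
    (identifier, sequence length, error message) for each failed sequence.
    """
    results: Dict[str, str] = {}
    failures: List[Tuple[str, int, str]] = []
    # Process the longest sequences first so memory problems show up early.
    ordered = sorted(seqs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in ordered:
        try:
            results[name] = predict_fn(seq)
        except RuntimeError as err:
            # Assumption: out-of-memory errors arrive as RuntimeError.
            failures.append((name, len(seq), str(err)))
    return results, failures


def fake_predict(seq: str) -> str:
    """Hypothetical stand-in for a real predictor; fails above 1000 residues."""
    if len(seq) > 1000:
        raise RuntimeError("out of memory (simulated)")
    return "d" * len(seq)  # placeholder 3Di string, one state per residue


if __name__ == "__main__":
    demo = {"short": "M" * 50, "long": "M" * 1500}
    done, failed = predict_all(demo, fake_predict)
    for name, length, message in failed:
        print(f"{name} (length {length}) failed: {message}")
```

If the failure list stays empty but long sequences are still missing from the output, the cause is more likely a length filter in the script than a memory limit.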