mheinzinger / SeqVec

Modelling the Language of Life - Deep Learning Protein Sequences
MIT License
71 stars 26 forks source link

Is it possible to embed batched sequences? #2

Closed ufimtsev closed 5 years ago

mheinzinger commented 5 years ago

Yes, it is possible and actually it is crucial to make max. usage of your VRAM and to speed up embeddings. All you need to do is use elmo.predict_on_batches( List[ List[str] ]). It takes a list of lists as input, with the inner list being a list of words/amino acids. You can check the official implementation here: https://github.com/allenai/allennlp/blob/master/allennlp/commands/elmo.py

mheinzinger commented 5 years ago

If you sort your sequences according to their length before embedding them, you can increase speed further.