mheinzinger / SeqVec

Modelling the Language of Life - Deep Learning Protein Sequences
MIT License

Checkpoint file for the SeqVec model #5

Closed: ashishjain1988 closed this issue 4 years ago

ashishjain1988 commented 4 years ago

Can you please point me to where I can find the final checkpoint files for the SeqVec protein embedding model?

sacdallago commented 4 years ago

https://github.com/mheinzinger/SeqVec/issues/3#issuecomment-512879661

ashishjain1988 commented 4 years ago

@sacdallago Thanks for the comment, but that issue covers the checkpoint files for the classification models, right? I have some new protein sequences and want to retrain the SeqVec embedder. Can you please point me to the specific checkpoint files I should use for that?

sacdallago commented 4 years ago

To retrain the embedder, you have to train an ELMo model. There are no checkpoints for retraining, just parameters. @mheinzinger can help you out with the set of parameters that was used.

The point of the embedder is that you don't have to retrain it whenever you have a bunch of new sequences! Retraining would be very expensive, and a few new sequences are negligible next to the large training databases (e.g. UniRef90), so they won't change the internal state of the model by much.

If you want to embed the new sequences, you have to use the checkpoint of the ELMo embedder as described in the README.
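For illustration, embedding new sequences with the pretrained weights could look roughly like the sketch below. It is not a definitive recipe: it assumes the `ElmoEmbedder` class from `allennlp.commands.elmo` (available in older allennlp releases), and the file names `options.json` and `weights.hdf5` are assumptions about how the downloaded model directory is laid out. The helper names `load_embedder` and `embed_sequences` are hypothetical.

```python
from pathlib import Path


def load_embedder(model_dir, cuda_device=-1):
    """Build an ELMo embedder from a pretrained SeqVec directory.

    Assumption: the directory holds 'options.json' and 'weights.hdf5'.
    cuda_device=-1 runs on the CPU.
    """
    # Imported lazily so the rest of the sketch works without allennlp installed.
    from allennlp.commands.elmo import ElmoEmbedder

    model_dir = Path(model_dir)
    return ElmoEmbedder(
        str(model_dir / "options.json"),
        str(model_dir / "weights.hdf5"),
        cuda_device=cuda_device,
    )


def embed_sequences(embedder, sequences):
    """Embed each protein, treating every residue as one token.

    `sequences` maps a protein name to its amino-acid string; the result
    maps the same name to whatever `embed_sentence` returns for it.
    """
    return {
        name: embedder.embed_sentence(list(seq))
        for name, seq in sequences.items()
    }


# Hypothetical usage (the path is a placeholder):
# embedder = load_embedder("path/to/pretrained/SeqVec_directory")
# embeddings = embed_sequences(embedder, {"my_protein": "SEQWENCE"})
```

No retraining is involved here; the pretrained weights are only loaded and applied to the new sequences.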

mheinzinger commented 4 years ago

Just catching up here; sorry for the long delay. The checkpoint files are now available at rostlab.org/~deepppi/seqvec_checkpoint.tar.gz for download. It would be interesting to hear about your experience with fine-tuning the model, if possible :)
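A minimal sketch of fetching and unpacking the archive with the Python standard library follows. The `https://` scheme is an assumption (the link above is given without one), the helper names `download` and `extract_checkpoint` are hypothetical, and nothing is assumed about which files the archive contains: the function just lists what it extracted.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumption: the scheme is not stated in the comment; https is a guess.
CHECKPOINT_URL = "https://rostlab.org/~deepppi/seqvec_checkpoint.tar.gz"


def download(url, target):
    """Download `url` to the local file `target` and return its path."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target


def extract_checkpoint(archive, dest):
    """Unpack a .tar.gz archive into `dest` and list the extracted files."""
    dest = Path(dest).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        # Refuse members that would be written outside the destination.
        for member in tar.getmembers():
            member_path = (dest / member.name).resolve()
            if member_path != dest and dest not in member_path.parents:
                raise ValueError("unsafe path in archive: " + member.name)
        tar.extractall(str(dest))
    return sorted(
        p.relative_to(dest).as_posix() for p in dest.rglob("*") if p.is_file()
    )


# Hypothetical usage:
# archive = download(CHECKPOINT_URL, "seqvec_checkpoint.tar.gz")
# for name in extract_checkpoint(archive, "seqvec_checkpoint"):
#     print(name)
```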

ashishjain1988 commented 4 years ago

Thank you for the files!