Can I make simple predictions (such as stability on my own dataset) directly from the sequence embedding results?

goes0n commented 4 years ago

I want to embed protein sequences from my own dataset and use the embedding vectors to make stability predictions. Can I use the output of the Extracting Embeddings section directly for this? I'm new to this field, so I'd appreciate any guidance.

rmrao commented 4 years ago

There are two potential workflows:

1. Extract and save embeddings, then use them as input to a model that you write yourself and train on your dataset. If you go this route, I'd recommend the babbler-1900 model, since it produces good single-vector embeddings of a protein. Extracting embeddings gives you an npz file, so you can write the downstream model however you like, even in something like scikit-learn.
2. Fully fine-tune an existing model, either with our training code or with your own. This will probably give the best results, but it does require writing some PyTorch code.
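For the first workflow (extract embeddings, then fit your own model), the downstream model can be as simple as a regularized linear regression on the per-protein vectors. The sketch below assumes the npz file maps each sequence ID to a pickled dict holding one fixed-length vector under an `"avg"` key; that layout, the helpers `load_embeddings`, `fit_ridge` and `r2_score`, and the synthetic demo data are all hypothetical, so inspect your own file before relying on it. It uses closed-form ridge regression in NumPy to stay dependency-light; scikit-learn's `Ridge` would be a natural drop-in replacement.

```python
import numpy as np


def load_embeddings(npz_path, key="avg"):
    """Return (ids, X) from an embeddings .npz file.

    ASSUMPTION: every entry is a pickled dict storing one fixed-length vector
    per protein under `key`. Inspect your own file and adjust as needed.
    """
    with np.load(npz_path, allow_pickle=True) as arrays:
        ids = sorted(arrays.files)
        # Each entry is a 0-d object array; .item() unwraps the stored dict.
        X = np.stack([np.asarray(arrays[i].item()[key], dtype=float) for i in ids])
    return ids, X


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is left unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def r2_score(y_true, y_pred):
    """Coefficient of determination on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Demo on synthetic stand-in data (random vectors, NOT real embeddings or labels).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 64))
y_demo = X_demo @ rng.normal(size=64) + 0.1 * rng.normal(size=200)
w, b = fit_ridge(X_demo[:150], y_demo[:150], alpha=1.0)
test_r2 = r2_score(y_demo[150:], X_demo[150:] @ w + b)

# Real usage (hypothetical file name and label lookup):
# ids, X = load_embeddings("my_embeddings.npz")
# y = np.array([stability_labels[i] for i in ids])
```

Hold out part of your data (as the demo does with the last 50 rows) so the reported score reflects generalization rather than fit to the training set.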
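For the second workflow (full fine-tuning), the usual pattern is to attach a small prediction head to a pretrained encoder and train both together on your labels. The following is a minimal PyTorch sketch, not TAPE's training code: `ToyEncoder` is a hypothetical stand-in for a pretrained model so the example runs without downloading weights, and `StabilityRegressor`, `finetune`, and the random tokens and labels are likewise illustrative. It assumes the encoder returns a tuple whose first element holds per-residue embeddings of shape (batch, length, hidden), which is how the TAPE README describes its models' output; verify this against the version you install.

```python
import torch
from torch import nn


class ToyEncoder(nn.Module):
    """HYPOTHETICAL stand-in for a pretrained protein encoder."""

    def __init__(self, vocab_size=30, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, token_ids):
        # Return a tuple to mimic (sequence_output, ...) from a real encoder.
        return (torch.tanh(self.proj(self.embed(token_ids))),)


class StabilityRegressor(nn.Module):
    """Encoder plus a mean-pooled linear head predicting one scalar per protein."""

    def __init__(self, encoder, hidden):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden, 1)

    def forward(self, token_ids, mask):
        residues = self.encoder(token_ids)[0]  # (B, L, H)
        mask = mask.unsqueeze(-1).float()      # (B, L, 1); 1 = real residue, 0 = padding
        pooled = (residues * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        return self.head(pooled).squeeze(-1)   # (B,)


def finetune(model, token_ids, mask, targets, epochs=200, lr=1e-2):
    # model.parameters() covers encoder AND head, i.e. full fine-tuning.
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(token_ids, mask), targets)
        loss.backward()
        optimizer.step()
    return loss.item()


# Demo on synthetic stand-in data (random tokens and labels, NOT real proteins).
# In practice, pass a pretrained encoder instead of ToyEncoder and set `hidden`
# to that encoder's hidden size.
torch.manual_seed(0)
tokens = torch.randint(1, 30, (16, 20))
pad_mask = torch.ones(16, 20)
pad_mask[8:, 15:] = 0.0  # pretend the last 8 sequences are shorter
targets = torch.randn(16)
model = StabilityRegressor(ToyEncoder(vocab_size=30, hidden=32), hidden=32)
with torch.no_grad():
    initial_loss = nn.MSELoss()(model(tokens, pad_mask), targets).item()
final_loss = finetune(model, tokens, pad_mask, targets)
```

Mean pooling over non-padding positions is one simple way to reduce per-residue embeddings to a single vector; using an encoder's own pooled output is another option.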

Hope this helps!

rmrao commented 4 years ago

Since there's no followup, I'm going to assume this is closed.

songlab-cal / tape

Issue #44