tbepler / protein-sequence-embedding-iclr2019

Source code for "Learning protein sequence embeddings using information from structure" - ICLR 2019

Using pre-trained models #25

Closed mstrazar closed 3 years ago

mstrazar commented 3 years ago

Hi @tbepler , I managed to install your code and would like to use pre-trained models to embed 100,000s of protein sequences. I tried ssa_L1_100d_lstm3x512_lm_i512_mb64_tau0.5_p0.05_epoch100.sav , but it appears to process only ~1 sequence per minute on a top-end CPU (no CUDA). If I understand correctly, sequences should just be passed through the network to get the embeddings, i.e. a fast forward pass.

Any ideas? Should I use other pre-trained models?

Thanks, Martin

tbepler commented 3 years ago

Hi Martin,

Have you checked that pytorch is using all of your CPU cores? Is the number of cores limited for some reason?

I wouldn't expect the code to run super fast on CPU, but that does sound slow. If you have a GPU, I highly recommend using it! The code will run much faster.
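For reference, PyTorch's intra-op thread count can be checked and raised directly (the core count of 8 below is a hypothetical example, not a recommendation for any particular machine):

```python
import torch

# How many CPU threads PyTorch currently uses for intra-op parallelism.
print(torch.get_num_threads())

# If this reports fewer cores than the machine has, raise it explicitly
# (8 is a hypothetical core count for illustration):
torch.set_num_threads(8)

# If a GPU is available, moving the model and inputs there is far faster:
device = "cuda" if torch.cuda.is_available() else "cpu"
```

Environment variables such as `OMP_NUM_THREADS` can also cap the thread count, which is worth checking if the number reported above is unexpectedly low.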

For the embedding model, I suggest using the lambda0.1 model, but it won't change the runtime.
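More generally, throughput on either CPU or GPU usually hinges on batching: calling the model once per sequence pays the full framework overhead on every call, while embedding a batch amortizes it over many sequences. A minimal batching helper (the `encode`/`model` calls in the comment are placeholders, not this repository's API):

```python
def batches(seqs, size):
    """Yield successive fixed-size batches from a list of sequences."""
    for i in range(0, len(seqs), size):
        yield seqs[i : i + size]

# Hypothetical usage: embed each batch in a single forward pass instead of
# looping over individual sequences (encode/model are placeholders here):
# for batch in batches(sequences, 64):
#     embeddings = model(encode(batch))
```

Since protein sequences vary in length, batching by similar length (or padding with masking) keeps the wasted computation per batch small.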

mstrazar commented 3 years ago

Just to confirm: a GPU machine and the lambda0.1 model crank out ~300 sequences per minute. Thanks, Martin

tbepler commented 3 years ago

Great, closing this issue.