sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License
460 stars 65 forks

CUDA out of memory. #217

Open wenyuhaokikika opened 1 year ago

wenyuhaokikika commented 1 year ago

I ran into this problem when running the embed stage of bio_embeddings:

ERROR:bio_embeddings.embed.embedder_interfaces:Error processing batch of 3 sequences: CUDA out of memory. Tried to allocate 972.00 MiB (GPU 1; 7.80 GiB total capacity; 4.91 GiB already allocated; 717.31 MiB free; 4.92 GiB reserved in total by PyTorch). You might want to consider adjusting the `batch_size` parameter. Will try to embed each sequence in the set individually on the GPU.

Although the final result is computed, I am not sure whether it is correct. Is there an option to avoid this, e.g. reducing `batch_size` or using multiple GPUs? I did not find any relevant options in `examples/parameters_blueprint.yml`.

zff1116 commented 1 year ago

I seem to have the same problem...

fedorn commented 1 year ago

`EmbedderInterface.embed_many` has a `batch_size` argument, but by default it does not batch and processes each sequence individually.
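
For illustration, here is a minimal sketch of the pattern described in the log message above: group sequences into batches under a size budget, and if a batch fails, retry its sequences one at a time. This is not the library's implementation. `batch_by_residues`, `embed_with_fallback`, `embed_batch` and `max_residues` are hypothetical names, and the sketch assumes the budget is counted in total residues per batch and that an out-of-memory failure surfaces as a `RuntimeError`.

```python
from typing import Callable, Iterable, Iterator, List


def batch_by_residues(sequences: Iterable[str], max_residues: int) -> Iterator[List[str]]:
    """Group sequences so each batch holds at most `max_residues` characters.

    A sequence longer than the budget is yielded alone in its own batch.
    """
    batch: List[str] = []
    total = 0
    for seq in sequences:
        # Close the current batch if adding this sequence would exceed the budget.
        if batch and total + len(seq) > max_residues:
            yield batch
            batch, total = [], 0
        batch.append(seq)
        total += len(seq)
    if batch:
        yield batch


def embed_with_fallback(
    sequences: Iterable[str],
    embed_batch: Callable[[List[str]], List[object]],  # hypothetical embedding function
    max_residues: int,
) -> Iterator[object]:
    """Embed batch by batch; if a batch fails, retry it one sequence at a time."""
    for batch in batch_by_residues(sequences, max_residues):
        try:
            results = embed_batch(batch)
        except RuntimeError:
            # PyTorch raises CUDA out-of-memory as RuntimeError (or a subclass of it).
            results = [embed_batch([seq])[0] for seq in batch]
        yield from results
```

In this sketch the per-sequence retry yields results in the original input order, so a batch that falls back still lines up with its sequences. A smaller `max_residues` lowers peak memory per batch at the cost of more forward passes.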