Closed ptynecki closed 3 years ago
Hi Piotr,
you can already do this by passing `device="cuda:" + str(device_index)` to the embedder constructor. The following code (which can be run from the repository root) embeds on two GPUs at the same time:
```python
from concurrent.futures.thread import ThreadPoolExecutor
from typing import List

from bio_embeddings.embed import (
    SeqVecEmbedder,
    ProtTransBertBFDEmbedder,
    EmbedderInterface,
)
from bio_embeddings.utilities import read_fasta

# Pin each embedder to its own GPU via the device string
seqvec = SeqVecEmbedder(device="cuda:0")
bert = ProtTransBertBFDEmbedder(device="cuda:1")

sequences = [str(i.seq[:]) for i in read_fasta("examples/deeploc/deeploc_data.fasta")]


def embed_sequences(embedder: EmbedderInterface, sequences: List[str]):
    for i in embedder.embed_many(sequences):
        print(f"{embedder.name}: {i.mean()}")


with ThreadPoolExecutor(2) as executor:
    results = executor.map(embed_sequences, [seqvec, bert], [sequences, sequences])
    # Consume the iterator so that any exception raised in a worker surfaces here
    list(results)
```
Closing due to inactivity. Please re-open as needed :)
Hey,
Description
I would suggest extending the `EmbedderInterface` class (inherited by each of the embedder classes) to support setting a device index (`torch.device`). This would unleash the potential of running bio-embeddings on a multi-GPU server concurrently. Via a `set_device_index` method or a `device_index` constructor argument, the user could decide and control which embedding instance is computed on which GPU.
Motivation
I would like to run SeqVec and ProtBertBFD (or any other) embedders separately but at the same time on two different GPUs. I cannot do that now because the device index defaults to 0, and I would need a code injection to change it. It would be much better if I could control that at the level of each embedder instance.
If you are open to pull requests, I will propose the implementation.
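A minimal sketch of what the proposed constructor argument and setter could look like. The names below mirror the suggestion in the issue, but the class bodies are illustrative assumptions, not the actual bio_embeddings implementation:

```python
# Hypothetical sketch of the proposed device_index API.
# Class internals are assumptions for illustration, not the real bio_embeddings code.

class EmbedderInterface:
    def __init__(self, device_index: int = 0):
        # Resolve the index to a concrete CUDA device string at construction time
        self.device = f"cuda:{device_index}"

    def set_device_index(self, device_index: int) -> None:
        # Allow re-pinning an existing instance to a different GPU
        self.device = f"cuda:{device_index}"


class SeqVecEmbedder(EmbedderInterface):
    pass


class ProtTransBertBFDEmbedder(EmbedderInterface):
    pass


# Each embedder instance is pinned to its own GPU
seqvec = SeqVecEmbedder(device_index=0)
bert = ProtTransBertBFDEmbedder(device_index=1)
print(seqvec.device, bert.device)  # cuda:0 cuda:1
```

This is essentially what the existing `device="cuda:N"` argument already provides, just exposed as an integer index instead of a device string.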
Regards, Piotr