sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License

[Feature] EmbedderInterface class with device_index attribute and set_device_index method #110

Closed ptynecki closed 3 years ago

ptynecki commented 3 years ago

Hey,

Description

I would suggest extending the EmbedderInterface class (inherited by each embedder class) to support setting a device index (torch.device). This would unlock the potential of running bio-embeddings on a multi-GPU server, with several embedders executing at the same time.

Via a set_device_index method or a device_index argument in the constructor, the user could decide and control which embedder instance is computed by which GPU.
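
For illustration, a minimal sketch of what the proposed interface could look like (device_index and set_device_index are the names suggested above, not existing bio_embeddings API):

import torch

class EmbedderInterface:
    def __init__(self, device_index: int = 0):
        # Pin this embedder instance to one GPU at construction time
        self.set_device_index(device_index)

    def set_device_index(self, device_index: int) -> None:
        # Allow moving the instance to another GPU later on
        self._device = torch.device(f"cuda:{device_index}")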

Motivation

I would like to run SeqVec and ProtBertBFD (or any other) embedders separately but at the same time on two different GPUs. I cannot do this now because device index 0 is the default, and I need a code injection to change the index. It would be much better if I could control this at the embedder instance level.

If you are open to pull requests, I will propose an implementation.

Regards, Piotr

konstin commented 3 years ago

Hi Piotr,

you can already do this by passing device="cuda:" + str(device_index). The following code (which can be run from the repository root) embeds on two GPUs at the same time:

from concurrent.futures.thread import ThreadPoolExecutor
from typing import List

from bio_embeddings.embed import (
    SeqVecEmbedder,
    ProtTransBertBFDEmbedder,
    EmbedderInterface,
)
from bio_embeddings.utilities import read_fasta

# Pin each embedder to its own GPU via the device argument
seqvec = SeqVecEmbedder(device="cuda:0")
bert = ProtTransBertBFDEmbedder(device="cuda:1")

sequences = [str(record.seq) for record in read_fasta("examples/deeploc/deeploc_data.fasta")]

def embed_sequences(embedder: EmbedderInterface, sequences: List[str]):
    for embedding in embedder.embed_many(sequences):
        print(f"{embedder.name}: {embedding.mean()}")

# Run both embedders concurrently, one thread per GPU
with ThreadPoolExecutor(2) as executor:
    results = executor.map(embed_sequences, [seqvec, bert], [sequences, sequences])
    # Consume the iterator so exceptions from the worker threads are surfaced
    list(results)
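
If you have more embedders than GPUs, the same pattern generalizes. A minimal sketch building on the code above; the round-robin device assignment here is my own illustration, not library functionality:

import torch

# Query the number of visible GPUs and assign embedder classes round-robin
n_gpus = torch.cuda.device_count()
embedder_classes = [SeqVecEmbedder, ProtTransBertBFDEmbedder]
embedders = [cls(device=f"cuda:{i % n_gpus}") for i, cls in enumerate(embedder_classes)]

with ThreadPoolExecutor(len(embedders)) as executor:
    # Reuse embed_sequences from above; one sequence list per embedder
    list(executor.map(embed_sequences, embedders, [sequences] * len(embedders)))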

sacdallago commented 3 years ago

Closing due to inactivity. Please re-open as needed :)