neuml / txtai

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
https://neuml.github.io/txtai
Apache License 2.0

CUDA error on initializing an Embeddings instance in a spawned subprocess, aka a Celery background task. #697

Closed obonyojimmy closed 4 months ago

obonyojimmy commented 4 months ago

Hi, thanks for this awesome lib. I'm facing an issue when initializing an Embeddings instance in a Celery background task; the error trace is below.

Here's a snippet of the Celery code:

from txtai.embeddings import Embeddings

@celery_app.task
def ftp_indexer():
    vectorizer_model: str = "BAAI/bge-base-en-v1.5"
    _config = {
        "path": vectorizer_model,
        "autoid": "uuid5",
        "graph": {
            "approximate": False,
            "topics": {}
        },
        "content": "sqlite",
        "sqlite": {"wal": True},
    }
    embeddings = Embeddings(_config)

Error Trace:

File "/app/taskflow/integrations/integration.py", line 175, in __init__
taskflow-1  |     embeddings = Embeddings(default_config)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/embeddings/base.py", line 88, in __init__
taskflow-1  |     self.configure(config)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/embeddings/base.py", line 737, in configure
taskflow-1  |     self.model = self.loadvectors() if self.config else None
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/embeddings/base.py", line 888, in loadvectors
taskflow-1  |     model = VectorsFactory.create(self.config, self.scoring)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/vectors/factory.py", line 44, in create
taskflow-1  |     return TransformersVectors(config, scoring) if config and "path" in config else None
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/vectors/base.py", line 39, in __init__
taskflow-1  |     self.model = self.load(config.get("path"))
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/vectors/transformers.py", line 33, in load
taskflow-1  |     return PoolingFactory.create({"path": path, "device": deviceid, "tokenizer": self.config.get("tokenizer"), "method": method})
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/models/pooling/factory.py", line 46, in create
taskflow-1  |     return ClsPooling(path, device, tokenizer)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/txtai/models/pooling/base.py", line 42, in __init__
taskflow-1  |     self.to(self.device)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
taskflow-1  |     return self._apply(convert)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
taskflow-1  |     module._apply(fn)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
taskflow-1  |     module._apply(fn)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
taskflow-1  |     module._apply(fn)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
taskflow-1  |     param_applied = fn(param)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
taskflow-1  |     return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
taskflow-1  |   File "/root/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 288, in _lazy_init
taskflow-1  |     raise RuntimeError(
taskflow-1  | RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Thanks once more.

davidmezzetti commented 4 months ago

Hello, thanks for trying txtai and the report.

This looks to be a general issue with Torch and Celery: https://stackoverflow.com/questions/70541625/unable-to-use-pytorch-with-cuda-in-celery-task

Does a solution like what's in that SO post seem acceptable?
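
Something along these lines might work (untested sketch based on that post; `celery_app` is assumed to be your existing Celery application instance and `taskflow` your app name, neither is part of txtai):

# Option 1: run the worker with a pool that doesn't fork, so CUDA is only
# initialized in the process that actually runs the task:
#
#   celery -A taskflow worker --pool=solo
#   # or: celery -A taskflow worker --pool=threads
#
# Option 2: keep the default prefork pool, make sure nothing touches CUDA
# before the fork, and create the Embeddings instance lazily per worker process:

from txtai.embeddings import Embeddings

# celery_app: your existing Celery() application instance
_embeddings = None  # cached per worker process

@celery_app.task
def ftp_indexer():
    global _embeddings
    if _embeddings is None:
        # The first call in this child process initializes CUDA here, after the
        # fork, which avoids the "Cannot re-initialize CUDA" error as long as
        # the parent process never touched CUDA (e.g. via a module-level load).
        _embeddings = Embeddings({"path": "BAAI/bge-base-en-v1.5"})
    # ... index documents with _embeddings ...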

obonyojimmy commented 4 months ago

Thanks for the suggestion, yes that did solve the issue.