microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system
https://microsoft.github.io/graphrag/
MIT License
19.24k stars 1.9k forks

[Issue]: <SIGKILL on Startup During Docker Build with graphrag Imports> #1232

Closed: c23996 closed this issue 1 month ago

c23996 commented 1 month ago

Do you need to file an issue?

Describe the issue

I'm encountering a SIGKILL error (the gunicorn log below reports it as SIGILL) when starting my app after building it with Docker (targeting linux/amd64). The app works locally without issues, but when it starts in Docker, the following imports from the graphrag package kill the worker and prevent the app from starting.

Problem Imports:

from graphrag.query.context_builder.entity_extraction import EntityVectorStoreKey
from graphrag.query.indexer_adapters import (
    read_indexer_entities,
    read_indexer_relationships,
    read_indexer_reports,
    read_indexer_text_units,
)
from graphrag.query.input.loaders.dfs import (
    store_entity_semantic_embeddings,
)
from graphrag.query.structured_search.local_search.mixed_context import (
    LocalSearchMixedContext,
)
from graphrag.vector_stores.lancedb import LanceDBVectorStore
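To narrow down which of these imports triggers the crash, each module can be imported in its own interpreter process so that a fatal signal only kills the child. The sketch below is a hypothetical helper, not part of graphrag; the function names `describe_returncode` and `probe_import` are made up for illustration, and the module paths are copied from the list above.

```python
import signal
import subprocess
import sys

# Module paths copied from the import list above.
MODULES = [
    "graphrag.query.context_builder.entity_extraction",
    "graphrag.query.indexer_adapters",
    "graphrag.query.input.loaders.dfs",
    "graphrag.query.structured_search.local_search.mixed_context",
    "graphrag.vector_stores.lancedb",
]


def describe_returncode(returncode: int) -> str:
    """Translate a subprocess return code into a readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code means the child was terminated by that signal.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


def probe_import(module: str) -> str:
    """Import one module in a fresh interpreter so a crash cannot take down the caller."""
    result = subprocess.run(
        [sys.executable, "-c", f"import {module}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_returncode(result.returncode)


if __name__ == "__main__":
    for name in MODULES:
        print(f"{name}: {probe_import(name)}")
```

Running this inside the container (rather than on the host) should show which module, if any, dies from a signal instead of raising a normal Python exception.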

Steps to reproduce

  1. Build the Docker image targeting linux/amd64.
  2. Start the app using the provided Docker entrypoint.
  3. Observe the SIGKILL error during startup.
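The Docker log further down reports SIGILL (illegal instruction), which can happen when a native extension uses a CPU instruction that the processor, or an emulator standing in for it, does not provide. Whether that is the cause here is an assumption, not a confirmed diagnosis. A minimal sketch for checking what the container's CPU advertises follows; the helper names and the choice of flags to print are illustrative only.

```python
import platform
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_path: str = "/proc/cpuinfo") -> None:
    """Print the machine type and whether a few common x86 extensions are listed."""
    print("machine:", platform.machine())
    path = Path(cpuinfo_path)
    if not path.exists():
        print(f"{cpuinfo_path} not found (probably not Linux)")
        return
    flags = parse_cpu_flags(path.read_text())
    # Illustrative selection of flags; not a list taken from graphrag or its dependencies.
    for flag in ("sse4_2", "avx", "avx2"):
        print(f"{flag}: {'yes' if flag in flags else 'no'}")


if __name__ == "__main__":
    report()
```

Comparing the output from the linux/amd64 container on the M1 host with the output on a native amd64 machine would show whether the two environments expose different instruction sets.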

Expected Behavior: The app should start without crashing or encountering a SIGKILL.

Environment:

Local environment: M1 Mac with Apple silicon. Works with Uvicorn without issues.
Docker environment: Fails on startup with SIGKILL.
Docker target: linux/amd64

Entrypoint:

  [
    "gunicorn",
    "--worker-class",
    "uvicorn.workers.UvicornWorker",
    "--bind=0.0.0.0:8000",
    "--timeout=300",
    "--workers=1",
    "--log-level=DEBUG",
    "src.app.main:get_api"
  ]
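The same options can also be kept in a gunicorn configuration file, which makes it easier to confirm which settings are actually in effect. The sketch below assumes a file named `gunicorn.conf.py` (a hypothetical name for this project); the values mirror the entrypoint above.

```python
# gunicorn.conf.py (hypothetical file name), mirroring the entrypoint flags above.

# Address and port to listen on (--bind).
bind = "0.0.0.0:8000"

# Worker implementation (--worker-class).
worker_class = "uvicorn.workers.UvicornWorker"

# Seconds before a silent worker is killed and restarted (--timeout).
timeout = 300

# Number of worker processes (--workers).
workers = 1

# Log verbosity (--log-level).
loglevel = "debug"
```

It would be started with `gunicorn -c gunicorn.conf.py src.app.main:get_api`. The Docker log below prints `Using worker: sync`, so it may be worth confirming that the worker class from the entrypoint is really being picked up.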

Dependencies

[tool.poetry.dependencies]
python = ">=3.11,<3.13"
fastapi = "^0.115.0"
structlog = "^24.4.0"
ruff = "^0.6.8"
black = "^24.8.0"
pre-commit = "^3.8.0"
gunicorn = "^23.0.0"
newrelic = "^10.0.0"
gcsfs = "^2024.9.0.post1"
uvicorn = "^0.31.0"
pytest = "^8.3.3"
pytest-cov = "^5.0.0"
google-cloud = "^0.34.0"
google-cloud-storage = "^2.18.2"
pydantic = "^2.9.2"
pydantic-settings = "^2.5.2"
graphrag = "^0.3.6"

GraphRAG Config Used

Logs and screenshots

Run in docker:

api-1  | [2024-09-30 14:38:55 +0000] [1] [INFO] Starting gunicorn 23.0.0
api-1  | [2024-09-30 14:38:55 +0000] [1] [DEBUG] Arbiter booted
api-1  | [2024-09-30 14:38:55 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
api-1  | [2024-09-30 14:38:55 +0000] [1] [INFO] Using worker: sync
api-1  | [2024-09-30 14:38:55 +0000] [7] [INFO] Booting worker with pid: 7
api-1  | [2024-09-30 14:38:55 +0000] [1] [DEBUG] 1 workers
api-1  | [2024-09-30 14:38:58 +0000] [1] [ERROR] Worker (pid:7) was sent SIGILL!
api-1  | [2024-09-30 14:38:58 +0000] [17] [INFO] Booting worker with pid: 17
api-1  | [2024-09-30 14:39:01 +0000] [1] [ERROR] Worker (pid:17) was sent SIGILL!
api-1  | [2024-09-30 14:39:01 +0000] [27] [INFO] Booting worker with pid: 27
api-1  | [2024-09-30 14:39:03 +0000] [1] [ERROR] Worker (pid:27) was sent SIGILL!
api-1  | [2024-09-30 14:39:03 +0000] [37] [INFO] Booting worker with pid: 37
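The log lines above name SIGILL, while the issue title and description say SIGKILL. These are different signals: SIGILL is raised for an illegal instruction, whereas SIGKILL is an unconditional kill sent from outside the process. A small sketch for pulling the signal names out of gunicorn output follows; the regular expression and function name are illustrative and only match the message format shown in this log.

```python
import re
import signal

# Matches arbiter lines such as "Worker (pid:7) was sent SIGILL!".
WORKER_SIGNAL = re.compile(
    r"Worker \(pid:(?P<pid>\d+)\) was sent (?P<sig>SIG[A-Z0-9]+)!"
)


def extract_worker_signals(log_text: str) -> list[tuple[int, str]]:
    """Return (pid, signal name) pairs for every worker-death line in the log."""
    return [(int(m["pid"]), m["sig"]) for m in WORKER_SIGNAL.finditer(log_text)]


SAMPLE = """\
api-1  | [2024-09-30 14:38:58 +0000] [1] [ERROR] Worker (pid:7) was sent SIGILL!
api-1  | [2024-09-30 14:38:58 +0000] [17] [INFO] Booting worker with pid: 17
api-1  | [2024-09-30 14:39:01 +0000] [1] [ERROR] Worker (pid:17) was sent SIGILL!
"""

if __name__ == "__main__":
    for pid, name in extract_worker_signals(SAMPLE):
        # Look up the numeric value of the signal on the current platform.
        number = signal.Signals[name].value
        print(f"pid {pid}: {name} (signal {number})")
```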

Run in gunicorn locally:

objc[99375]: +[NSString initialize] may have been in progress in another thread when fork() was called.
objc[99375]: +[NSString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
[2024-09-30 16:50:32 +0200] [98453] [ERROR] Worker (pid:99375) was sent SIGABRT!

Additional Information

natoverse commented 1 month ago

Routing to #657

c23996 commented 1 month ago

Hey, I mistakenly didn't choose the second option; we were actually using the OpenAI API with GPT-4o. Could we please reopen this ticket? Thank you!
