opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples, such as ChatQnA and Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev

[Bug] TEI Gaudi 2 image is failing to launch #815

Open ezelanza opened 3 days ago

ezelanza commented 3 days ago

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

Deploy method

Running nodes

Single Node

What's the version?

latest

Description

The TEI Gaudi image is not launching due to errors.

Reproduce steps

After running docker compose, the container for the image "opea/tei-gaudi:latest" started but failed shortly after launch.
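For reference, the invocation used (also quoted in a follow-up comment below) was the standard ChatQnA Gaudi compose flow:

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d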

Raw log

/ChatQnA/docker_compose/intel/hpu/gaudi$ docker logs 349bc3685e97
2024-09-15T19:24:20.318465Z  INFO text_embeddings_router: router/src/main.rs:175: Args { model_id: "BAA*/***-****-**-v1.5", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "349bc3685e97", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }
2024-09-15T19:24:20.318939Z  INFO hf_hub: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hf-hub-0.3.2/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"    
2024-09-15T19:24:20.445084Z  INFO download_pool_config: text_embeddings_core::download: core/src/download.rs:45: Downloading `1_Pooling/config.json`
2024-09-15T19:24:21.565035Z  INFO download_new_st_config: text_embeddings_core::download: core/src/download.rs:108: Downloading `config_sentence_transformers.json`
2024-09-15T19:24:21.693734Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-09-15T19:24:21.693774Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:22: Downloading `config.json`
2024-09-15T19:24:21.823411Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:25: Downloading `tokenizer.json`
2024-09-15T19:24:22.137401Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:52: Downloading `model.safetensors`
2024-09-15T19:24:43.732689Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:39: Model artifacts downloaded in 22.038954482s
2024-09-15T19:24:44.030320Z  INFO text_embeddings_router: router/src/lib.rs:199: Maximum number of tokens per request: 512
2024-09-15T19:24:44.049828Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:26: Starting 152 tokenization workers
2024-09-15T19:24:44.374782Z  INFO text_embeddings_router: router/src/lib.rs:250: Starting model backend
2024-09-15T19:24:44.375230Z  INFO text_embeddings_backend_python::management: backends/python/src/management.rs:58: Starting Python backend
2024-09-15T19:24:48.899061Z  WARN python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:39: Could not import Flash Attention enabled models: No module named 'dropout_layer_norm'

2024-09-15T19:24:50.018977Z ERROR python-backend: text_embeddings_backend_python::logging: backends/python/src/logging.rs:40: Error when initializing model
Traceback (most recent call last):
  File "/usr/local/bin/python-text-embeddings-server", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 716, in main
    return _main(
  File "/usr/local/lib/python3.10/dist-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/src/backends/python/server/text_embeddings_server/cli.py", line 51, in serve
    server.serve(model_path, dtype, uds_path)
  File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 88, in serve
    asyncio.run(serve_inner(model_path, dtype))
  File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/usr/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/backends/python/server/text_embeddings_server/server.py", line 57, in serve_inner
    model = get_model(model_path, dtype)
  File "/usr/src/backends/python/server/text_embeddings_server/models/__init__.py", line 56, in get_model
    raise ValueError("CPU device only supports float32 dtype")
ValueError: CPU device only supports float32 dtype

Error: Could not create backend

Caused by:
Could not start backend: Python backend failed to start
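The traceback narrows this down: get_model() in the Python backend's models/__init__.py selected the CPU device, i.e. the Gaudi HPU was not visible inside the container, while the requested dtype was not float32 (presumably the Gaudi image defaults to a half-precision dtype), so it raised the ValueError above. A first check on the failed container, using the container ID from the log (the inspect format strings are standard Docker, not project-specific):

# Was the Habana runtime actually applied to the failing container?
docker inspect 349bc3685e97 --format '{{.HostConfig.Runtime}}'

# Were the HABANA_* environment variables passed through?
docker inspect 349bc3685e97 --format '{{json .Config.Env}}' | grep -i habana

If the first command does not print habana, or the second prints nothing, the backend only sees the CPU, which matches the error above.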
lvliang-intel commented 2 days ago

@ezelanza, please share the command you used to launch the service. I just verified the opea/tei-gaudi:latest image; it works well on my Gaudi2 server.

docker run -p 9780:80 -v $volume:/data \
  -e http_proxy=$http_proxy -e https_proxy=$https_proxy \
  --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  -e MAX_WARMUP_SEQUENCE_LENGTH=512 \
  --cap-add=sys_nice --ipc=host \
  opea/tei-gaudi:latest --model-id $model --pooling cls

[two screenshots attached]
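The flags in that command that give the container HPU access map to the compose keys runtime: habana, the HABANA_VISIBLE_DEVICES environment variable, cap_add: SYS_NICE, and ipc: host. A quick hedged check that the ChatQnA compose file resolves to the same settings (sketch, assuming Compose v2):

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose config | grep -n -i -E 'runtime|habana|cap_add|ipc'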

ezelanza commented 2 days ago

It works when I run the image standalone on my Gaudi 2 environment, but I'm still getting that error when launching it via the compose file:

cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
docker compose up -d
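Since the standalone docker run works while the compose launch fails, a hedged way to isolate the difference is to diff the effective configuration of the two containers (the standalone container ID below is a placeholder):

# working standalone container vs. failing compose-launched container
docker inspect <standalone_container_id> --format '{{.HostConfig.Runtime}} {{json .Config.Env}}'
docker inspect 349bc3685e97 --format '{{.HostConfig.Runtime}} {{json .Config.Env}}'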