opea-project / GenAIExamples

Generative AI Examples is a collection of GenAI examples such as ChatQnA, Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
https://opea.dev
Apache License 2.0

[Bug] ChatQnA - compose.yaml for Gaudi - Habana devices #739

Open pallavijaini0525 opened 1 week ago

pallavijaini0525 commented 1 week ago

Priority

Undecided

OS type

Ubuntu

Hardware type

Gaudi2

Installation method

Deploy method

Running nodes

Single Node

What's the version?

https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml

Description

The ChatQnA compose.yaml for Gaudi (https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml) defines two containers that both request HABANA_VISIBLE_DEVICES=all. For multi-tenancy, each container needs to specify device IDs instead of all.

With the existing compose.yaml, starting the containers fails with the error below.

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: exposing interfaces: failed creating temporary link on host: invalid argument

Reproduce steps

Run the Docker Compose file (https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/docker/gaudi/compose.yaml) after setting the environment variables specified in https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gaudi#setup-environment-variables

Raw log

No response

feng-intel commented 1 week ago

Gaudi docs page: https://docs.habana.ai/en/latest/Orchestration/Multiple_Tenants_on_HPU/Multiple_Dockers_each_with_Single_Workload.html

You can set HABANA_VISIBLE_DEVICES=0,1,2,3 to specify the device IDs instead of all.
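
For illustration, the change might look roughly like the fragment below. This is a sketch, not the actual file: the service names and the particular split of device IDs are hypothetical placeholders, and only the keys relevant to device selection are shown. The point is that each container lists its own IDs and the two lists do not overlap.

```yaml
# Sketch only: service names and device IDs are hypothetical placeholders.
# All other keys of each service (image, ports, volumes, ...) stay as they are.
services:
  first-gaudi-service:                  # hypothetical name for the first Gaudi container
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: "0"       # previously: all
  second-gaudi-service:                 # hypothetical name for the second Gaudi container
    runtime: habana
    environment:
      HABANA_VISIBLE_DEVICES: "1,2,3"   # previously: all; no overlap with the IDs above
```

Which IDs go to which container depends on how many cards each workload needs and which cards are free on the host.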

pallavijaini0525 commented 1 week ago

Yes, I have made that change and am able to run it. I opened this issue to create a placeholder, or to have a note added to the README file, so that users do not miss updating the devices.