Closed: louie-tsai closed this issue 2 weeks ago
I haven't tried using Gaudis (nor Docker Compose), but I thought of a few possible issues...
Based on your error output, sharding is enabled. By default TGI tries to use all available devices, but when sharding is enabled, TGI expects the number of devices to match the number of shards: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#numshard
Could you try removing TGI sharding options?
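As a sketch of what that could look like (service and variable names here are illustrative, not taken from the actual compose file), the sharding options to drop might be:

```yaml
# Hypothetical excerpt of docker_compose.yaml for the tgi-service.
# With a single Gaudi card, omit --sharded / --num-shard so the
# launcher auto-detects the one available device instead of
# expecting multiple shards.
tgi-service:
  command: --model-id ${LLM_MODEL_ID}   # no "--sharded true --num-shard N" here
```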
Also, could you check which Gaudi device file name(s) are present inside the VM? If that single Gaudi device's index is not 0 (i.e. the VM uses the host's device file name for it), it's possible that the driver does not find it (if it starts scanning from index 0).
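One way to check is to list the accelerator device nodes from inside the VM. The paths below are an assumption based on typical habanalabs driver behavior (newer drivers create `/dev/accel/accel<N>`, older ones `/dev/hl<N>`); adjust for your driver version:

```shell
# List Gaudi device files the VM exposes; the index in the name
# is what matters (e.g. accel0 vs accel3).
ls -l /dev/accel/ /dev/hl* 2>/dev/null || echo "no Gaudi device files found"
```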
@eero-t
We see a Gaudi device file with index 0, as in the snapshot below.
Not sure about removing TGI sharding, but we could assign visible devices in the docker compose file for both the TEI and TGI services.
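A hedged sketch of that idea, for a multi-card host (the service names are illustrative; `HABANA_VISIBLE_DEVICES` is the Habana runtime's convention for restricting which cards a container sees):

```yaml
# Illustrative: pin each Gaudi-backed service to its own card
# so they don't contend for the same device.
tei-embedding-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=1
```

On a single-card VM both services would still resolve to the same card, which is exactly the contention described below.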
The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
> The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
Ah, yes, sharing the device between multiple processes concurrently is not supported by the Gaudi drivers.
TGI is the heaviest of the ChatQnA services, so it makes sense to run it on the fastest accelerator => you need to dedicate another device to TEI, or run TEI on the CPU (for which you already filed #368).
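A minimal sketch of that single-card split, assuming TEI is switched to a CPU build (the image tag below is an assumption, not confirmed from the project's compose file; check the text-embeddings-inference releases for the actual CPU tag):

```yaml
# Illustrative single-card layout: TGI gets the Gaudi card,
# TEI runs on CPU so the two no longer contend for the device.
tei-embedding-service:
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest  # assumed tag
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0
```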
(I could also imagine setups where TGI is on Gaudi, the TEI services share a GPU, and the rest runs on CPU, but I think there is still some work to do for that kind of mixing.)
@eero-t Then, should we at least add a note to inform users of this limitation, as below?
@louie-tsai Please don't assign things to me as I'm not a developer in this project (just another user testing it).
Fixed by https://github.com/opea-project/GenAIExamples/pull/293. Thank you!
I used a 1-card VM instance from IDC, and tgi-service didn't run successfully in that IDC VM.
When I tried to restart it with "docker compose -f docker_compose.yaml up tgi-service", I saw the issue below. However, everything works fine if I use an 8-card IDC instance.
I suggest at least adding a note to inform users of this limitation, as in PR https://github.com/opea-project/GenAIExamples/pull/293.