Closed: louie-tsai closed this issue 2 weeks ago
I haven't tried using Gaudis (nor Docker Compose), but I thought of a few possible issues...
Based on your error output, sharding is enabled. By default TGI tries to use all available devices, but when sharding is enabled, TGI expects the number of devices to match the number of shards: https://huggingface.co/docs/text-generation-inference/en/basic_tutorials/launcher#numshard
Could you try removing TGI sharding options?
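As a sketch of what that could look like (service and variable names here are illustrative, not taken from the actual compose file), the sharding options to drop might be:

```yaml
# Hypothetical excerpt of docker_compose.yaml for the tgi-service.
# With a single Gaudi card, omit --sharded / --num-shard so the
# launcher auto-detects the one available device instead of
# expecting multiple shards.
tgi-service:
  command: --model-id ${LLM_MODEL_ID}   # no "--sharded true --num-shard N" here
```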
Also, could you check which Gaudi device file name(s) are present inside the VM? If that single Gaudi device's index is not 0 (i.e. the VM uses the host's device file name for it), it's possible that the driver does not find it (if it starts scanning from index 0).
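One way to check is to list the accelerator device nodes from inside the VM. The paths below are an assumption based on typical habanalabs driver behavior (newer drivers create `/dev/accel/accel<N>`, older ones `/dev/hl<N>`); adjust for your driver version:

```shell
# List Gaudi device files the VM exposes; the index in the name
# is what matters (e.g. accel0 vs accel3).
ls -l /dev/accel/ /dev/hl* 2>/dev/null || echo "no Gaudi device files found"
```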
@eero-t
We see a Gaudi device file with index 0, as in the snapshot below.
Not sure about removing TGI sharding, but we could assign visible devices in the docker compose file for both the TEI and TGI services.
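A hedged sketch of that idea, for a multi-card host (the service names are illustrative; `HABANA_VISIBLE_DEVICES` is the Habana runtime's convention for restricting which cards a container sees):

```yaml
# Illustrative: pin each Gaudi-backed service to its own card
# so they don't contend for the same device.
tei-embedding-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=1
```

On a single-card VM both services would still resolve to the same card, which is exactly the contention described below.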
The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
> The problem here is that TEI and TGI seem to compete with each other for the single Gaudi card, and TGI fails with the error message.
Ah, yes, sharing the device between multiple processes concurrently is not supported by the Gaudi drivers.
TGI is the heaviest of the ChatQnA services, so it makes sense to run it on the fastest accelerator => you need to dedicate another device to TEI, or run TEI on the CPU (for which you already filed #368).
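A minimal sketch of that single-card split, assuming TEI is switched to a CPU build (the image tag below is an assumption, not confirmed from the project's compose file; check the text-embeddings-inference releases for the actual CPU tag):

```yaml
# Illustrative single-card layout: TGI gets the Gaudi card,
# TEI runs on CPU so the two no longer contend for the device.
tei-embedding-service:
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest  # assumed tag
tgi-service:
  environment:
    - HABANA_VISIBLE_DEVICES=0
```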
(I could also imagine setups where TGI is on Gaudi, the TEI services share a GPU, and the rest runs on CPU, but I think there is still some work to do for that kind of mixing.)
@eero-t Then, should we at least add a note to inform users of this limitation, as below?
@louie-tsai Please don't assign things to me as I'm not a developer in this project (just another user testing it).
Fixed by https://github.com/opea-project/GenAIExamples/pull/293. Thank you!
I used a 1-card VM instance from IDC, and tgi-service didn't run successfully in that IDC VM.
When I tried to restart it with "docker compose -f docker_compose.yaml up tgi-service", I saw the issue below. However, everything works fine if I use an 8-card IDC instance.
I suggest at least adding a note to inform users of this limitation, as in PR https://github.com/opea-project/GenAIExamples/pull/293.