dhandhalyabhavik closed this issue 2 months ago.
Assign to lvliang to update.
Updating to TGI 2.0 also improves model security, but may require model conversion: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
Another reason to update to TGI v2 would be the Intel XPU (GPU) support added in v2.0.2.
Thank you all. I'm closing this issue, as https://github.com/opea-project/GenAIComps/pull/273 has been merged. Let's continue the discussion in https://github.com/opea-project/GenAIExamples/issues/371. Thanks!
Did quick testing of just updating the TGI image tag in the (OPEA v0.7) tgi_service.yaml:
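The change itself is just an image tag bump. A minimal sketch, assuming a Docker Compose style tgi_service.yaml (the service name and surrounding fields here are illustrative, not copied from the OPEA file):

```yaml
services:
  tgi-service:
    # was: ghcr.io/huggingface/text-generation-inference:1.4
    image: ghcr.io/huggingface/text-generation-inference:2.1.1
```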
The fix for the metrics missed the (latest) v2.1.1 release by a few days, so TGI still returns empty metrics: https://github.com/huggingface/text-generation-inference/issues/2184
That makes the 2.x versions a no-go for now for https://github.com/opea-project/GenAIComps/issues/260.
TGI v2.2.0 was just released: https://github.com/huggingface/text-generation-inference/releases
It fixes the empty Prometheus metrics issue (#2184) present in earlier TGI 2.x releases.
TGI 2.2 does seem to start slower (longer warmup?).
After longer stress sessions, it has started failing to respond to its (k8s readiness) "/health" probes within 1s, but I'm not sure whether that's a 2.x-specific problem or whether it was already present in 1.4 (I did not test readiness probes with the older version).
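For context, the probe in question looks roughly like this sketch. The path and 1s budget come from the observation above; the port and period values are assumptions, not taken from the actual OPEA manifest:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 80          # assumed; use your TGI container port
  periodSeconds: 5    # assumed
  timeoutSeconds: 1   # the 1s budget TGI 2.x sometimes misses under load
```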
I'm not sure if this is related, but to use Intel's XPU support I used the ghcr.io/huggingface/text-generation-inference:latest-intel Docker tag, and it worked fine; with the latest tag, TGI was not able to detect the XPU. This setup also ran faster for me with the LLM model intel/neural-7b.
Environment variable setup was required, mainly to make ipex.xpu.is_available() return true:
```shell
source /opt/intel/oneapi/setvars.sh
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh
```
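After sourcing those scripts, a quick way to verify the environment is a minimal check like the sketch below (`xpu_available` is a hypothetical helper name; the `ipex.xpu.is_available()` call is the one mentioned above):

```python
def xpu_available():
    """Report ipex.xpu.is_available(), or None if IPEX is unusable here."""
    try:
        import intel_extension_for_pytorch as ipex
        return bool(ipex.xpu.is_available())
    except Exception:  # IPEX (or its native libraries) not installed
        return None

print("XPU available:", xpu_available())
```

If this prints `False` (or `None`) inside the container, re-check that the oneAPI scripts were sourced in the same shell that launched TGI.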
Please update the TGI image from 1.4 to 2.0 in all TGI README files.
I faced issues with the Phi-3 model.