opea-project / GenAIComps

GenAI components at micro-service level; GenAI service composer to create mega-service

Request to upgrade TGI image to 2.0 #230

Closed dhandhalyabhavik closed 2 months ago

dhandhalyabhavik commented 3 months ago

Please update the TGI image from 1.4 to 2.0 in all TGI readme files.

I faced issues with the Phi-3 model.
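
To illustrate, a minimal sketch of the kind of command the updated readmes would use; the flags, port, and Phi-3 model ID below are assumptions for illustration, not taken from the OPEA readmes:

# Hypothetical example: serve a Phi-3 model with the TGI 2.0 image.
# Model ID and flags are illustrative only.
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.0 \
  --model-id microsoft/Phi-3-mini-4k-instruct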

feng-intel commented 3 months ago

Assign to lvliang to update.

eero-t commented 3 months ago

Updating to TGI 2.0 also improves model security, but may require model conversion: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
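
As a rough illustration of the practices in that tutorial, one can pin the served model to a specific revision so the weights cannot change underneath you; the model ID is carried over from the sketch above and <commit-sha> is a placeholder:

# Illustrative only: pin the model to an audited commit.
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:2.0 \
  --model-id microsoft/Phi-3-mini-4k-instruct \
  --revision <commit-sha>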

eero-t commented 2 months ago

Another reason to update to TGI v2 would be the Intel XPU (GPU) support added in TGI v2.0.2.
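
A rough sketch of how that could be exercised, assuming the latest-intel image tag mentioned later in this thread and a host that exposes the GPU through /dev/dri; the device passthrough and the model ID placeholder are assumptions:

# Illustrative XPU run; replace <model-id> with an actual model.
docker run --rm --device /dev/dri -p 8080:80 -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:latest-intel \
  --model-id <model-id>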

yinghu5 commented 2 months ago

Thank you all. I'm closing the issue as https://github.com/opea-project/GenAIComps/pull/273 has been merged. Let's continue the discussion in https://github.com/opea-project/GenAIExamples/issues/371. Thanks!

eero-t commented 2 months ago

Did some quick testing of just updating the TGI image tag in the (OPEA v0.7) tgi_service.yaml:

The fix for the metrics missed the (latest) v2.1.1 release by a few days, so TGI is still returning empty metrics: https://github.com/huggingface/text-generation-inference/issues/2184

That makes the 2.x versions a no-go for now for https://github.com/opea-project/GenAIComps/issues/260.
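
For anyone who wants to reproduce the symptom, a quick check, assuming TGI listens on localhost:8080 and has already served at least one request (tgi_request_count is one of the counters TGI normally exposes):

# Affected 2.x builds return an empty result here instead of request counters.
curl -s http://localhost:8080/metrics | grep tgi_request_count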

eero-t commented 2 months ago

TGI v2.2.0 was just released: https://github.com/huggingface/text-generation-inference/releases

It fixes the empty Prometheus metrics issue (#2184) present in earlier TGI 2.x releases.

eero-t commented 2 months ago

TGI 2.2 does seem to start more slowly (longer warmup?).

After longer stress sessions, it has started failing to respond to its (k8s readiness) "/health" probes within 1s, but I'm not sure whether that's a 2.x-specific problem or whether it was already there with 1.4 (I did not try readiness probes with the older version).
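
The probe behavior can be approximated by hand; a sketch assuming TGI on localhost:8080, with curl's -m 1 standing in for the probe's 1s timeout:

# Simulate the readiness probe's 1s budget against TGI's /health endpoint.
time curl -m 1 -sf http://localhost:8080/health || echo "probe would have failed"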

ritesh-intel commented 2 months ago

I'm not sure if this is related, but for Intel XPU support I used the ghcr.io/huggingface/text-generation-inference:latest-intel docker tag for text-generation-inference, and it worked fine; with the latest version it was not able to detect the XPU. This setup also ran faster for me with the LLM model intel/neural-7b. Setting up the environment variables was required, mainly for ipex.xpu.is_available() to return true:

source /opt/intel/oneapi/setvars.sh
source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh
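
For reference, a one-liner to confirm that check after sourcing the scripts above (assumes intel-extension-for-pytorch is installed and imported as ipex, as in the comment):

# Should print True once the oneAPI environment is set up correctly.
python -c "import intel_extension_for_pytorch as ipex; print(ipex.xpu.is_available())"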