Generative AI Examples is a collection of GenAI examples, such as ChatQnA and Copilot, which illustrate the pipeline capabilities of the Open Platform for Enterprise AI (OPEA) project.
As an AI PC or OPEA developer, I want to deploy OPEA on a desktop with limited memory (for example, 32 GB).
In my case, OPEA deployment failed on 8 GB and 16 GB machines with:
tgi: 2024-09-29T02:10:45.234182Z WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without --trust-remote-code because Pickle files are unsafe and can essentially contain remote code execution!Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
tgi: 2024-09-29T02:10:45.234250Z WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
tgi: Error: DownloadError
tgi: 2024-09-29T02:10:58.798995Z ERROR download: text_generation_launcher: Download process was signaled to shutdown with signal 9:
tgi: 2024-09-29 02:08:04.162 | INFO | text_generation_server.utils.import_utils::75 - Detected system cpu
tgi: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
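Signal 9 is SIGKILL, which on Linux is typically the kernel OOM killer terminating the download process when memory runs out. Back-of-envelope arithmetic suggests why 8 GB and 16 GB hosts fail here: Intel/neural-chat-7b-v3-3 is a ~7B-parameter model, so fp16 weights alone are ~14 GB, and the PyTorch-to-safetensors conversion step can briefly hold a second copy. A rough sketch (the 2x conversion peak is an assumption, not a measured figure):

```shell
#!/bin/sh
# Back-of-envelope weight-memory estimate for a 7B-parameter model at fp16.
PARAMS_B=7            # billions of parameters
BYTES_PER_PARAM=2     # fp16 = 2 bytes per parameter
WEIGHTS_GB=$((PARAMS_B * BYTES_PER_PARAM))
echo "weights: ~${WEIGHTS_GB} GB"                  # ~14 GB, already above an 8 GB host
# PyTorch -> safetensors conversion may briefly hold two copies of the weights:
echo "conversion peak: ~$((WEIGHTS_GB * 2)) GB"    # ~28 GB, above a 16 GB host
```

To confirm an OOM kill on the host, `sudo dmesg -T | grep -i -E "oom|killed process"` usually shows the killed PID.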
Possible solutions:
1) Use ollama/llama.cpp, and provide an example of how to use ollama/llama.cpp.
2) Reduce TGI's memory footprint.
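A minimal sketch of option 1), assuming Ollama is installed locally. Ollama serves 4-bit-quantized GGUF models (a quantized 7B model needs roughly 4-5 GB of RAM, which fits an 8 GB desktop) and exposes an OpenAI-compatible endpoint on port 11434 by default; the `neural-chat` model name below is from the Ollama model library:

```shell
# Pull a 4-bit quantized 7B model (roughly 4-5 GB of RAM) and query it.
ollama pull neural-chat

# Ollama's OpenAI-compatible chat endpoint (default port 11434):
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "neural-chat",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

For option 2), the TGI launcher exposes flags such as `--max-total-tokens` and `--max-batch-prefill-tokens` that cap per-request and batch memory, though they do not reduce the weight footprint itself.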
Priority
P3-Medium
OS type
Ubuntu
Hardware type
AI-PC (Please let us know in description)
Running nodes
Single Node
Description
As an AI PC or OPEA developer, I want to deploy OPEA on a desktop with limited memory (for example, 32 GB). In my case, OPEA deployment failed on 8 GB and 16 GB machines; see the TGI warnings and the `Error: DownloadError` / signal 9 shutdown in the logs above. Possible solutions: 1) use ollama/llama.cpp and provide an example of how to use them; 2) reduce TGI's memory footprint.