fordendk opened 2 weeks ago
Looks like this could be the problem, as there is no Hugging Face cache at this location even though the token is specified in the `.env`:

```
2024-09-26T00:47:24.391621Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
```
Hi, would you check whether `HUGGINGFACEHUB_API_TOKEN` has been successfully set?

```shell
echo $HUGGINGFACEHUB_API_TOKEN
```
Yes, I checked the env and the `echo` returns the right token. I even made a file in the `.cache` at `/root/.cache/...` containing the HF token, but it still gets this INFO warning and the same error:

```
2024-09-30T12:19:50.726696Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
```
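A quick sanity check, since `hf_hub` looks for the token file at the exact path in the log: the sketch below writes the token file where the Hugging Face cache expects it (the path and env var name come from this thread; the fallback value is a placeholder, not a real token). Note that inside the TGI container the path `/root/.cache/huggingface/token` refers to the *container's* filesystem, so a token file created on the host is only visible if the host cache directory is mounted into the container.

```shell
# Sketch: recreate the token file hf_hub is looking for.
# HUGGINGFACEHUB_API_TOKEN is assumed to already be exported;
# the fallback value below is a placeholder, not a real token.
TOKEN="${HUGGINGFACEHUB_API_TOKEN:-hf_placeholder}"
CACHE_DIR="$HOME/.cache/huggingface"

mkdir -p "$CACHE_DIR"
printf '%s' "$TOKEN" > "$CACHE_DIR/token"

# Verify the file exists and is non-empty.
test -s "$CACHE_DIR/token" && echo "token file written"
```

When the service runs in Docker, the equivalent check is running `printenv` and `cat /root/.cache/huggingface/token` via `docker exec` inside the TGI container, since the host-side file alone won't be seen.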
Priority
Undecided
OS type
Ubuntu
Hardware type
GPU-Nvidia
Installation method
Deploy method
Running nodes
Single Node
What's the version?
I used the instructions on this web page to install: https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7
Docker version locally is 27.3.1, build ce12230. Host OS is Windows 11, using WSL2 Ubuntu (Linux LAPTOP-60F4I00F 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux). GPU is an NVIDIA GeForce RTX 4070.
Description
Seems like some sort of problem with Hugging Face TGI download:
```
2024-09-26T00:47:24.391621Z  INFO hf_hub: Token file not found "/root/.cache/huggingface/token"
2024-09-26T00:47:24.393715Z  INFO text_generation_launcher: Model supports up to 32768 but tgi will now set its default to 4096 instead. This is to save VRAM by refusing large prompts in order to allow more users on the same hardware. You can increase that size using `--max-batch-prefill-tokens=32818 --max-total-tokens=32768 --max-input-tokens=32767`.
2024-09-26T00:47:24.393727Z  INFO text_generation_launcher: Default `max_input_tokens` to 4095
2024-09-26T00:47:24.393730Z  INFO text_generation_launcher: Default `max_total_tokens` to 4096
2024-09-26T00:47:24.393730Z  INFO text_generation_launcher: Default `max_batch_prefill_tokens` to 4145
2024-09-26T00:47:24.393851Z  INFO download: text_generation_launcher: Starting check and download process for Intel/neural-chat-7b-v3-3
2024-09-26 00:47:29.556 | INFO | text_generation_server.utils.import_utils:75 - Detected system ipex
/opt/conda/lib/python3.10/site-packages/text_generation_server/utils/sgmv.py:18: UserWarning: Could not import SGMV kernel from Punica, falling back to loop.
  warnings.warn("Could not import SGMV kernel from Punica, falling back to loop.")
2024-09-26T00:47:34.970340Z  WARN text_generation_launcher: 🚨🚨BREAKING CHANGE in 2.0🚨🚨: Safetensors conversion is disabled without `--trust-remote-code` because Pickle files are unsafe and can essentially contain remote code execution! Please check for more information here: https://huggingface.co/docs/text-generation-inference/basic_tutorials/safety
2024-09-26T00:47:34.970364Z  WARN text_generation_launcher: No safetensors weights found for model Intel/neural-chat-7b-v3-3 at revision None. Converting PyTorch weights to safetensors.
2024-09-26T00:49:39.199001Z ERROR download: text_generation_launcher: Download process was signaled to shutdown with signal 9
Error: DownloadError
```

Reproduce steps
Follow the guide at this URL (I definitely included my Hugging Face key in the `.env`, etc.): https://gist.github.com/arun-gupta/7e9f080feff664fbab878b26d13d83d7
Raw log
No response
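One observation on the log above: signal 9 is SIGKILL, which in a setup like this is most commonly the Linux OOM killer terminating the download process. The log shows TGI converting PyTorch weights to safetensors for a 7B model, which is memory-hungry, and WSL2 caps the VM's memory by default (typically at half of host RAM). If that is the cause, raising the cap in `%UserProfile%\.wslconfig` on the Windows side may help; a sketch with illustrative values (run `wsl --shutdown` afterwards for it to take effect):

```ini
; %UserProfile%\.wslconfig — values are illustrative, adjust to your machine
[wsl2]
memory=24GB
swap=16GB
```

This is a guess based on the signal-9 exit, not a confirmed diagnosis; checking `dmesg` inside WSL2 for OOM-killer messages would verify it.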