Describe the bug
I followed the instructions for qai_hub_models/models/llama_v2_7b_chat_quantized on GitHub to test Llama model inference, but the demo fails with an error.
To Reproduce
Steps to reproduce the behavior:
1. Activate the Python environment
2. Run the command python -m qai_hub_models.models.llama_v2_7b_chat_quantized.demo
Expected behavior
The demo should run and generate a chat response. Instead, an OSError is raised.
Stack trace
Traceback (most recent call last):
File "E:\ProgramFiles\qai_hub_env\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "E:\ProgramFiles\qai_hub_env\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "E:\ProgramFiles\qai_hub_env\lib\site-packages\qai_hub_models\models\llama_v2_7b_chat_quantized\demo.py", line 89, in <module>
llama_2_chat_demo()
File "E:\ProgramFiles\qai_hub_env\lib\site-packages\qai_hub_models\models\llama_v2_7b_chat_quantized\demo.py", line 75, in llama_2_chat_demo
tokenizer=get_tokenizer(),
File "E:\ProgramFiles\qai_hub_env\lib\site-packages\qai_hub_models\models\llama_v2_7b_chat_quantized\model.py", line 121, in get_tokenizer
tokenizer = LlamaTokenizer.from_pretrained(HF_REPO_NAME)
File "E:\ProgramFiles\qai_hub_env\lib\site-packages\transformers\tokenization_utils_base.py", line 2094, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'meta-llama/Llama-2-7b-chat-hf'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'meta-llama/Llama-2-7b-chat-hf' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
Host configuration:
Windows
Additional context
I think this is related to the Hugging Face access token. Is there a way to pass my Hugging Face token to qai_hub_models?
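A possible workaround, assuming the failure is caused by missing Hugging Face authentication for the gated meta-llama repo: export your access token in the environment before launching the demo, since transformers/huggingface_hub pick the token up from there. The variable names below are the ones huggingface_hub documents (HF_TOKEN on recent versions, HUGGING_FACE_HUB_TOKEN on older ones); the token value is a placeholder.

```python
import os

# Workaround sketch: huggingface_hub reads the access token from the
# environment. HF_TOKEN works on recent versions; older releases expect
# HUGGING_FACE_HUB_TOKEN. Replace the placeholder with your real token.
os.environ["HF_TOKEN"] = "hf_your_token_here"
os.environ["HUGGING_FACE_HUB_TOKEN"] = os.environ["HF_TOKEN"]

# Then run the demo from the same session, e.g.:
#   python -m qai_hub_models.models.llama_v2_7b_chat_quantized.demo
```

Alternatively, running huggingface-cli login once stores the token on disk. Note that meta-llama/Llama-2-7b-chat-hf is a gated repository, so the token alone is not enough: you also need to request and be granted access on its Hugging Face model page, otherwise the tokenizer download will still fail.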