Open MuhammadIshaq-AI opened 11 months ago
Hello,
Certainly! The pre-trained Llama-2-7B-chat model can be integrated using the streaming LLM method. To run the Llama-2-7B-chat model with streaming enabled, use the following command:
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming --model_name_or_path meta-llama/Llama-2-7b-chat-hf
Guangxuan
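For intuition, here is a minimal sketch of the attention-sink idea behind --enable_streaming: when the KV cache grows past a budget, keep the first few "sink" tokens plus the most recent window and evict everything in between. This is illustrative only; the repo's actual implementation (enable_streaming_llm / the KV-cache classes) operates on per-layer key/value tensors, and the function name and sizes here are assumptions, not the project's API.

```python
# Illustrative sketch (not the repo's code) of streaming-mode cache eviction.
def evict_kv_cache(cache, start_size=4, recent_size=1020):
    """Trim a list of cached token entries to start_size + recent_size."""
    budget = start_size + recent_size
    if len(cache) <= budget:
        return cache
    # Keep the initial attention-sink tokens and the most recent window;
    # the middle of the sequence is dropped from the cache.
    return cache[:start_size] + cache[-recent_size:]

# Example: a cache of 2000 token positions is trimmed to 4 + 1020 = 1024,
# keeping positions 0-3 and the latest 1020 positions.
trimmed = evict_kv_cache(list(range(2000)))
print(len(trimmed))  # 1024
```

This is why a pretrained chat model works unchanged: eviction only alters which cache entries are kept, not the model weights.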
Can I use the model for deployment purposes?
Yes, you can!
Thank you so much for your quick responses. One last thing I want to clarify: whenever I call the model, the weights are downloaded again, which takes a long time, whether I use an API access token or just call it directly. Can I download the weights from Hugging Face once and then use them locally (in VS Code) with streaming-llm? Please guide me about the paths, etc.
I need your immediate assistance.
It is probably because your system doesn't have a fixed cache folder. You can download the model to a folder such as path_to_model and use:
CUDA_VISIBLE_DEVICES=0 python examples/run_streaming_llama.py --enable_streaming --model_name_or_path path_to_model
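This works because transformers resolves --model_name_or_path as a local folder whenever one exists at that path, and only otherwise treats it as a Hub repo id to fetch. A minimal sketch of that resolution logic follows; the helper name is hypothetical, not from the repo, and you can populate the folder once up front (for example with git lfs clone or huggingface_hub.snapshot_download, noting that the Llama-2 repos are gated and need an access token).

```python
import os

def resolve_model_source(name_or_path: str) -> str:
    """Hypothetical helper mimicking how from_pretrained picks a source."""
    if os.path.isdir(name_or_path):
        return "local"  # loaded straight from disk, nothing downloaded
    return "hub"        # treated as a repo id; fetched or read from cache

# A repo id like "meta-llama/Llama-2-7b-chat-hf" is not a directory on
# this machine, so it resolves to "hub"; a downloaded weights folder
# passed as path_to_model resolves to "local".
print(resolve_model_source("meta-llama/Llama-2-7b-chat-hf"))  # hub
```

So after a one-time download, every later run loads from disk with no network access.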
I am trying to use the Llama-2 model with a vector database to answer queries. I want my model to interact with the vector DB through streaming-llm, but I get this error when I send a query to the vector DB:
\streaming-llm>python examples/run_streaming_llama.py --enable_streaming
Loading model from lama2weights ...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 17.34it/s]
Enter your question (or type 'exit' to quit): hi
USER: hi
Traceback (most recent call last):
File "examples/run_streaming_llama.py", line 131, in
How can I integrate the Llama-2-7B model with this streaming LLM? The model is the already-pretrained version; will it work here?