pprp opened this issue 3 months ago
Thank you for your great work! I am trying to deploy the llama-2-7b-chat model on a Jetson Orin NX 8G.
I followed the instructions in TinyChat, but when loading llama-2-7b-chat (4-bit, g128), the process got killed due to an out-of-memory issue.
Then I followed the demo on the NVIDIA Jetson website (https://www.jetson-ai-lab.com/tutorial_text-generation.html) and downloaded Llama-2-7B-Chat-GPTQ (https://huggingface.co/TheBloke/Llama-2-7B-Chat-GPTQ). It worked well at first, but it still got killed after several conversations.
As you mentioned in the README, llama-2-7b-chat can run on a Jetson Orin NX 8G. Is there anything I missed? Any thoughts?
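For context, a rough back-of-the-envelope estimate (a sketch, not a measurement) may help explain why memory runs out and why it gets worse as a conversation grows. It assumes the published Llama-2-7B shape (about 6.74B parameters, 32 layers, hidden size 4096), a hypothetical layout with one fp16 scale and one fp16 zero point per group of 128 weights, and an fp16 KV cache. It ignores unquantized layers such as the embeddings, activations, the CUDA context, and the OS, all of which share the same 8 GB of unified CPU/GPU memory on the Orin NX.

```python
# Rough memory estimate for Llama-2-7B (4-bit, group size 128) on an 8 GB Jetson.
# ASSUMPTIONS: ~6.74e9 params, 32 layers, hidden size 4096, fp16 scale + fp16 zero
# per group, fp16 KV cache. Embeddings, activations, CUDA context and OS are ignored.
N_PARAMS = 6.74e9
N_LAYERS = 32
HIDDEN = 4096
GIB = 1024 ** 3


def weight_bytes_4bit_g128(n_params, group_size=128, bits=4, scale_bytes=2, zero_bytes=2):
    """Packed low-bit weights plus per-group scale/zero overhead (assumed layout)."""
    packed = n_params * bits / 8
    group_overhead = n_params / group_size * (scale_bytes + zero_bytes)
    return packed + group_overhead


def kv_cache_bytes(n_tokens, n_layers=N_LAYERS, hidden=HIDDEN, bytes_per_elem=2):
    """Keys and values (factor 2) for every layer and every cached token."""
    return 2 * n_layers * hidden * bytes_per_elem * n_tokens


if __name__ == "__main__":
    weights = weight_bytes_4bit_g128(N_PARAMS)
    print(f"weights: {weights / GIB:.2f} GiB")
    for tokens in (512, 2048, 4096):
        total = weights + kv_cache_bytes(tokens)
        print(f"{tokens:5d} tokens in context -> ~{total / GIB:.2f} GiB (weights + KV cache)")
```

Under these assumptions the quantized weights take roughly 3.3 GiB, and the KV cache adds about 0.5 MiB per token, i.e. about 2 GiB at Llama-2's full 4096-token context. Since the history of a multi-turn chat stays in the context, usage keeps growing with each exchange and can exceed what is left of the 8 GB once the OS and runtime are accounted for.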