mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

Jetson Orin Nano 8GB running out of memory on LLaMA2_7B_chat_awq_int4 #96

Closed: Dudu014 closed this issue 4 months ago

Dudu014 commented 4 months ago

I am currently using a Jetson Orin Nano Developer Kit 8GB to run the LLaMA2_7B_chat_awq_int4 model.

Everything built fine. However, when I run ./chat, the process gets killed. By monitoring resources, I can tell it is due to running out of memory.
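For anyone reproducing this, the memory pressure can be confirmed while ./chat loads; a minimal sketch using standard Linux tooling plus tegrastats, which ships with JetPack:

```sh
# Watch total/used/free RAM and swap, refreshed every second
watch -n 1 free -h

# Jetson-specific utility: reports RAM, swap, and GPU utilization
sudo tegrastats

# After the process is killed, confirm the OOM killer was responsible
sudo dmesg | grep -i -E "killed process|out of memory"
```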

To improve performance, I disabled zswap and allocated swap space on the NVMe drive.

Performance improved, but not enough to run the model.
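For reference, that swap setup usually looks like the following on a Jetson; a sketch, assuming stock JetPack (which enables compressed-RAM swap via the nvzramconfig service) and an NVMe mounted at /mnt/nvme, both of which are assumptions to adapt to your setup:

```sh
# Disable the default compressed-RAM (zram) swap; on stock JetPack it is
# configured by the nvzramconfig service (service name assumed here)
sudo systemctl disable nvzramconfig

# Create an 8 GB swap file on the NVMe; /mnt/nvme is a placeholder mount point
sudo fallocate -l 8G /mnt/nvme/swapfile
sudo chmod 600 /mnt/nvme/swapfile
sudo mkswap /mnt/nvme/swapfile
sudo swapon /mnt/nvme/swapfile

# Persist the swap file across reboots
echo '/mnt/nvme/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```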

Based on the README, the Jetson Orin Nano should be able to run the lightest model, per the 2024/01 news entry.

Is there anything I should consider to be able to run it? If it is not possible, are there any guidelines or a way to load a lighter model into TinyChatEngine?
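On the lighter-model question: the README documents a download script and a chat binary that takes the model name as arguments. The sketch below follows that pattern, but the model identifier and --QM value are illustrative assumptions; check them against the supported-models table in the README:

```sh
# Fetch a smaller int4 model with the repo's download script
# (model name and --QM value are assumptions; see the README's model table)
python tools/download_model.py --model OPT_1.3B_awq_int4 --QM QM_CUDA

# Run the chat binary against the downloaded model
# (usage pattern assumed from the README: ./chat <model> <precision>)
./chat OPT_1.3B INT4
```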

Thanks in advance!

Dudu014 commented 4 months ago

I was able to run it after rebooting the Jetson Orin Nano, so problem solved.