mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library
https://mit-han-lab.github.io/TinyChatEngine/
MIT License

Jetson Orin Nano 8GB running out of memory on LLaMA2_7B_chat_awq_int4 #96

Closed: Dudu014 closed this issue 4 months ago

Dudu014 commented 4 months ago

I am currently using a Jetson Orin Nano Developer Kit 8GB to run the LLaMA2_7B_chat_awq_int4 model.

Everything built fine. However, when I run ./chat, the process gets killed. By monitoring resources, I can tell it is due to running out of memory.
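For anyone reproducing this, the memory pressure can be confirmed while ./chat loads; a minimal sketch using standard Linux tooling plus tegrastats, which ships with JetPack:

```sh
# Watch total/used/free RAM and swap, refreshed every second
watch -n 1 free -h

# Jetson-specific utility: reports RAM, swap, and GPU utilization
sudo tegrastats

# After the process is killed, confirm the OOM killer was responsible
sudo dmesg | grep -i -E "killed process|out of memory"
```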

To improve performance, I disabled zswap and allocated swap space on the NVMe drive.

Performance improved, but not enough to run the model.
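For reference, that swap setup usually looks like the following on a Jetson; a sketch, assuming stock JetPack (which enables compressed-RAM swap via the nvzramconfig service) and an NVMe mounted at /mnt/nvme, both of which are assumptions to adapt to your setup:

```sh
# Disable the default compressed-RAM (zram) swap; on stock JetPack it is
# configured by the nvzramconfig service (service name assumed here)
sudo systemctl disable nvzramconfig

# Create an 8 GB swap file on the NVMe; /mnt/nvme is a placeholder mount point
sudo fallocate -l 8G /mnt/nvme/swapfile
sudo chmod 600 /mnt/nvme/swapfile
sudo mkswap /mnt/nvme/swapfile
sudo swapon /mnt/nvme/swapfile

# Persist the swap file across reboots
echo '/mnt/nvme/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```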

Based on the README, the Jetson Orin Nano should be able to run the lightest model, per the 2024/01 news entry.

Is there anything I should consider to be able to run it? If it is not possible, are there any guidelines or a way to load a lighter model into TinyChatEngine?
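On the lighter-model question: the README documents a download script and a chat binary that takes the model name as arguments. The sketch below follows that pattern, but the model identifier and --QM value are illustrative assumptions; check them against the supported-models table in the README:

```sh
# Fetch a smaller int4 model with the repo's download script
# (model name and --QM value are assumptions; see the README's model table)
python tools/download_model.py --model OPT_1.3B_awq_int4 --QM QM_CUDA

# Run the chat binary against the downloaded model
# (usage pattern assumed from the README: ./chat <model> <precision>)
./chat OPT_1.3B INT4
```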

Thanks in advance!

Dudu014 commented 4 months ago

I was able to run it after rebooting the Jetson Orin Nano, so problem solved.