tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

Jetson Nano - Freezes when loading model #234

Closed ghost closed 1 year ago

ghost commented 1 year ago

Testing on Nvidia Jetson Nano 4GB with 16GB of swap memory.

It gets hung up and freezes when loading the model, until the process is killed. This exact same problem also happens with alpaca.cpp.
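A quick way to confirm that the freeze is memory exhaustion (the `Killed` message comes from the kernel's OOM killer) is to watch RAM and swap from a second terminal while the model loads. A minimal sketch, assuming psutil is installed:

```python
# Sketch: monitor RAM/swap usage while the model loads in another terminal.
# If both climb toward 100% just before "Killed", the OOM killer is the cause.
import time
import psutil

while True:
    mem, swap = psutil.virtual_memory(), psutil.swap_memory()
    print(f"RAM {mem.percent:5.1f}%  swap {swap.percent:5.1f}%", flush=True)
    time.sleep(2)
```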


(venv) twinlizzie@twinlizzie-desktop:~/alpaca-lora$ python generate.py --load_8bit --base_model 'decapoda-research/llama-7b-hf' --lora_weights 'tloen/alpaca-lora-7b'

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
/home/twinlizzie/venv/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
/home/twinlizzie/venv/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: cannot open shared object file: No such file or directory
/home/twinlizzie/venv/lib/python3.9/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards:   0%|                         | 0/33 [00:01<?, ?it/s]

I'm obviously only using the CPU here, because I take it that bitsandbytes won't compile for a Maxwell GPU (128 cores, CUDA 10.2).
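Since bitsandbytes is falling back to its CPU build anyway, one thing that might be worth trying is skipping `--load_8bit` entirely and loading the base model in plain float16 on the CPU. A rough sketch, assuming transformers with LLaMA support plus accelerate are installed (untested on the Nano):

```python
# Sketch: CPU-only load that avoids the bitsandbytes 8-bit path entirely.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "decapoda-research/llama-7b-hf"

tokenizer = LlamaTokenizer.from_pretrained(base_model)
model = LlamaForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,   # halves RAM vs. float32 (try float32 if
                                 # fp16 ops fail on this CPU/torch build)
    low_cpu_mem_usage=True,      # stream weights shard by shard instead of
                                 # materializing the full state dict at once
    device_map={"": "cpu"},      # requires accelerate
)
```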

AngainorDev commented 1 year ago

I'd try to manually download the 'decapoda-research/llama-7b-hf' weights locally and try from there, just in case.
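A minimal sketch of what that could look like, assuming a recent huggingface_hub (the `./llama-7b-hf` target folder is arbitrary):

```python
# Sketch: pre-download the weights to a local folder, then point
# generate.py at that path instead of the hub id.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="decapoda-research/llama-7b-hf",
    local_dir="./llama-7b-hf",   # reruns resume any partial download
)
print(local_dir)
# then: python generate.py --load_8bit --base_model './llama-7b-hf' ...
```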

ghost commented 1 year ago

> I'd try to manually download the 'decapoda-research/llama-7b-hf' weights locally and try from there, just in case.

That's what I did with alpaca.cpp.

As for the Alpaca-LoRA model files, I see the same 405MB shards there as in my .cache. I wouldn't want to risk throttling my internet connection, as these are huge files.

I suspect the Jetson Nano's CPU and RAM aren't powerful enough for Alpaca, or there's something about the architecture that disagrees with it and causes this memory flood. I do have an RTX 3060 PC which I'm sure will run this model, but unfortunately that means no smart-talking robots for now.

Jetson Nano CPU: quad-core ARM Cortex-A57 MPCore @ 1.5 GHz
Raspberry Pi 4 CPU: Broadcom BCM2711 SoC, 64-bit quad-core ARM Cortex-A72 @ 1.5 GHz (later models: 1.8 GHz)

I'd love to try this on something like a Khadas Edge2, because it looks obvious to me that the Nvidia Jetson Nano is going to be no good for AI. Unless, that is, the Jetson Orin Nano is released with sufficient stock.

P.S. It might be worth trying again with overclocking, once I can find a good enough power supply.

ghost commented 1 year ago

Tested the model with llama.cpp and it finally worked, but it is far too slow to be usable (it takes about 10 minutes to generate a full response).

The bottleneck seems to be the RAM: the model needs about 4GB, which goes just enough over the limit to spill into swap.
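If the spill into swap is what makes it crawl, one possible alternative on the transformers side is to cap the CPU RAM budget and let accelerate offload the remainder to disk explicitly, rather than letting the OS page everything. A sketch, assuming accelerate is installed; the 3GiB cap and the offload folder name are illustrative guesses for a 4GB board:

```python
# Sketch: cap CPU RAM and spill the remaining shards to disk via accelerate.
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={"cpu": "3GiB"},   # stay under the Nano's 4 GB of RAM
    offload_folder="offload",     # remaining weights are memory-mapped here
)
```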

I will definitely be purchasing the Khadas Edge 2 pro (16GB) next.

EveningLin commented 1 year ago

@twinlizzie Amazing! Would you please share how you did it? I was stuck on this too.