I'd try to manually download the 'decapoda-research/llama-7b-hf' weights locally and try from there, just in case.
That's what I did with alpaca.cpp.
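A minimal sketch of what I mean, assuming the `huggingface_hub` package is installed:

```python
# Fetch the full snapshot once, then load from the returned local path so the
# script never re-downloads. The path is wherever the cache resolves to.
from huggingface_hub import snapshot_download

local_path = snapshot_download("decapoda-research/llama-7b-hf")
print(local_path)  # pass this path to from_pretrained() instead of the hub id
```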
In terms of the Alpaca-lora model files - I see the same 405MB shards as the ones in my .cache. I wouldn't want to risk throttling my internet connection by re-downloading, as these are huge files.
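For what it's worth, a quick way to check which shards are already cached before pulling anything again (plain standard library; the path below is the default Hugging Face cache location and may differ per setup):

```python
# List every cached .bin shard with its size, largest paths last.
from pathlib import Path

cache = Path.home() / ".cache" / "huggingface"
for f in sorted(cache.rglob("*.bin")):
    print(f"{f.stat().st_size / 1e6:8.0f} MB  {f}")
```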
I suspect the Jetson Nano CPU and RAM aren't powerful enough for Alpaca, or there's something about the architecture that disagrees with it and causes this memory flood. I do have an RTX 3060 PC which I'm sure will run this model, but unfortunately that means no smart talking robots for now.
For comparison:

- Jetson Nano CPU: quad-core ARM Cortex-A57 MPCore @ 1.5 GHz
- Raspberry Pi 4 CPU: Broadcom BCM2711 SoC, 64-bit quad-core ARM Cortex-A72 @ 1.5 GHz (1.8 GHz on later models)
I'd love to try this on something like a Khadas Edge2, because it seems clear to me that the Nvidia Jetson Nano isn't going to cut it for this kind of AI workload. Unless the Jetson Orin Nano gets released with sufficient stock.
P.S. It might be worth trying again with overclocking, once I can find a good enough power supply.
Tested the model on llama.cpp and it finally worked - but it's far too slow to be usable. (It takes about 10 minutes to generate a full response.)

The bottleneck seems to be RAM: the model needs roughly 4GB, and it goes just far enough over the limit to spill into swap.
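Rough arithmetic for why it just barely spills (back-of-the-envelope, assuming a 4-bit quantized 7B model; the overhead figure is my guess):

```python
# Estimate total memory needed vs the Nano's 4GB of physical RAM.
params = 7e9                 # LLaMA-7B parameter count
bytes_per_param = 0.5        # 4-bit quantization (q4_0) ~= 0.5 bytes/param
weights_gb = params * bytes_per_param / 1e9
overhead_gb = 0.5            # rough allowance for KV cache, buffers, OS
print(f"~{weights_gb + overhead_gb:.1f} GB needed vs 4 GB physical RAM")
# ~4.0 GB needed vs 4 GB physical RAM -> anything extra spills into swap
```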
I will definitely be purchasing the Khadas Edge 2 pro (16GB) next.
@twinlizzie Amazing! Would you please share how you did that? I was stuck on this too.
Testing on Nvidia Jetson Nano 4GB with 16GB of swap memory.
It gets hung up and frozen when loading the model - until the process is killed. The exact same problem happens with alpaca.cpp.
I'm obviously only using the CPU here, because I take it that bitsandbytes won't compile for a Maxwell GPU (128 CUDA cores, CUDA 10.2).
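For reference, a sketch of the lower-peak-memory CPU load I'd try (assuming a transformers version with the LLaMA classes; fp16 mainly shrinks the footprint here, it won't make CPU inference fast):

```python
# low_cpu_mem_usage avoids materializing a second full copy of the weights
# while loading, which is what seems to kill a 4GB machine during startup.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_id = "decapoda-research/llama-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # halves the weight footprint vs float32
    low_cpu_mem_usage=True,      # stream shards in instead of double-buffering
)
```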