naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time
MIT License
13.22k stars 1.06k forks

Free Colab #12

Open rarhs opened 4 months ago

rarhs commented 4 months ago

Can't run on free Colab due to inadequate RAM.

AbnerAI commented 4 months ago

How much CPU/GPU resources are required?

wdndev commented 4 months ago

Hello, I have an idea.

Because the model is loaded on the CPU, my notebook (16 GB RAM) cannot load the Llama3-8B model.

So, I took the first two Transformer layers from the 32-layer architecture of the Llama3-8B model to form a new model. It can run in a notebook with 16 GB RAM, occupying about 4~5 GB. The intermediate results are all correct, but the final decoding result is wrong.
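The layer-trimming idea above can be sketched as a simple state-dict filter. This is a hypothetical illustration, not wdndev's actual script: it assumes the Meta checkpoint key layout (`layers.<i>.` prefixes for per-layer weights, as in `consolidated.00.pth`), and uses a dummy dict in place of the real weights. You would also need to set `n_layers` accordingly in `params.json`.

```python
import re

def keep_first_n_layers(state_dict, n=2):
    """Return a copy of state_dict containing only the non-layer weights
    (embeddings, final norm, output head) plus transformer layers 0..n-1."""
    kept = {}
    for key, value in state_dict.items():
        m = re.match(r"layers\.(\d+)\.", key)
        # Keep keys that are not per-layer, or whose layer index is < n.
        if m is None or int(m.group(1)) < n:
            kept[key] = value
    return kept

# Dummy checkpoint mimicking the 32-layer Llama3-8B key layout
# (values would be tensors in the real consolidated.00.pth).
dummy = {"tok_embeddings.weight": 0, "norm.weight": 1, "output.weight": 2}
for i in range(32):
    dummy[f"layers.{i}.attention.wq.weight"] = i

small = keep_first_n_layers(dummy, n=2)
print(sorted(small.keys()))
```

With real weights you would load the checkpoint with `torch.load`, filter it like this, and save the much smaller result with `torch.save`.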

You can try it with this model.

And here is the Colab link; it can be run directly: