Open rarhs opened 4 months ago
How much CPU/GPU resources are required?
Hello, I have an idea.
Because the model is loaded on the CPU, my notebook (16 GB RAM) cannot load the Llama3-8B model.
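A quick back-of-envelope estimate shows why 16 GB is not enough. The config values below (hidden size 4096, intermediate size 14336, 32 layers, vocab 128256, GQA with 8 KV heads of dim 128) are the commonly published Llama3-8B settings; treat them as assumptions:

```python
# Rough parameter/memory estimate for Llama3-8B (assumed config values).
hidden, inter, n_layers, vocab = 4096, 14336, 32, 128256
head_dim, kv_heads = 128, 8

attn = hidden * hidden * 2 + hidden * (kv_heads * head_dim) * 2  # q,o + k,v projections
mlp = 3 * hidden * inter                                         # gate, up, down
per_layer = attn + mlp
embed = vocab * hidden * 2                                       # input embeddings + untied lm_head

total = per_layer * n_layers + embed
print(f"full model: {total/1e9:.2f}B params, ~{total*2/2**30:.1f} GiB in fp16")

# keeping only the first two layers (plus embeddings):
two_layer = per_layer * 2 + embed
print(f"2-layer model: {two_layer/1e9:.2f}B params, ~{two_layer*2/2**30:.1f} GiB in fp16")
```

The full model alone needs roughly 15 GiB in fp16 before any activation or framework overhead, while a 2-layer cut fits in a few GiB, which is consistent with the 4-5 GB observed.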
So I took the first two Transformer layers from the 32-layer Llama3-8B architecture to form a new model. It runs on a notebook with 16 GB RAM, using about 4~5 GB, but the final decoding result is wrong even though the intermediate results are all correct.
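For anyone who wants to reproduce the truncation, here is a minimal sketch of one way to do it with Hugging Face `transformers`. It builds a tiny random Llama-style model so it runs without downloading weights; for the real experiment you would load `meta-llama/Meta-Llama-3-8B` with `AutoModelForCausalLM.from_pretrained` instead (the tiny config values here are placeholders, not the real model's):

```python
# Sketch: keep only the first N decoder layers of a Llama-style model.
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny random config so the example runs anywhere; swap in
# from_pretrained("meta-llama/Meta-Llama-3-8B") for the real model.
config = LlamaConfig(
    hidden_size=64, intermediate_size=128, num_hidden_layers=4,
    num_attention_heads=4, num_key_value_heads=2, vocab_size=1000,
)
model = LlamaForCausalLM(config)

keep = 2  # number of leading layers to keep
model.model.layers = torch.nn.ModuleList(model.model.layers[:keep])
model.config.num_hidden_layers = keep

# The truncated model still runs end to end.
out = model(torch.randint(0, 1000, (1, 8)))
print(out.logits.shape)
```

Note that dropping 30 of 32 layers and still feeding the hidden states into the final norm and `lm_head` is expected to produce garbage tokens: those heads were trained on the output of layer 32, not layer 2, so "middle results correct but decoding wrong" is the behavior you would predict.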
You can try it with this model.
And here is the Colab link; it can be run directly:
Can't run on free Colab due to not having adequate RAM.