tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware

load_in_8bit causing issues: out-of-memory error with 44 GB VRAM on my GPU, or device_map error #604

Open · Nimisha-Pabbichetty opened this issue 9 months ago

Nimisha-Pabbichetty commented 9 months ago

I'm able to get the generate.py script working. However, with the finetune.py script I'm facing the following error: [screenshot of the error]

It seems to be because the load_in_8bit parameter is set to True and the loader looks for a quantization_config.json, but if I set it to False, then even a GPU with 44 GB of VRAM is not enough to train the model. How do I create the quantization_config.json? I'm using huggyllama/llama-7b as the base model since the given link for the base model is down. I get the same error when I use baffo32/decapoda-research-llama-7B-hf as the base model.

Any help would be appreciated, thank you!
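For reference, a minimal sketch of how the model load could be written so that the 8-bit quantization config is supplied at load time rather than read from a file in the model repo (assuming a recent transformers with bitsandbytes installed; the model name is taken from the report above, and this is not necessarily how finetune.py does it in this repo):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "huggyllama/llama-7b"  # base model used in the report above

# Pass the 8-bit quantization config explicitly; with this, no
# quantization_config.json needs to exist in the model repo.
bnb_config = BitsAndBytesConfig(load_in_8bit=True)

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map="auto",  # spread layers across available GPUs/CPU
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
```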

Minimindy commented 8 months ago

I think it's running out of memory. Maybe you should try Colab, or free up GPU memory to make space to load the model.
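For example, a quick way to check and free GPU memory from Python before loading the model (standard PyTorch calls, not specific to this repo):

```python
import gc
import torch

# Drop unreferenced tensors and return cached blocks to the driver.
gc.collect()
torch.cuda.empty_cache()

# Report how much memory is currently allocated on the GPU.
print(f"{torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")
```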