Hi Team,
I am trying to fine-tune open-llama 7b on a 20 GB A100 with LoRA (batch size = 1, max_seq_length = 256), but when the Hugging Face `transformers.Trainer` saves a checkpoint I get a CUDA out-of-memory error.
From my observation, the model and batch together take around 10 GB of VRAM, and usage stays constant throughout training; but when the Trainer tries to save a checkpoint at the specified step, it fails with CUDA OOM. When I ran the same fine-tuning code on Meta's llama-7b, it worked fine and the checkpoints were saved without any extra memory overhead.
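For reference, here is a minimal sketch of the kind of setup I mean (not my exact script; model choices, LoRA hyperparameters, and `save_steps` here are illustrative, and `train_ds` stands in for a dataset tokenized to max length 256):

```python
# Sketch of a LoRA fine-tune of open_llama_7b with transformers.Trainer.
# Hyperparameters and dataset are placeholders, not the actual failing script.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)
from peft import LoraConfig, get_peft_model

model_name = "openlm-research/open_llama_7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Wrap the base model with LoRA adapters (illustrative config).
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,  # batch size 1, as described above
    save_steps=200,                 # the OOM happens at save time
    fp16=True,
)

# train_ds: hypothetical dataset pre-tokenized to max_seq_length = 256
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()  # training runs at ~10 GB VRAM; OOM occurs at checkpointing
```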
As per https://github.com/openlm-research/open_llama/issues/1#issuecomment-1532311414, open-llama 7b has the same model size and architecture as Meta's llama-7b, so ideally both should behave the same. Why am I facing CUDA OOM only with open-llama?
If anyone can look into this and help me out, it would be much appreciated.