letterk opened this issue 3 weeks ago
Unfortunately you must use float32 for it. In theory bfloat16 could work, but the gradients will not be correct under mixed-precision training.
I would unset them. Another approach is to train just the lm_head and not the embed_tokens, which saves more memory.
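The lm_head-only approach above can be sketched in plain PyTorch by freezing every parameter and then re-enabling gradients only on the head. This is a minimal illustration, not Unsloth's actual API; the module names `embed_tokens` and `lm_head` follow the usual Hugging Face convention, and the dimensions are toy values, not Qwen2's.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy stand-in for a causal LM with separate embedding and head."""
    def __init__(self, vocab=100, hidden=16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, hidden)
        self.lm_head = nn.Linear(hidden, vocab, bias=False)

    def forward(self, ids):
        return self.lm_head(self.embed_tokens(ids))

model = TinyLM()

# Freeze everything, then re-enable gradients only for lm_head,
# so embed_tokens contributes no gradient or optimizer state.
for p in model.parameters():
    p.requires_grad = False
for p in model.lm_head.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(trainable, total)  # 1600 3200 — only the head's 100*16 weights train
```

With a real checkpoint the same loop cuts the optimizer state roughly in half versus training both matrices, since embed_tokens and lm_head are the same shape when they are untied.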
I cannot train Qwen2 7B on an RTX 4090: loading the embedding layer for training triggers out-of-memory (OOM) errors, since the run is expected to need over 27GB of VRAM, beyond the card's 24GB. By contrast, QLoRA runs comfortably in under 12GB.
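A back-of-the-envelope calculation shows why training embed_tokens and lm_head in float32 is so expensive on a 7B-class model. The vocab and hidden sizes below are assumed from Qwen2-7B's published config (vocab_size=152064, hidden_size=3584), and the 16 bytes/parameter figure is the standard fp32 AdamW footprint (weight + gradient + two moment buffers); this is a rough sketch, on top of whatever the rest of the model already occupies.

```python
# Assumed Qwen2-7B config values (check config.json for the real ones).
vocab_size, hidden_size = 152064, 3584
params_per_matrix = vocab_size * hidden_size  # one [vocab, hidden] matrix

# fp32 AdamW holds 4 bytes each for weight, gradient, and two moments.
bytes_per_param = 4 + 4 + 4 + 4

both = 2 * params_per_matrix * bytes_per_param      # embed_tokens + lm_head
lm_head_only = params_per_matrix * bytes_per_param  # lm_head alone

print(f"embed_tokens + lm_head: {both / 2**30:.1f} GiB")  # ~16.2 GiB
print(f"lm_head only:           {lm_head_only / 2**30:.1f} GiB")  # ~8.1 GiB
```

Roughly 16 GiB of optimizer-related state just for these two matrices, before the other ~6B parameters of weights and activations, is consistent with blowing past 24GB; training only lm_head halves that cost, which is the motivation for the suggestion above.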