Open normster opened 6 months ago
The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?
The results reported in https://github.com/huggingface/alignment-handbook/pull/88 suggest that QLoRA is better for both SFT and DPO. Is this accurate, and have people seen this happen in any other settings?