Ther-nullptr opened this issue 6 months ago
Hello, I have tried your method on the Gemma-7B model. I found that it works on the GSM8K dataset but fails on WikiText-2. This is my training log:

I didn't change the original code. Do you know why?

Reply:
The loss at the first epoch already looks high, so there may be a problem in the initialization. Could you provide the code you use to load the model? Since we haven't provided an official Gemma-7B LoftQ checkpoint, could you also share the code you used to obtain the quantized backbone and the LoRA adapter with quantize_and_save.py?
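Since the thread hinges on how the quantized backbone and LoRA adapter were produced, a sketch of a typical `quantize_and_save.py` invocation for the LoftQ workflow may help frame the question. The flag names and values below are assumptions for illustration, not confirmed from this thread; check the script's `--help` for its actual interface:

```shell
# Hypothetical invocation -- flag names and values are assumptions.
# Quantize the backbone to 4 bits and save it together with the
# LoftQ-initialized LoRA adapter.
python quantize_and_save.py \
    --model_name_or_path google/gemma-7b \
    --bits 4 \
    --rank 16 \
    --save_dir ./gemma-7b-4bit-16rank
```

Posting the exact command used, plus the code that later loads the saved backbone and adapter, would let the maintainers check whether the high first-epoch loss on WikiText-2 comes from the initialization rather than the training loop.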