Thank you for your excellent work. Since there's no mention of the training stage, does that mean the 160k data is used to train the LLM with LoRA and the llama projection layer at the same time? If that is the case, should I expect save_to_modules: [llama projection], or do you simply set requires_grad to True on the projection layer?
Hi @gordonhu608, thanks for your interest in our work! Yes, we jointly train the LoRA weights and the projection layer during training. To do so, we simply allow gradients to flow back to the parameters of the projection layer. Please see the related code here.
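For anyone looking for a concrete illustration, here is a minimal sketch of that setup using Hugging Face PEFT. It is not the repository's actual code: the module name `mm_projector` and the LoRA hyperparameters are assumptions chosen for the example.

```python
# Minimal sketch (assumptions noted in comments): train LoRA adapters and a
# multimodal projection layer jointly by re-enabling gradients on the
# projector after PEFT freezes the base model.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("path/to/base-llm")  # placeholder path

lora_config = LoraConfig(
    r=16,                                  # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)  # freezes base weights, injects LoRA

# Unfreeze the projection layer so its parameters receive gradients
# alongside the LoRA weights. "mm_projector" is a hypothetical module name.
for name, param in model.named_parameters():
    if "mm_projector" in name:
        param.requires_grad = True
```

As an alternative, PEFT's `LoraConfig` also accepts a `modules_to_save` argument that marks extra modules as trainable and includes them in the saved adapter checkpoint, which achieves a similar effect to flipping `requires_grad` by hand.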