Thanks for sharing your great work!
I noticed that in your code the LoRA loss (loss_lora) and the VSD loss (loss_vsd) are summed and backpropagated together. loss_vsd tends to be large (>50) while loss_lora is small (<1). Does this imbalance make training unstable?
It also looks like the LoRA and the NeRF are trained with the same loss; is this consistent with the description in the paper?
They optimize different modules: their computation graphs are separate, so loss_lora only produces gradients for the LoRA and loss_vsd only for the NeRF. We add them simply so that one backward pass optimizes both; since the graphs never touch, the two terms' magnitudes do not interfere with each other.
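A minimal sketch of this point, using two plain `Linear` layers as hypothetical stand-ins for the LoRA and NeRF branches: because the graph of one loss never touches the other module's parameters, summing the losses and calling `backward()` once gives each module gradients only from its own term.

```python
import torch

torch.manual_seed(0)
lora = torch.nn.Linear(4, 1)   # hypothetical stand-in for the LoRA branch
nerf = torch.nn.Linear(4, 1)   # hypothetical stand-in for the NeRF branch

x = torch.randn(8, 4)
loss_lora = lora(x).pow(2).mean()        # depends only on lora's params
loss_vsd = 50.0 * nerf(x).pow(2).mean()  # depends only on nerf's params

# loss_vsd contributes nothing to lora's gradients: its graph never
# touches lora, so autograd reports no dependency at all.
grads = torch.autograd.grad(loss_vsd, list(lora.parameters()),
                            allow_unused=True, retain_graph=True)
print(grads)  # (None, None)

# Summing and backpropagating once therefore updates each module
# only from its own loss term, regardless of the relative scales.
(loss_lora + loss_vsd).backward()
```

So the large loss_vsd value cannot destabilize the LoRA update (or vice versa); the sum is just a bookkeeping convenience.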