Looks good! Could you verify that the magnitude of the gradient remains the same (especially for data.batch_size > 1)?
Oh, I didn't realize that. The magnitude of this formulation is not as meaningful as the original one, since it involves the latents in the final value instead of only the magnitude of grad.
If we want the value of the loss to be informative, I guess we should keep using the original one.
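For concreteness, here is a toy check of both points (a sketch only; the shapes are made up and grad stands in for the SDS gradient w * (noise_pred - noise)). The gradients on latents are identical for either formulation, while only the MSE-style value depends purely on grad:

```python
import torch
import torch.nn.functional as F

batch_size = 2
latents = torch.randn(batch_size, 4, 64, 64, requires_grad=True)
grad = torch.randn_like(latents)  # stand-in for w * (noise_pred - noise)

# Proposed reparameterization: correct gradient, but the value mixes in latents.
loss_sum = (latents * grad.detach()).sum() / batch_size
(g_sum,) = torch.autograd.grad(loss_sum, latents)

# MSE formulation: same gradient, and the value is 0.5 * ||grad||^2 / batch_size.
target = (latents - grad).detach()
loss_mse = 0.5 * F.mse_loss(latents, target, reduction="sum") / batch_size
(g_mse,) = torch.autograd.grad(loss_mse, latents)

print(torch.allclose(g_sum, g_mse))       # True: identical gradients, any batch size
print(loss_sum.item(), loss_mse.item())   # values differ; only loss_mse tracks |grad|
```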
Yeah, this MSE formulation is also cool, and it gives an even more meaningful loss value. I'd actually prefer this one to the one I suggested.
I used the latents * grad.detach() formulation before (~8 months ago) for my own experiments, and I find the threestudio loss to be more effective/faster in training. But I haven't done extensive experiments on it.
@voletiv Faster in training? Why do you think that happens? Is the kernel for the F.mse_loss gradient computation faster than the one for the multiplication?
@Xallt By "faster" I don't mean per-step compute; I mean the number of iterations from initialization to a good final rendering. Optimization seems more effective (in my experiments). Multiple factors contribute to this, possibly unrelated to the loss as such: the combination of implicit-volume and the original loss works very well, but SDF with the original loss is not as effective (in my experiments for 2D->3D).
I think @Xallt makes a very good point. Although latents * grad.detach() is easier to understand, the MSE formulation gives a loss value that actually reflects the gradient magnitude. So I think we'll stick with the MSE formulation then.
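For reference, the MSE formulation we keep looks roughly like this (a sketch, not the exact threestudio code; w, noise_pred, and noise are the usual SDS weighting, predicted noise, and sampled noise):

```python
import torch
import torch.nn.functional as F

def sds_loss(latents, noise_pred, noise, w, batch_size):
    # SDS gradient; treated as a constant via the detach on `target` below.
    grad = torch.nan_to_num(w * (noise_pred - noise))
    # Target chosen so that latents - target == grad.
    target = (latents - grad).detach()
    # Value: 0.5 * ||grad||^2 / batch_size; gradient w.r.t. latents: grad / batch_size.
    return 0.5 * F.mse_loss(latents, target, reduction="sum") / batch_size
```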
We can use an even simpler reparameterization to achieve the SDS loss.
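Roughly (a minimal sketch; the function name is illustrative, latents is the rendered latent image, and grad = w * (noise_pred - noise) is the SDS gradient computed outside autograd):

```python
def sds_loss_simple(latents, grad, batch_size):
    # The loss value itself is not meaningful; what matters is that
    # backprop pushes exactly grad / batch_size into latents.
    return (latents * grad.detach()).sum() / batch_size
```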
This is originally proposed in https://github.com/ashawkey/stable-dreamfusion/issues/335 (credits to @Xallt). Maybe we can implement it here too?