rust-cv / levenberg-marquardt

Provides abstractions to run Levenberg-Marquardt optimization
MIT License
55 stars 16 forks source link

Fix sign error in update direction calculation #5

Closed jannschu closed 4 years ago

jannschu commented 4 years ago

Wrong sign was used for new parameters.

Fixes the issue described in #4.

edit: I pushed a possible fix for the line fitting test case. You might want to check my derivatives, just to be sure.

vadixidav commented 4 years ago

I will review the line fitting test case sometime today or tomorrow. I am a bit busy atm, but I definitely will prioritize this.

vadixidav commented 4 years ago

I also did change the API slightly in master to accommodate optimization I am doing with lie algebras on rotation matrices. I don't think it will interfere with your code, but can you pull that in to this branch.

J-F-Liu commented 4 years ago

Hi all, thanks for sharing this code. I also want to use the algorithm and get here. It depends on how residual is calculated. Let Y denote measured values, Ŷ denote estimated values. If you define ε = Y - Ŷ, then no need to add the minus sign. I prefer this definition, which is also used on Wikipedia.

vadixidav commented 4 years ago

@J-F-Liu You may want to wait a bit to use this as @jannschu is working on a branch that refactors it. See #4.

jannschu commented 4 years ago

Hi all, thanks for sharing this code. I also want to use the algorithm and get here. It depends on how residual is calculated. Let Y denote measured values, Ŷ denote estimated values. If you define ε = Y - Ŷ, then no need to add the minus sign. I prefer this definition, which is also used on Wikipedia.

If your residual is of this form and you flip the sign, your Jacobian also flips sign. So this is not correct, the sign was wrong, I think.

edit: Should clarify that the Jacobian has to be computed with respect to the residual vector. Not a measurement.

vadixidav commented 4 years ago

@jannschu If the jacobian is negative and you flip the residuals to be negative then it should cancel out, unless I am mistaken.

J-F-Liu commented 4 years ago

Jacobian doesn't relate to residual. From my understanding,

residual = target - model(params)
jacobian = partial_diff(model, params)
mpizenberg commented 4 years ago

@J-F-Liu Each term (i,j) of the Jacobian is of the form partial diff of residual i with regard to variable j. So yes the sign of the Jacobian depends on how you compute your residual. Usually we model our problem as a "data fitting problem" (look for "In data fitting" in https://en.wikipedia.org/wiki/Gauss%E2%80%93Newton_algorithm). And in that case, we are computing the partial derivatives of (data - model). Therefore if you compute the "Jacobian of the model function" and not the "Jacobian of the residuals: (data_i - model_i)" the "model" appears with a negative sign. In that setup, if you still call the "gradient" the value "J^T r" then the guess is update with `guess += hessian gradientinstead of-=`.

If I read the code correctly, the update guess function is left to the implementation. So the implementation functions of apply_delta, residuals, and jacobians have to be coherent with each other. In any case, accumulating gradients should just be grad += J_i^T * r_i there should not be any minus sign here in my opinion so I don't really agree with the change. I suppose the issue here is more regarding naming and documentation of what those are so that implementations of the 3 functions are correct with regard to each other. But I'll let you @vadixidav and @J-F-Liu decide how to proceed.

J-F-Liu commented 4 years ago

I see, in case of computing the partial derivatives of (data - model), we can think of target values are 0, and either (data - model) or (model-data) as a new_model, so that residual = target - new_model(params), jacobian = partial_diff(new_model, params) and guess += hessian * gradient still holds, thus simplifying API definition.

vadixidav commented 4 years ago

@mpizenberg I am letting @jannschu decide which form we want. I think we are going with the more traditional form (the one stated on the Wikipedia page), which is fine. @jannschu is doing a huge rewrite, so this PR is mostly irrelevant anyways.