hahnec / torchimize

numerical optimization in pytorch
https://hahnec.github.io/torchimize/
GNU General Public License v3.0

Independent stopping criterion #6

Closed pkienzle closed 10 months ago

pkienzle commented 1 year ago

Fit results for any given fit depend on which other fits are happening in parallel.

One issue is the stopping condition, which applies to all fits simultaneously:

https://github.com/hahnec/torchimize/blob/2ccbec4e68f41c34bb7d242ff4848d26d72a9765/torchimize/functions/parallel/lma_fun_parallel.py#L108-L111

An individual fit may take more or fewer steps in a batch than it would if performed alone. More, because it continues to update even after it has converged while other fits in the block have not; fewer, because the badness in the residuals of any one fit is diluted across the residuals of all fits.

Evaluating the criterion per fit would help: use sum/max over the data dimensions to produce a vector of individual stopping criteria, and stop only once all criteria are met. This addresses the case of stopping too soon for some datasets.
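A minimal sketch of such a per-fit criterion, assuming residuals of shape `(batch, n_data)`; the function name, the relative-cost test, and the `ftol` handling are illustrative stand-ins, not torchimize's actual API:

```python
import torch

def per_fit_converged(f_prev, f_cur, ftol=1e-7):
    """Hypothetical per-fit stopping check: reduce over the data
    dimension only, keeping one criterion per fit in the batch.

    f_prev, f_cur: residual tensors of shape (batch, n_data).
    Returns a boolean tensor of shape (batch,).
    """
    # per-fit cost: sum of squared residuals over the data dimension
    cost_prev = (f_prev ** 2).sum(dim=-1)
    cost_cur = (f_cur ** 2).sum(dim=-1)
    # relative cost change, evaluated independently for each fit
    return (cost_prev - cost_cur).abs() <= ftol * cost_prev.clamp(min=1e-12)

# toy batch of two fits: the first has stalled, the second is still improving
f_prev = torch.tensor([[1.0, 1.0], [0.5, 0.5]])
f_cur = torch.tensor([[1.0, 1.0], [0.1, 0.1]])
conv = per_fit_converged(f_prev, f_cur)
# break the optimization loop only when conv.all() is True
```

Here `conv` is `[True, False]`, so the batch keeps iterating until every fit satisfies its own criterion.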

For the other case, you can suppress the parameter updates for those fits that have already converged. This doesn't save any computation time and will, overall, lead to worse fits (assuming more steps always improve a fit), but it means that looping over single fits should, in principle, give the same results as performing the fits in a batch.
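Suppressing updates for converged fits can be done with a broadcast mask over the parameter dimension. This is a hedged sketch of the idea, not torchimize's implementation; tensor shapes and names are assumptions:

```python
import torch

def masked_update(p, step, converged):
    """Hypothetical sketch: apply the step only to fits that have not
    yet converged, so batched results match per-fit loops.

    p, step: tensors of shape (batch, n_params)
    converged: boolean tensor of shape (batch,)
    """
    active = ~converged
    # broadcast the mask over the parameter dimension; converged fits
    # keep ("lock") their current estimates
    return p + step * active.unsqueeze(-1)

# toy batch of three fits, the second of which has already converged
p = torch.zeros(3, 2)
step = torch.ones(3, 2)
converged = torch.tensor([False, True, False])
p_new = masked_update(p, step, converged)
# rows 0 and 2 advance by the step; row 1 stays locked at its previous value
```

The design choice is that locked fits still run through the full batched linear algebra each iteration; only the final parameter write-back is masked, which keeps the kernel shapes uniform.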

hahnec commented 10 months ago

Vectorized stopping is an interesting point. It makes sense to stop only once all parameter sets have converged rather than just a single one. I have introduced multiple stop conditions in an upcoming version to break only when the optimization criteria are met for the entire batch.

Your last point seems to be a matter of style and taste. Would one want a less accurate result just to be consistent with a single optimization? Not so convincing, especially if chances are high that one can get a more accurate estimate "for free". On the contrary, it can be safer to lock those parameter estimates that have reached a criterion when prioritizing reliability over accuracy. And even if the accuracy of individual parameters is the primary goal, one can still set the tolerance thresholds close to zero, increase max_iter, and pick the most accurate estimates by minimum cost. After all, I think locking estimates that fulfill the stop criteria can be helpful; if undesired, the tolerance arguments give sufficient control to avoid locking and push accuracy limits. This scheme is now also included in the new version.

It is worth noting that the discussed points have less impact when max_iter is small or the initial guesses are poor. For example, using the unit test for parallel gradient descent, I noticed an improvement by continuing after a single parameter set had converged (99 iterations, all tolerances at 1e-7). This came from 5 additional iterations, so I would expect similar results for lower tolerances in the single-condition case. I wasn't able to perceive a difference from locking estimates that reached the tolerance boundaries. To conclude, multiple conditions help by stopping later when max_iter is large enough.

Test results:

single condition stop where max_iter=99

```python
>>> float(f.sum())
2.464080572128296
>>> pcon.sum()
tensor(1)
>>> gcon.sum()
tensor(0)
>>> len(p_list)
19
```

multi-condition stop w/o locking where max_iter=99

```python
>>> float(f.sum())
2.4640729427337646
>>> pcon.sum()
tensor(6)
>>> gcon.sum()
tensor(0)
>>> len(p_list)
24
```

multi-condition stop with locking where max_iter=99

```python
>>> float(f.sum())
2.4640729427337646
>>> pcon.sum()
tensor(6)
>>> gcon.sum()
tensor(0)
>>> len(p_list)
24
```