Enhance logging - Githubissues

alyst commented 7 months ago

Some minor quality of life logging tweaks:

add logperiod option to specify how frequently to log the optimization state
when the convergence fails, the error/log message would show the final gradient norm (I'm not sure how to call that matrix though), so the user can estimate how far it is from the atol
atol is also used to compare the best and current solution. So when rotate() reports how many random starts reached the same criterion, it matches the specified tolerance (alternatively, rotate() may support both g_atol and f_atol)

p-gw commented 6 months ago

This looks good to me!

What should the default value of logperiod be? In my experience the algorithm converges after only a few iterations, so 100 would log only the first and last iteration in many cases. Maybe we should think of it as "thinning" the log messages, making the default 1 (logging every iteration) and requesting fewer iterations on demand.

Also I'm not quite sure about comparing the minima with atol. The current implementation with isapprox is supposed to catch tolerances caused by floating point accuracy. Otherwise it mimics the behaviour of the GPARotation package. What is the argument for using a custom tolerance here?

alyst commented 6 months ago

What should the default value of logperiod be?

In my usecase it's in the 10^5 range. Most likely it's because some of the factors are noisy, and the algorithm, after quickly converging to a plateau, struggles to find the very best rotation. But it's not an unlikely scenario. In such case, logging each iteration quickly floods the log. Interrupting the rotate() may potentially crash the Julia session. On the other hand, if rotate() converges quickly, it's likely that the GP algorithm is not a problem (and it would still log the first and the last iteration states). So, I'd say 100 is a safe choice to get a comprehensible log on the initial run.

alyst commented 6 months ago

What is the argument for using a custom tolerance here?

I assume in case of noisy factors random starts may quickly converge to a plateau, but not necessarily to the same point on it; or it can take forever to converge to the very same point. So the fact that the criteria are not within the machine precision, and maybe only a single start converged to the very specific criterion value, might potentially confuse the user that the solution is not stable at all. Basically, it is just introducing the function value convergence criterion to the set of options.

p-gw commented 6 months ago

I see, makes sense to me. However, I think we should treat the tolerances separately. So keep atol as the absolute tolerance for the GP algorithm and introduce maybe ctol (comparison tolerance) for the comparison. Keeping the default for ctol at 1e-6 seems like a sensible choice to me.

codecov[bot] commented 6 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 91.70%. Comparing base (c2bb45f) to head (554b310). Report is 1 commits behind head on main.

:exclamation: Current head 554b310 differs from pull request most recent head 132dfff. Consider uploading reports for the commit 132dfff to get more accurate results

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #55 +/- ## ========================================== + Coverage 91.68% 91.70% +0.01% ========================================== Files 23 23 Lines 457 458 +1 ========================================== + Hits 419 420 +1 Misses 38 38 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

alyst commented 6 months ago

I think we should treat the tolerances separately.

Agreed. I would update the PR, just wanted to clarify whether it would make sense to follow Optim.jl conventions for naming the tolerance options?

atol is essentially g_tol (but Frobenius norm instead of infinity). FactorRotations.jl can call it g_atol. I have not read into the details of the gradient projection method -- maybe gradient convergence is a very strong indication of overall convergence. But potentially it could be confused with argument convergence (x_atol), so it could be nice to clarify it already in the option name.
ctol is essentially f_atol (f_tol of Optim.jl is for specifying relative tolerance)

p-gw commented 6 months ago

I don' see a strong reason why we should adhere to Optim.jl naming convention. I think sticking with atol is fine, since it is also used e.g. in isapprox(; atol = ...) so it should be familiar for most julia users.

As ctol is really just for logging purposes naming probably doesn't really matter that much. Maybe I even prefer qtol since Q is used in the package as well as the paper for the criterion function Q(Λ).

alyst commented 6 months ago

It's a bit more general than just Optim - the method has chosen the gradient convergence as the criterion, which might be the most reasonable thing; I just think it would be nice if already the argument name makes this clear. I think x, f and g are pretty standard prefixes to specify the convergence type.

p-gw commented 6 months ago

It's a bit more general than just Optim - the method has chosen the gradient convergence as the criterion, which might be the most reasonable thing; I just think it would be nice if already the argument name makes this clear. I think x, f and g are pretty standard prefixes to specify the convergence type.

Sounds good, I will make the changes and merge it afterwards.

p-gw / FactorRotations.jl

Enhance logging #55

Codecov Report