Closed alyst closed 6 months ago
This looks good to me!
What should the default value of logperiod
be? In my experience the algorithm converges after only a few iterations, so 100
would log only the first and last iteration in many cases. Maybe we should think of it as "thinning" the log messages, making the default 1
(logging every iteration) and requesting fewer iterations on demand.
Also I'm not quite sure about comparing the minima with atol
. The current implementation with isapprox
is supposed to catch tolerances caused by floating point accuracy. Otherwise it mimics the behaviour of the GPARotation package. What is the argument for using a custom tolerance here?
What should the default value of logperiod be?
In my usecase it's in the 10^5 range. Most likely it's because some of the factors are noisy, and the algorithm, after quickly converging to a plateau, struggles to find the very best rotation.
But it's not an unlikely scenario. In such case, logging each iteration quickly floods the log. Interrupting the rotate()
may potentially crash the Julia session.
On the other hand, if rotate()
converges quickly, it's likely that the GP algorithm is not a problem (and it would still log the first and the last iteration states).
So, I'd say 100 is a safe choice to get a comprehensible log on the initial run.
What is the argument for using a custom tolerance here?
I assume in case of noisy factors random starts may quickly converge to a plateau, but not necessarily to the same point on it; or it can take forever to converge to the very same point. So the fact that the criteria are not within the machine precision, and maybe only a single start converged to the very specific criterion value, might potentially confuse the user that the solution is not stable at all. Basically, it is just introducing the function value convergence criterion to the set of options.
I see, makes sense to me. However, I think we should treat the tolerances separately. So keep atol
as the absolute tolerance for the GP algorithm and introduce maybe ctol
(comparison tolerance) for the comparison. Keeping the default for ctol
at 1e-6 seems like a sensible choice to me.
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 91.70%. Comparing base (
c2bb45f
) to head (554b310
). Report is 1 commits behind head on main.:exclamation: Current head 554b310 differs from pull request most recent head 132dfff. Consider uploading reports for the commit 132dfff to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I think we should treat the tolerances separately.
Agreed. I would update the PR, just wanted to clarify whether it would make sense to follow Optim.jl conventions for naming the tolerance options?
atol
is essentially g_tol
(but Frobenius norm instead of infinity). FactorRotations.jl
can call it g_atol
. I have not read into the details of the gradient projection method -- maybe gradient convergence is a very strong indication of overall convergence. But potentially it could be confused with argument convergence (x_atol
), so it could be nice to clarify it already in the option name.ctol
is essentially f_atol
(f_tol
of Optim.jl is for specifying relative tolerance)I don' see a strong reason why we should adhere to Optim.jl naming convention.
I think sticking with atol
is fine, since it is also used e.g. in isapprox(; atol = ...)
so it should be familiar for most julia users.
As ctol
is really just for logging purposes naming probably doesn't really matter that much.
Maybe I even prefer qtol
since Q is used in the package as well as the paper for the criterion function Q(Λ).
It's a bit more general than just Optim - the method has chosen the gradient convergence as the criterion, which might be the most reasonable thing; I just think it would be nice if already the argument name makes this clear. I think x, f and g are pretty standard prefixes to specify the convergence type.
It's a bit more general than just Optim - the method has chosen the gradient convergence as the criterion, which might be the most reasonable thing; I just think it would be nice if already the argument name makes this clear. I think x, f and g are pretty standard prefixes to specify the convergence type.
Sounds good, I will make the changes and merge it afterwards.
Some minor quality of life logging tweaks:
logperiod
option to specify how frequently to log the optimization stateatol
atol
is also used to compare the best and current solution. So whenrotate()
reports how many random starts reached the same criterion, it matches the specified tolerance (alternatively,rotate()
may support bothg_atol
andf_atol
)