Digging in a little, there are two ways that are immediately obvious:
1. Add a gcp_opt tutorial that shows an example callback.
2. Add a default monitor callback inside solve().

The benefit of 2 over 1 is that the timer can be initialized in solve(), so the trace is accurate. Here's what that might look like:
import time
from typing import Dict, Optional, Tuple

import numpy as np
import pyttb as ttb
from scipy.optimize import fmin_l_bfgs_b

class LBFGSB:
    # <constructor omitted>

    def solve(  # noqa: PLR0913
        self,
        initial_model: ttb.ktensor,
        data: ttb.tensor,
        function_handle: function_type,
        gradient_handle: function_type,
        lower_bound: float = -np.inf,
        mask: Optional[np.ndarray] = None,
    ) -> Tuple[ttb.ktensor, Dict]:
        # <code omitted>
        monitor = None
        if "callback" not in self._solver_kwargs:
            # Default monitor: created here so its timer starts inside solve().
            monitor = LBFGSB.Monitor(self._solver_kwargs["maxiter"])
            self._solver_kwargs["callback"] = monitor.func
        final_vector, final_f, lbfgsb_info = fmin_l_bfgs_b(
            lbfgsb_func_grad,
            x0,
            fprime=None,
            approx_grad=False,
            bounds=[(lower_bound, np.inf)] * len(x0),
            **self._solver_kwargs,
        )
        model.update(np.arange(initial_model.ndims), final_vector)
        lbfgsb_info["final_f"] = final_f
        if monitor is not None:
            lbfgsb_info["callback"] = monitor.time_trace[: lbfgsb_info["nit"]]
        # TODO big print output
        return model, lbfgsb_info

    class Monitor:
        def __init__(self, maxiter: int):
            self.startTime = time.monotonic()
            self.time_trace = np.zeros((maxiter,))
            self.iter = 0

        def func(self, xk, *args):
            # scipy calls this after each L-BFGS-B iteration with the current iterate xk.
            self.time_trace[self.iter] = time.monotonic() - self.startTime
            self.iter += 1
This is just an idea. @dmdunla @ntjohnson1 have any thoughts?
A few thoughts:
- I think we should use perf_counter instead of monotonic (when I ran a toy problem the floating-point rounding was showing 0).
- Is there a benefit to seeing the per-iteration time inside LBFGS over just doing a similar total time measurement as MATLAB?
- If the answer to 2 is yes then I like the monitor idea, but we should be able to do that in addition to whatever user callback is provided, so that we always report the time. I doubt users will use the callback much, but it's still nice to keep the timing if they want to use it.

gcp.optimizers.StochasticSolver already uses monotonic, so I followed that pattern. I could similarly change that if desired.

The answer to 2 is yes. My sense from @dmdunla is that we're not yet ready to diverge from MATLAB, so that motivated the simple default monitor callback that tracks per-iteration time. Toward that end, I'll still follow MATLAB and report the total time measurements.
I'm definitely bike shedding. I'm fine with the approach you described above. My thoughts aren't blocking.
gcp.optimizers.StochasticSolver already uses monotonic, so I followed that pattern. I could similarly change that if desired.
That's fair. I don't know if I made that choice intentionally, or whether the precision just wasn't critical since those are timing the outer loop rather than the inner loop. Assuming the outer-loop timings are at most minutes, then perf_counter for both locations is probably reasonable. (If the time is too long, the high-precision timer may wrap around; if it is too short, the lower-precision timer may show 0.) For the time being it's probably not critical, but if it causes headaches we could make a little utility that uses both and deconflicts based on some logic.
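For what it's worth, a minimal sketch of that kind of utility, assuming the only deconfliction logic we need is comparing the resolutions each clock reports (the choose_clock name is made up for illustration):

import time

def choose_clock():
    """Pick whichever clock reports the finer resolution.

    perf_counter is usually finer than monotonic; this just guards against
    platforms where that isn't the case.
    """
    perf_res = time.get_clock_info("perf_counter").resolution
    mono_res = time.get_clock_info("monotonic").resolution
    return time.perf_counter if perf_res <= mono_res else time.monotonic

clock = choose_clock()
start = clock()
# ... timed work ...
elapsed = clock() - start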
The answer to 2 is yes
Gotcha. If we're going to add a more in-depth callback later, what do you think about something where we capture the optionally provided callback? We could even sum all the lbfgsb steps if we wanted to be consistent with MATLAB for now.
class LBFGSB:
    # <constructor omitted>

    def solve(self, <snipped>):
        # <skipped>
        monitor = LBFGSB.Monitor(
            self._solver_kwargs["maxiter"], self._solver_kwargs.get("callback")
        )
        self._solver_kwargs["callback"] = monitor.func
        # <skipped>
        lbfgsb_info["callback"] = monitor.time_trace[: lbfgsb_info["nit"]]

    class Monitor:
        def __init__(self, maxiter: int, callback: Optional[Callable] = None):
            self.startTime = time.monotonic()
            self.time_trace = np.zeros((maxiter,))
            self.iter = 0
            self._callback = callback

        def func(self, xk, *args):
            if self._callback is not None:
                # Forward scipy's per-iteration arguments to the user's callback.
                self._callback(xk, *args)
            self.time_trace[self.iter] = time.monotonic() - self.startTime
            self.iter += 1
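Assuming the constructor forwards a callback keyword to scipy the same way it forwards maxiter and iprint (I haven't checked), usage might look something like the sketch below, where initial_model, data, function_handle, and gradient_handle are whatever you'd normally hand to solve(), and the "callback" key in the returned info dict follows the sketch above:

import numpy as np

def my_callback(xk):
    # User callback: xk is the current flattened factor-matrix vector.
    print("iterate norm:", np.linalg.norm(xk))

solver = LBFGSB(maxiter=100, callback=my_callback)
model, info = solver.solve(initial_model, data, function_handle, gradient_handle)
print(info["callback"])  # per-iteration times recorded by the monitor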
Off the bat, I don't see any issue with that approach! I need to refer back to MATLAB for all of the details before I wrap this up.
@jermyer can you be more specific regarding what you see in MATLAB and what you see in pyttb to demonstrate how the latter diverges from the former? It seems like there are more differences than just the timings, but perhaps I am running differently than you; hence the request.
Here is what I see in MATLAB using gcp_opt with the lbfgsb optimizer:
>> X = full(ktensor(@rand, [4 5 3], 2));
>> M = gcp_opt(X,2,'type','normal','opt','lbfgsb','printitn',10);
GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)
Tensor size: 4 x 5 x 3 (60 total entries)
Generalized function Type: normal
Objective function: @(x,m)(m-x).^2
Gradient function: @(x,m)2.*(m-x)
Lower bound of factor matrices: -Inf
Optimization method: lbfgsb
Max iterations: 1000
Projected gradient tolerance: 0.006
Begin Main loop
Iter 10, f(x) = 8.474170e-03, ||grad||_infty = 2.79e-02
Iter 19, f(x) = 3.247076e-03, ||grad||_infty = 3.18e-03
End Main Loop
Final objective: 3.2471e-03
Setup time: 0.06 seconds
Main loop time: 0.04 seconds
Outer iterations: 19
Total iterations: 45
L-BFGS-B Exit message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.
And in pyttb:
>>> import pyttb as ttb
>>> import numpy as np
>>> X = ttb.ktensor([np.random.rand(4,2), np.random.rand(5,2), np.random.rand(3,2)]).full()
>>> from pyttb.gcp.handles import Objectives
>>> from pyttb.gcp.optimizers import LBFGSB
>>> M = ttb.gcp_opt(X,2,objective=Objectives.GAUSSIAN,optimizer=LBFGSB(maxiter=2,iprint=1))
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 24 M = 10
This problem is unconstrained.
At X0 0 variables are exactly at the bounds
At iterate 0 f= 9.43612D+00 |proj g|= 8.06604D+00
At iterate 1 f= 3.37787D+00 |proj g|= 3.20394D+00
At iterate 2 f= 4.94695D-01 |proj g|= 1.39324D+00
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
24 2 3 1 0 0 1.393D+00 4.947D-01
F = 0.49469514164549688
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT
Before we make some design decisions, I suggest we agree on what we want the expected behavior to be, using examples.
@dmdunla The difference is in the solver metadata. Using your setups (above):
>> [M,~,info] = gcp_opt(X,2,'type','normal','opt','lbfgsb','printitn',10);
<omitted>
>> info.lbfgsout
ans =
struct with fields:
iterations: 37
totalIterations: 80
lbfgs_message1: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.'
err: [37×3 double]
>> info.lbfgsout.err
ans =
1.8554 2.3915 0.0046
0.3142 0.8945 0.0056
0.1941 0.2908 0.0068
0.1649 0.2943 0.0077
0.1436 0.2652 0.0089
0.0869 0.1462 0.0099
0.0409 0.2881 0.0108
0.0237 0.0873 0.0136
0.0207 0.0426 0.0155
0.0195 0.0418 0.0164
0.0163 0.1114 0.0175
0.0150 0.0377 0.0214
0.0145 0.0185 0.0227
0.0141 0.0153 0.0239
0.0130 0.0223 0.0249
0.0123 0.0838 0.0259
0.0109 0.0351 0.0268
0.0094 0.0350 0.0287
0.0081 0.0737 0.0300
0.0074 0.0489 0.0315
0.0060 0.0370 0.0326
0.0050 0.0434 0.0337
0.0044 0.0325 0.0346
0.0037 0.0281 0.0361
0.0031 0.0454 0.0368
0.0028 0.0223 0.0376
0.0022 0.0135 0.0383
0.0017 0.0255 0.0400
0.0012 0.0464 0.0408
0.0010 0.0389 0.0416
0.0007 0.0188 0.0425
0.0006 0.0253 0.0440
0.0004 0.0166 0.0448
0.0002 0.0122 0.0457
0.0001 0.0098 0.0465
0.0000 0.0090 0.0474
0.0000 0.0039 0.0482
Timings per iteration are in the 3rd column of info.lbfgsout.err.
>>> M,_,info = ttb.gcp_opt(X,2,objective=Objectives.GAUSSIAN,optimizer=LBFGSB(maxiter=2,iprint=1))
<omitted>
>>> info
{'final_f': 0.3399475022363157,
'funcalls': 3,
'grad': array([-0.16396588, -0.21731916, -0.53893214, -0.60113412, -0.1930229 ,
-0.10820226, -0.5601236 , -0.42140729, -0.11747989, -0.45527118,
-0.59404463, -0.58487822, -0.25143333, -0.12816312, -0.43379202,
-0.52768768, -0.36537568, -0.33692816, -0.55402181, -0.41787058,
-0.75805704, -0.40138519, -0.60094681, -0.43743309]),
'nit': 2,
'task': 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT',
'warnflag': 1}
In pyttb, timings are only given to stdout. (Part of this is that the mainTime, setupTimeA, setupTimeB, and setupTime fields aren't being recorded. If we want to mirror MATLAB, then those fields should be included; the setupTime* fields may be too much.) mainTime and the lbfgsb per-iteration time are reasonable to include.
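For concreteness, a rough sketch of recording the main-loop time inside solve(), assuming we keep scipy's info dict and borrow the MATLAB field name (mainTime here is an illustrative name, not settled API; the "callback" trace key follows the monitor sketch above):

import time

# Inside LBFGSB.solve(), around the scipy call (lbfgsb_func_grad, x0, monitor,
# lower_bound, and self._solver_kwargs come from the surrounding method):
main_start = time.perf_counter()
final_vector, final_f, lbfgsb_info = fmin_l_bfgs_b(
    lbfgsb_func_grad,
    x0,
    approx_grad=False,
    bounds=[(lower_bound, np.inf)] * len(x0),
    **self._solver_kwargs,
)
lbfgsb_info["mainTime"] = time.perf_counter() - main_start   # total main-loop time
lbfgsb_info["callback"] = monitor.time_trace[: lbfgsb_info["nit"]]  # per-iteration trace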
Thanks for these clarifications. I now see what you mean. I’ll take a closer look.
gcp_opt doesn't track timings internally when LBFGS is the optimizer. MATLAB has this (only using a simple tic/toc outside the call to the C code/mex). See here
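The Python analogue of that tic/toc would just bracket the optimizer call inside gcp_opt with perf_counter, something like the sketch below (where exactly these lines would live in gcp_opt, and the setupTime/mainTime names mirroring the MATLAB output above, are assumptions):

import time

# Hypothetical placement inside gcp_opt, mirroring MATLAB's tic/toc around the
# L-BFGS-B call:
setup_start = time.perf_counter()
# ... build function/gradient handles, initial guess, etc. ...
setup_time = time.perf_counter() - setup_start

main_start = time.perf_counter()
model, info = optimizer.solve(initial_model, data, function_handle, gradient_handle)
info["setupTime"] = setup_time                        # MATLAB's "Setup time"
info["mainTime"] = time.perf_counter() - main_start   # MATLAB's "Main loop time"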