sandialabs / pyttb

Python Tensor Toolbox
https://pyttb.readthedocs.io
BSD 2-Clause "Simplified" License

Add timings for `gcp_opt` with L-BFGS. #313

Closed: jeremy-myers closed this issue 1 week ago

jeremy-myers commented 2 months ago

gcp_opt doesn't track timings internally when LBFGS is the optimizer. MATLAB has this (using only a simple tic/toc outside the call to the C code/mex). See here.
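
For reference, a minimal sketch (not current pyttb code) of what MATLAB's outer tic/toc would translate to in Python; the optimizer.solve() call shape is borrowed from the solve() signature sketched later in this thread, and the arguments are placeholders:

import time

# Rough equivalent of MATLAB's tic/toc wrapped around the L-BFGS-B call.
# The solve() arguments are placeholders for whatever gcp_opt passes through.
tic = time.perf_counter()
model, lbfgsb_info = optimizer.solve(initial_model, data, function_handle, gradient_handle)
main_time = time.perf_counter() - tic
print(f"Main loop time: {main_time:.2f} seconds")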

jeremy-myers commented 2 months ago

Digging in a little, there are two ways that are immediately obvious:

  1. Provide a section in the gcp_opt tutorial that shows an example callback.
  2. Directly implement a default callback in the LBFGS class.

The benefit of 2 over 1 is that the timer can be initialized in solve() so that it is accurate. Here's what that might look like:

class LBFGSB:

    <constructor omitted>

    def solve(  # noqa: PLR0913
        self,
        initial_model: ttb.ktensor,
        data: ttb.tensor,
        function_handle: function_type,
        gradient_handle: function_type,
        lower_bound: float = -np.inf,
        mask: Optional[np.ndarray] = None,
    ) -> Tuple[ttb.ktensor, Dict]:

        <code omitted>

        if "callback" not in self._solver_kwargs:
            monitor = LBFGSB.Monitor(self._solver_kwargs["maxiter"])
            self._solver_kwargs["callback"] = monitor.func

        final_vector, final_f, lbfgsb_info = fmin_l_bfgs_b(
            lbfgsb_func_grad,
            x0,
            fprime=None,
            approx_grad=False,
            bounds=[(lower_bound, np.inf)] * len(x0),
            **self._solver_kwargs,
        )
        model.update(np.arange(initial_model.ndims), final_vector)
        lbfgsb_info["final_f"] = final_f

        if monitor is not None:
            lbfgsb_info["callback"] = monitor.time_trace[:lbfgsb_info["nit"]]

        # TODO big print output
        return model, lbfgsb_info

    class Monitor:
        def __init__(self, maxiter):
            self.startTime = time.monotonic()
            self.time_trace = np.zeros((maxiter,))
            self.iter = 0

        def func(self, xk, *args):
            # Record the cumulative elapsed time at the end of each iteration.
            self.time_trace[self.iter] = time.monotonic() - self.startTime
            self.iter += 1

This is just an idea. @dmdunla @ntjohnson1 have any thoughts?

ntjohnson1 commented 2 months ago

A few thoughts:

  1. I think we should use perf_counter instead of monotonic (when I ran a toy problem, the floating-point rounding was showing 0; a quick resolution check is sketched after this list)
  2. Is there a benefit to seeing the per-iteration time inside LBFGS over just doing a total time measurement similar to MATLAB's?
  3. If the answer to 2 is yes, then I like the monitor idea, but we should be able to do that in addition to whatever user callback is provided, so that we always report the time. I doubt users will use the callback much, but it's still nice to keep the timing if they want to use it.
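
For reference (relating to point 1 above), the standard library reports each clock's advertised resolution on a given platform; this is a plain stdlib check, not pyttb code:

import time

# Compare the advertised resolution of the two candidate clocks on this platform.
for name in ("monotonic", "perf_counter"):
    info = time.get_clock_info(name)
    print(f"{name}: resolution={info.resolution}, implementation={info.implementation}")
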
jeremy-myers commented 2 months ago

A few thoughts:

  1. I think we should use perf_counter instead of monotonic (when I ran a toy problem, the floating-point rounding was showing 0)

gcp.optimizers.StochasticSolver already uses monotonic, so I followed that pattern. I could similarly change that if desired.

  2. Is there a benefit to seeing the per-iteration time inside LBFGS over just doing a total time measurement similar to MATLAB's?
  3. If the answer to 2 is yes, then I like the monitor idea, but we should be able to do that in addition to whatever user callback is provided, so that we always report the time. I doubt users will use the callback much, but it's still nice to keep the timing if they want to use it.

The answer to 2 is yes. My sense from @dmdunla is that we're not yet ready to diverge from MATLAB, so that motivated the simple default monitor callback that tracks per-iteration time. Toward that end, I'll still follow MATLAB and report the total time measurements.

ntjohnson1 commented 2 months ago

I'm definitely bike shedding. I'm fine with the approach you described above. My thoughts aren't blocking.

gcp.optimizers.StochasticSolver already uses monotonic, so I followed that pattern. I could similarly change that if desired.

That's fair. I don't know if I made that choice intentionally, or if the precision just wasn't critical because those timings cover the outer loop rather than the inner loop. Assuming the outer-loop timings are at most minutes, then perf_counter for both locations is probably reasonable. (If the time is too long, the high-precision timer may wrap around; if the time is too short, the lower-precision timer may show 0.) For the time being it's probably not critical, but if it causes headaches we could make a little utility that uses both and deconflicts based on some logic, roughly along the lines sketched below.
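
A rough sketch of such a utility, purely illustrative (the class name, threshold, and deconfliction logic are assumptions, not an agreed design):

import time

class DualTimer:
    """Track elapsed time with both clocks and prefer the higher-resolution one."""

    def __init__(self):
        self._perf_start = time.perf_counter()
        self._mono_start = time.monotonic()

    def elapsed(self) -> float:
        perf_elapsed = time.perf_counter() - self._perf_start
        mono_elapsed = time.monotonic() - self._mono_start
        # Prefer perf_counter; fall back to monotonic if it reads 0 (resolution
        # too coarse) or disagrees wildly with monotonic (e.g., wrap-around).
        if perf_elapsed <= 0.0 or abs(perf_elapsed - mono_elapsed) > 1.0:
            return mono_elapsed
        return perf_elapsed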

The answer to 2 is yes

Gotcha. If we're going to add a more in-depth callback later, what do you think about something where we capture the optionally provided callback? We could even sum all the L-BFGS-B steps if we wanted to be consistent with MATLAB for now.

class LBFGSB:

    <constructor omitted>

    def solve( <snipped>):
        <skipped>
        # Always wrap the (optional) user callback so we capture per-iteration times.
        monitor = LBFGSB.Monitor(
            self._solver_kwargs["maxiter"], self._solver_kwargs.get("callback")
        )
        self._solver_kwargs["callback"] = monitor.func
        <skipped>
        lbfgsb_info["callback"] = monitor.time_trace[:lbfgsb_info["nit"]]

    class Monitor:
        def __init__(self, maxiter: int, callback: Optional[Callable] = None):
            self.startTime = time.monotonic()
            self.time_trace = np.zeros((maxiter,))
            self.iter = 0
            self._callback = callback

        def func(self, xk, *args):
            # Forward to the user's callback first (if any), then record elapsed time.
            if self._callback is not None:
                self._callback(xk, *args)
            self.time_trace[self.iter] = time.monotonic() - self.startTime
            self.iter += 1

jeremy-myers commented 2 months ago

Off the bat, I don't see any issue with that approach! I need to refer back to MATLAB for all of the details before I wrap this up.

dmdunla commented 2 months ago

@jermyer can you be more specific regarding what you see in MATLAB and what you see in pyttb to demonstrate how the latter diverges from the former? It seems like there are more differences than just the timings, but perhaps I am running differently than you; hence the request.

Here is what I see in MATLAB using gcp_opt with the lbfgsb optimizer:

>> X = full(ktensor(@rand, [4 5 3], 2));
>> M = gcp_opt(X,2,'type','normal','opt','lbfgsb','printitn',10);

GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)

Tensor size: 4 x 5 x 3 (60 total entries)
Generalized function Type: normal
Objective function: @(x,m)(m-x).^2
Gradient function: @(x,m)2.*(m-x)
Lower bound of factor matrices: -Inf
Optimization method: lbfgsb
Max iterations: 1000
Projected gradient tolerance: 0.006

Begin Main loop
Iter    10, f(x) = 8.474170e-03, ||grad||_infty = 2.79e-02
Iter    19, f(x) = 3.247076e-03, ||grad||_infty = 3.18e-03
End Main Loop

Final objective: 3.2471e-03
Setup time: 0.06 seconds
Main loop time: 0.04 seconds
Outer iterations: 19
Total iterations: 45
L-BFGS-B Exit message: CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.

And in pyttb:

>>> import pyttb as ttb
>>> import numpy as np
>>> X = ttb.ktensor([np.random.rand(4,2), np.random.rand(5,2), np.random.rand(3,2)]).full()
>>> from pyttb.gcp.handles import Objectives
>>> from pyttb.gcp.optimizers import LBFGSB
>>> M = ttb.gcp_opt(X,2,objective=Objectives.GAUSSIAN,optimizer=LBFGSB(maxiter=2,iprint=1))
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           24     M =           10
 This problem is unconstrained.

At X0         0 variables are exactly at the bounds

At iterate    0    f=  9.43612D+00    |proj g|=  8.06604D+00

At iterate    1    f=  3.37787D+00    |proj g|=  3.20394D+00

At iterate    2    f=  4.94695D-01    |proj g|=  1.39324D+00

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
   24      2      3      1     0     0   1.393D+00   4.947D-01
  F =  0.49469514164549688     

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT           

Before we make some design decisions, I suggest we agree on what we want the expected behavior to be, using examples.

jeremy-myers commented 2 months ago

@dmdunla The difference is in the solver meta data. Using your setups (above):

MATLAB

>> [M,~,info] = gcp_opt(X,2,'type','normal','opt','lbfgsb','printitn',10);
<omitted>
>> info.lbfgsout

ans = 

  struct with fields:

         iterations: 37
    totalIterations: 80
     lbfgs_message1: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.'
                err: [37×3 double]

>> info.lbfgsout.err

ans =

    1.8554    2.3915    0.0046
    0.3142    0.8945    0.0056
    0.1941    0.2908    0.0068
    0.1649    0.2943    0.0077
    0.1436    0.2652    0.0089
    0.0869    0.1462    0.0099
    0.0409    0.2881    0.0108
    0.0237    0.0873    0.0136
    0.0207    0.0426    0.0155
    0.0195    0.0418    0.0164
    0.0163    0.1114    0.0175
    0.0150    0.0377    0.0214
    0.0145    0.0185    0.0227
    0.0141    0.0153    0.0239
    0.0130    0.0223    0.0249
    0.0123    0.0838    0.0259
    0.0109    0.0351    0.0268
    0.0094    0.0350    0.0287
    0.0081    0.0737    0.0300
    0.0074    0.0489    0.0315
    0.0060    0.0370    0.0326
    0.0050    0.0434    0.0337
    0.0044    0.0325    0.0346
    0.0037    0.0281    0.0361
    0.0031    0.0454    0.0368
    0.0028    0.0223    0.0376
    0.0022    0.0135    0.0383
    0.0017    0.0255    0.0400
    0.0012    0.0464    0.0408
    0.0010    0.0389    0.0416
    0.0007    0.0188    0.0425
    0.0006    0.0253    0.0440
    0.0004    0.0166    0.0448
    0.0002    0.0122    0.0457
    0.0001    0.0098    0.0465
    0.0000    0.0090    0.0474
    0.0000    0.0039    0.0482

Timings per iteration are in the 3rd column of info.lbfgsout.err.

pyttb

>>> M,_,info = ttb.gcp_opt(X,2,objective=Objectives.GAUSSIAN,optimizer=LBFGSB(maxiter=2,iprint=1))
<omitted>
>>> info
{'final_f': 0.3399475022363157,
 'funcalls': 3,
 'grad': array([-0.16396588, -0.21731916, -0.53893214, -0.60113412, -0.1930229 ,
       -0.10820226, -0.5601236 , -0.42140729, -0.11747989, -0.45527118,
       -0.59404463, -0.58487822, -0.25143333, -0.12816312, -0.43379202,
       -0.52768768, -0.36537568, -0.33692816, -0.55402181, -0.41787058,
       -0.75805704, -0.40138519, -0.60094681, -0.43743309]),
 'nit': 2,
 'task': 'STOP: TOTAL NO. of ITERATIONS REACHED LIMIT',
 'warnflag': 1}

In pyttb, timings are only printed to stdout. (Part of this is that the mainTime, setupTimeA, setupTimeB, and setupTime fields aren't being recorded. If we want to mirror MATLAB, then those fields should be included; the setupTime* fields may be too much.) mainTime and the per-iteration L-BFGS-B time are reasonable to include.
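
For illustration, if pyttb recorded the fields discussed above, the returned info dict might look roughly like this (the setup_time/main_time/callback entries are proposed, not current behavior; the numbers are copied from the MATLAB run earlier in this thread):

# Hypothetical target shape for the pyttb info dict, mirroring MATLAB's output.
info = {
    "final_f": 3.2471e-03,  # final objective from the MATLAB run above
    "nit": 19,              # outer L-BFGS-B iterations
    "task": "CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL.",
    "setup_time": 0.06,     # proposed; analogous to MATLAB's setupTime
    "main_time": 0.04,      # proposed; analogous to MATLAB's mainTime
    "callback": [...],      # proposed; per-iteration cumulative times from the monitor sketch
}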

dmdunla commented 2 months ago

Thanks for these clarifications. I now see what you mean. I’ll take a closer look.