sandialabs / pyttb

Python Tensor Toolbox
https://pyttb.readthedocs.io
BSD 2-Clause "Simplified" License

`gcp_opt` with Rayleigh loss fails with LBFGS when using a random start or when starting from `M_true`. #261

Closed jeremy-myers closed 11 months ago

jeremy-myers commented 1 year ago

We encountered this issue when working on the GCP_OPT tutorial. See some discussion and Danny's attempts at finding a good problem here: https://github.com/sandialabs/pyttb/pull/253#issuecomment-1735744679

dmdunla commented 1 year ago

@jeremy-myers can you post an abbreviated example that breaks here? Once the tutorial is fixed, it may not provide a clear example of the behavior we were seeing.

jeremy-myers commented 1 year ago

Minimal working example

# Imports for the example (Objectives / LBFGSB import paths assumed from pyttb's gcp module)
import numpy as np
import pyttb as ttb
from pyttb.gcp.handles import Objectives
from pyttb.gcp.optimizers import LBFGSB

rng = np.random.default_rng(65)
rank = 3
shape = (50, 60, 70)
ndims = len(shape)

# Create factor matrices that correspond to smooth sinusoidal factors
U = []
for k in np.arange(ndims):
    V = 1.1 + np.cos(
        (2 * np.pi / shape[k] * np.arange(shape[k])[:, np.newaxis])
        * np.arange(1, rank + 1)
    )
    U.append(V[:, rng.permutation(rank)])

M_true = ttb.ktensor(U).normalize()

# Observed tensor: entrywise Rayleigh draws with scale given by the corresponding model value
def make_rayleigh(X):
    xvec = X.reshape((np.prod(X.shape), 1))
    rayl = rng.rayleigh(size=xvec.shape)
    yvec = rayl * xvec.data
    Y = ttb.tensor(yvec, shape=X.shape)
    return Y

X = make_rayleigh(M_true.full())

# Select Rayleigh objective
objective = Objectives.RAYLEIGH

# Select LBFGSB solver
optimizer = LBFGSB(maxiter=10, iprint=1)

# Compute rank-3 GCP approximation to X with GCP-OPT
result_lbfgs, initial_guess, info_lbfgs = ttb.gcp_opt(
    data=X, rank=rank, objective=objective, optimizer=optimizer, printitn=1
)

print(f"\nFinal fit: {1 - np.linalg.norm((X-result_lbfgs.full()).double())/X.norm()}\n")

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =          540     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.84128D+06    |proj g|=  5.64676D+06

At iterate    1    f=  1.38465D+06    |proj g|=  2.28411D+06

At iterate    2    f=  1.33151D+06    |proj g|=  2.07324D+06

At iterate    3    f=  1.33151D+06    |proj g|=  2.07324D+06

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
  540      3     25    193     0   152   2.073D+06   1.332D+06
  F =   1331512.4961980525     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH             

Final fit: -0.7102975608909263

CPU times: user 2.1 s, sys: 1.09 s, total: 3.18 s
Wall time: 443 ms

 Bad direction in the line search;
   refresh the lbfgs memory and restart the iteration.

Note that results obtained with np.random.default_rng(65) may vary from machine to machine.

ntjohnson1 commented 1 year ago

If it's not too obnoxious, could you generate the failure mode Danny described with all ones for both pyttb and MATLAB? That's probably the easiest way to do the cross comparison and where I'd start debugging anyway (I believe the underlying LBFGS solvers are the same, so I'd then want to double check whether the inputs to the solvers are getting plumbed through correctly).

jeremy-myers commented 1 year ago

Example Rayleigh tensor

Pyttb

# Imports for this repro (Objectives / LBFGSB import paths assumed from pyttb's gcp module)
import numpy as np
import pyttb as ttb
from pyttb.gcp.handles import Objectives
from pyttb.gcp.optimizers import LBFGSB

X = ttb.import_data('fail_rayleigh.txt')

# Select Rayleigh objective
objective = Objectives.RAYLEIGH

# Select LBFGSB solver
optimizer = LBFGSB(maxiter=10, iprint=1)

shape = (50, 60, 70)
rank = 3
M_init = ttb.ktensor.from_function(np.ones, shape, rank)
# Compute rank-3 GCP approximation to X with GCP-OPT
result_lbfgs, initial_guess, info_lbfgs = ttb.gcp_opt(
    data=X, rank=rank, objective=objective, optimizer=optimizer, init=M_init
)

RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =          540     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  1.40726D+06    |proj g|=  3.59929D+04

At iterate    1    f=  1.25854D+06    |proj g|=  2.45019D+04

At iterate    2    f=  1.05074D+06    |proj g|=  1.40448D+04

At iterate    3    f=  9.99440D+05    |proj g|=  1.18869D+04

 Bad direction in the line search;
   refresh the lbfgs memory and restart the iteration.
At iterate    4    f=  9.99440D+05    |proj g|=  1.18869D+04

           * * *

Tit   = total number of iterations
Tnf   = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip  = number of BFGS updates skipped
Nact  = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F     = final function value

           * * *

   N    Tit     Tnf  Tnint  Skip  Nact     Projg        F
  540      4     26    213     0   208   1.189D+04   9.994D+05
  F =   999440.06804895133     

CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH

Matlab

>> R = 3;
>> X = import_data('fail_rayleigh.txt');
>> Minit = ktensor(@ones, size(X), R);
>> [M,~,out] = gcp_opt(X,R,'type','rayleigh','opt','lbfgsb','maxiters',10,'init',Minit,'printitn',1);

GCP-OPT-LBFGSB (Generalized CP Tensor Decomposition)

Tensor size: 50 x 60 x 70 (210000 total entries)
GCP rank: 3
Generalized function Type: rayleigh
Objective function: @(x,m)2*log(m+1e-10)+(pi/4)*(x./(m+1e-10)).^2
Gradient function: @(x,m)2./(m+1e-10)-(pi/2)*x.^2./(m+1e-10).^3
Lower bound of factor matrices: 0
Optimization method: lbfgsb
Max iterations: 10
Projected gradient tolerance: 21

Begin Main loop
Iter     1, f(x) = 1.199941e+06, ||grad||_infty = 1.87e+04
Iter     2, f(x) = 1.020794e+06, ||grad||_infty = 8.75e+03
Iter     3, f(x) = 9.407739e+05, ||grad||_infty = 4.51e+03
Iter     4, f(x) = 9.002865e+05, ||grad||_infty = 2.27e+03
Iter     5, f(x) = 8.767575e+05, ||grad||_infty = 1.11e+03
Iter     6, f(x) = 8.661887e+05, ||grad||_infty = 7.69e+02
Iter     7, f(x) = 8.631950e+05, ||grad||_infty = 2.68e+02
Iter     8, f(x) = 8.625062e+05, ||grad||_infty = 1.31e+02
Iter     9, f(x) = 8.622247e+05, ||grad||_infty = 7.83e+01
Iter    10, f(x) = 8.622089e+05, ||grad||_infty = 1.15e+02
End Main Loop

Final objective: 8.6221e+05
Setup time: 0.01 seconds
Main loop time: 0.08 seconds
Outer iterations: 10
Total iterations: 21
L-BFGS-B Exit message: UNRECOGNIZED EXIT FLAG

ntjohnson1 commented 1 year ago

Hmm, do we know if we hit this on other objectives? I did a bit of digging today without much luck.

I checked:

ntjohnson1 commented 1 year ago

~~I'm a little suspicious of our in-place update to model every time we return the function value, but I won't have time until tonight/next week to confirm whether that makes sense iteratively. (As opposed to init_model.copy().update, where the entire solution is captured in the provided vector.)~~ We replace ALL values, so this statement doesn't make sense.

ntjohnson1 commented 1 year ago

Note that `Bad direction in the line search; refresh the lbfgs memory and restart the iteration.` doesn't actually mean there is an error. For the working example we can configure the optimizer to allow more line searches, `optimizer = LBFGSB(maxiter=10, iprint=1, maxls=500)`, and the warning goes away. But it does appear that LBFGSB is converging, potentially sub-optimally, from a similar starting point: two consecutive steps are nearly identical, so the solve completes and reports `CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH`.
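A hedged sketch of rerunning the minimal working example above with that more generous line-search budget (reusing its imports, X, rank, and objective):

# Allow up to 500 line-search steps per L-BFGS-B iteration (scipy's default is 20)
optimizer = LBFGSB(maxiter=10, iprint=1, maxls=500)
result_lbfgs, initial_guess, info_lbfgs = ttb.gcp_opt(
    data=X, rank=rank, objective=objective, optimizer=optimizer, printitn=1
)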

ntjohnson1 commented 1 year ago

Ok, breaking it down smaller. Just doing a single step matches (I didn't bother actually writing out the full gradient, so I assumed things continue to match).

Pyttb

# Imports for this repro (module paths for setup/evaluate assumed from pyttb's gcp layout)
import numpy as np
import pyttb as ttb
from pyttb.gcp.fg import evaluate
from pyttb.gcp.fg_setup import setup
from pyttb.gcp.handles import Objectives
from pyttb.gcp.optimizers import LBFGSB

X = ttb.import_data('fail_rayleigh.txt')
# Select Rayleigh objective
objective = Objectives.RAYLEIGH

# Select LBFGSB solver
optimizer = LBFGSB(maxiter=10, maxfun=100, iprint=1, factr=1e1, maxls=500)

shape = (50, 60, 70)
rank = 3
M_init = ttb.ktensor.from_function(np.ones, X.shape, rank)
# Compute rank-3 GCP approximation to X with GCP-OPT
function_handle, gradient_handle, lower_bound = setup(objective, X)
model = M_init.copy()
mask = None

# This is the function that gets handed to LBFGSB
def lbfgsb_func_grad(vector: np.ndarray):
    # Overwrite all factor entries of the model with the optimizer's current vector
    model.update(np.arange(model.ndims), vector)
    func_val, grads = evaluate(
        model,
        X,
        mask,
        function_handle,
        gradient_handle,
    )
    return func_val, ttb.ktensor(grads, copy=False).tovec(False)

f, g = lbfgsb_func_grad(M_init.tovec(False))
print(f"Initial function value: {f}")
print(f"Initial g: {g[0:10]}")

Matlab

R = 3;
X = import_data('fail_rayleigh.txt');
Minit = ktensor(@ones, size(X), R);
nd = ndims(Minit);
[fh,gh,lb_] = tt_gcp_fg_setup('rayleigh',X);
W = [];

% This is the function that gets handed to LBFGSB
fcn = @(x) tt_gcp_fg(update(Minit,1:nd,x), X, fh, gh, W, true, true, true);
[f, g] = fcn(tovec(Minit, false));
f
g(1:11)

ntjohnson1 commented 1 year ago

A few more notes before I time out looking into this today.

The `maxls` argument for scipy is not exposed in the MATLAB implementation. On a quick skim it looks like it is mostly used here in the Fortran and here in the MATLAB-converted C, so it is basically hard-coded to 20, which is also the default in scipy. That means the arguments/parameter options shouldn't be the delta.
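For reference, a minimal sketch of the scipy-side knob (plain scipy.optimize, nothing pyttb-specific); maxls is passed through the options dict and defaults to 20:

import numpy as np
from scipy.optimize import minimize

# maxls is exposed via L-BFGS-B's options dict in scipy; 20 is the default budget
res = minimize(lambda x: float(np.sum(x**2)), x0=np.ones(3),
               method="L-BFGS-B", options={"maxls": 20})
print(res.x)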

If you update (in your full repro) to

optimizer = LBFGSB(maxiter=1, iprint=99, m=5)  # m=5 is the default in MATLAB AFAICT

we can get a lot more verbose detail from the underlying lbfgsb call and see that it matches at iterate 0 (before doing anything) but diverges after a single step.

In MATLAB this can be accomplished by updating this to 99 before running (I don't know whether it is plumbed through from a top-level argument anywhere) and changing the gcp_opt call to run only a single iteration.
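A hedged sketch of that verbose single-iteration run on the pyttb side, reusing X, rank, objective, and M_init from the repro above:

# One outer iteration with maximum L-BFGS-B verbosity; m=5 to match the
# MATLAB memory size noted above
optimizer = LBFGSB(maxiter=1, iprint=99, m=5)
result, initial_guess, info = ttb.gcp_opt(
    data=X, rank=rank, objective=objective, optimizer=optimizer, init=M_init
)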

ntjohnson1 commented 1 year ago

With iprint=99 we can see that at iterate 0 the function values match but the projected gradient norms DON'T. However, with my single-step repro above we can take the infinity norm of the provided gradients. The single-step gradients match, and the python version matches what lbfgsb prints out as the projected gradient norm. So maybe the way we are specifying bounds is off, or there is some subtle bug in the MATLAB version?
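A hedged sketch of that comparison, reusing g from lbfgsb_func_grad in the single-step repro above:

import numpy as np

# Infinity norm of the gradient from the single-step repro; with an all-ones start
# and a lower bound of 0, no bounds are active, so this should agree with the
# |proj g| value that L-BFGS-B prints at iterate 0
print(np.linalg.norm(g, np.inf))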

On my computer I get

ntjohnson1 commented 1 year ago

Ok, I found the bug for the delta in the initial step. I had set normalize for factor 0, whereas MATLAB's normalize is equivalent to normalizing all factors. https://github.com/ntjohnson1/pyttb/commit/f00542e9fd8365495b9a2a8647ba7415731a3a5b We still don't match on the next step, so there must be a setting or implementation difference.
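Illustrative only (plain numpy, not the pyttb API): normalizing just one factor versus all factors leaves the reconstructed tensor unchanged but hands the optimizer different factor entries, which is enough to change the L-BFGS-B trajectory.

import numpy as np

# A rank-1, two-way model: lam * outer(a, b)
a = np.array([3.0, 4.0])       # factor 0, norm 5
b = np.array([1.0, 2.0, 2.0])  # factor 1, norm 3
lam = 1.0

# Normalize factor 0 only, absorbing its norm into the weight
lam0, a0, b0 = lam * np.linalg.norm(a), a / np.linalg.norm(a), b

# Normalize all factors (what MATLAB's normalize is equivalent to)
lam_all = lam * np.linalg.norm(a) * np.linalg.norm(b)
a_all, b_all = a / np.linalg.norm(a), b / np.linalg.norm(b)

# Same tensor either way, but different starting vectors for the optimizer
assert np.allclose(lam * np.outer(a, b), lam0 * np.outer(a0, b0))
assert np.allclose(lam * np.outer(a, b), lam_all * np.outer(a_all, b_all))
print(b0, b_all)  # [1. 2. 2.] vs [0.333... 0.666... 0.666...]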

ntjohnson1 commented 1 year ago

As can be seen from the above messages, this massively nerd-sniped me. The first issue, resolved above, was our initial conditions differing slightly. The second one is a bigger, non-GCP-specific bug. It looks like Fortran ordering bites again. If I change this line to `model = ttb.ktensor.from_vector(vector, initial_model.shape, False)` (which should yield an equivalent result, since we are replacing all entries), we then get similar behavior between MATLAB and python. So I think our Fortran vs C ordering assumptions are inconsistent somewhere.
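A small, standalone illustration of why that matters (plain numpy; none of this is pyttb code): a flattened factor matrix rebuilt under the wrong ordering assumption silently scrambles its entries.

import numpy as np

# A 2 x 3 "factor matrix" flattened column-major (Fortran/MATLAB convention)
A = np.arange(6.0).reshape(2, 3)
vec_f = A.flatten(order="F")  # [0. 3. 1. 4. 2. 5.]

# Rebuilding with the matching vs. mismatched ordering assumption
right = vec_f.reshape(2, 3, order="F")
wrong = vec_f.reshape(2, 3, order="C")

print(np.array_equal(right, A))  # True
print(np.array_equal(wrong, A))  # False: entries land in different slots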