timeflux / timeflux_rasr

Implementation of rASR filtering.
MIT License

[pymanopt] incorrect results of solver in nonlinear_eigh #14

Open lkorczowski opened 4 years ago

lkorczowski commented 4 years ago

EDIT: You can contribute directly to the pymanopt branch; see PR #18.


I haven't yet been able to figure out what the problem is: while the nonlinear_eigh cost, gradient, and Hessian all seem correct, the solver gives essentially random output even on the simplest test case (the identity matrix).

I believe I need to discuss this with @sylvchev to see where my mistake could be.

Do you have time for a call this week?

To see the error, please check /test/test_pymanopt.py in https://github.com/bertrandlalo/timeflux_rasr/tree/pymanopt
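For context, here is a minimal, self-contained sketch of the solver wiring involved, assuming the pymanopt 0.2-style API visible in the traceback below (Problem + TrustRegions + solver.solve); the Stiefel manifold and the Rayleigh-quotient cost/egrad/ehess are placeholders, not the actual nonlinear_eigh implementation:

```python
# Sketch only -- NOT the repo's nonlinear_eigh. It just shows the generic
# pymanopt 0.2-style wiring (Problem + TrustRegions + solver.solve) with a
# placeholder Rayleigh-quotient cost on a placeholder Stiefel manifold.
import numpy as np
from pymanopt import Problem
from pymanopt.manifolds import Stiefel
from pymanopt.solvers import TrustRegions

n = 5
A = np.eye(n)                      # identity input, as in the failing test
manifold = Stiefel(n, n)           # placeholder manifold

def cost(X):
    return -np.trace(X.T @ A @ X)  # placeholder cost

def egrad(X):
    return -2 * A @ X              # Euclidean gradient of the placeholder cost

def ehess(X, U):
    return -2 * A @ U              # Euclidean Hessian-vector product

problem = Problem(manifold=manifold, cost=cost, egrad=egrad, ehess=ehess)
solver = TrustRegions()
Xopt = solver.solve(problem)       # same call pattern as utils/pymanopt.py:293
```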

Output:

============================= test session starts ==============================
platform darwin -- Python 3.7.5, pytest-5.2.2, py-1.8.1, pluggy-0.13.1 -- /Users/louis/anaconda3/envs/timeflux_rasr-env/bin/python
cachedir: .pytest_cache
rootdir: /Volumes/Ext/git/timeflux_rasr/test
collecting ... collected 1 item

test_pymanopt.py::test_nonlinear_eigh_unitary_n FAILED                   [100%]Compiling cost function...
Optimizing...
                                            f: +3.750000e+00   |grad|: 1.033499e-15

=================================== FAILURES ===================================
________________________ test_nonlinear_eigh_unitary_n _________________________

    def test_nonlinear_eigh_unitary_n():
        n = 5
        X = np.eye(n)
>       Xfilt = nonlinear_eigh(X, n)

test_pymanopt.py:14: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../utils/pymanopt.py:293: in nonlinear_eigh
    Xopt = solver.solve(problem)
/Users/louis/anaconda3/envs/timeflux_rasr-env/lib/python3.7/site-packages/pymanopt/solvers/trust_regions.py:172: in solve
    mininner, maxinner)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pymanopt.solvers.trust_regions.TrustRegions object at 0x12ca769d0>
problem = <pymanopt.core.problem.Problem object at 0x12ca76810>
x = array([[-0.42903078,  0.78880924, -0.10906021,  0.18925762,  0.38209945],
       [-0.43779034, -0.31149183,  0.7511357...17 ,  0.18654313, -0.39486134, -0.73921347],
       [ 0.09303512,  0.05132523, -0.1650446 , -0.89713262,  0.39575691]])
fgradx = array([[-1.11022302e-16,  2.22044605e-16,  1.66533454e-16,
        -5.55111512e-17,  0.00000000e+00],
       [ 1.11022...0.00000000e+00],
       [-2.77555756e-17,  1.38777878e-17,  5.55111512e-17,
        -4.44089210e-16,  1.11022302e-16]])
eta = array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
Delta = 0.2795084971874737, theta = 1.0, kappa = 0.1, mininner = 1, maxinner = 0

    def _truncated_conjugate_gradient(self, problem, x, fgradx, eta, Delta,
                                      theta, kappa, mininner, maxinner):
        man = problem.manifold
        inner = man.inner
        hess = problem.hess
        precon = problem.precon

        if not self.use_rand:  # and therefore, eta == 0
            Heta = man.zerovec(x)
            r = fgradx
            e_Pe = 0
        else:  # and therefore, no preconditioner
            # eta (presumably) ~= 0 was provided by the caller.
            Heta = hess(x, eta)
            r = fgradx + Heta
            e_Pe = inner(x, eta, eta)

        r_r = inner(x, r, r)
        norm_r = np.sqrt(r_r)
        norm_r0 = norm_r

        # Precondition the residual
        if not self.use_rand:
            z = precon(x, r)
        else:
            z = r

        # Compute z'*r
        z_r = inner(x, z, r)
        d_Pd = z_r

        # Initial search direction
        delta = -z
        if not self.use_rand:
            e_Pd = 0
        else:
            e_Pd = inner(x, eta, delta)

        # If the Hessian or a linear Hessian approximation is in use, it is
        # theoretically guaranteed that the model value decreases strictly with
        # each iteration of tCG. Hence, there is no need to monitor the model
        # value. But, when a nonlinear Hessian approximation is used (such as
        # the built-in finite-difference approximation for example), the model
        # may increase. It is then important to terminate the tCG iterations
        # and return the previous (the best-so-far) iterate. The variable below
        # will hold the model value.

        def model_fun(eta, Heta):
            return inner(x, eta, fgradx) + 0.5 * inner(x, eta, Heta)
        if not self.use_rand:
            model_value = 0
        else:
            model_value = model_fun(eta, Heta)

        # Pre-assume termination because j == end.
        stop_tCG = self.MAX_INNER_ITER

        # Begin inner/tCG loop.
        for j in xrange(0, int(maxinner)):
            # This call is the computationally intensive step
            Hdelta = hess(x, delta)

            # Compute curvature (often called kappa)
            d_Hd = inner(x, delta, Hdelta)

            # Note that if d_Hd == 0, we will exit at the next "if" anyway.
            alpha = z_r / d_Hd
            # <neweta,neweta>_P =
            # <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
            e_Pe_new = e_Pe + 2 * alpha * e_Pd + alpha ** 2 * d_Pd

            # Check against negative curvature and trust-region radius
            # violation. If either condition triggers, we bail out.
            if d_Hd <= 0 or e_Pe_new >= Delta**2:
                # want
                #  ee = <eta,eta>_prec,x
                #  ed = <eta,delta>_prec,x
                #  dd = <delta,delta>_prec,x
                tau = ((-e_Pd +
                        np.sqrt(e_Pd * e_Pd +
                                d_Pd * (Delta ** 2 - e_Pe))) / d_Pd)

                eta = eta + tau * delta

                # If only a nonlinear Hessian approximation is available, this
                # is only approximately correct, but saves an additional
                # Hessian call.
                Heta = Heta + tau * Hdelta

                # Technically, we may want to verify that this new eta is
                # indeed better than the previous eta before returning it (this
                # is always the case if the Hessian approximation is linear,
                # but I am unsure whether it is the case or not for nonlinear
                # approximations.) At any rate, the impact should be limited,
                # so in the interest of code conciseness (if we can still hope
                # for that), we omit this.

                if d_Hd <= 0:
                    stop_tCG = self.NEGATIVE_CURVATURE
                else:
                    stop_tCG = self.EXCEEDED_TR
                break

            # No negative curvature and eta_prop inside TR: accept it.
            e_Pe = e_Pe_new
            new_eta = eta + alpha * delta

            # If only a nonlinear Hessian approximation is available, this is
            # only approximately correct, but saves an additional Hessian call.
            new_Heta = Heta + alpha * Hdelta

            # Verify that the model cost decreased in going from eta to
            # new_eta. If it did not (which can only occur if the Hessian
            # approximation is nonlinear or because of numerical errors), then
            # we return the previous eta (which necessarily is the best reached
            # so far, according to the model cost). Otherwise, we accept the
            # new eta and go on.
            new_model_value = model_fun(new_eta, new_Heta)
            if new_model_value >= model_value:
                stop_tCG = self.MODEL_INCREASED
                break

            eta = new_eta
            Heta = new_Heta
            model_value = new_model_value

            # Update the residual.
            r = r + alpha * Hdelta

            # Compute new norm of r.
            r_r = inner(x, r, r)
            norm_r = np.sqrt(r_r)

            # Check kappa/theta stopping criterion.
            # Note that it is somewhat arbitrary whether to check this stopping
            # criterion on the r's (the gradients) or on the z's (the
            # preconditioned gradients). [CGT2000], page 206, mentions both as
            # acceptable criteria.
            if (j >= mininner and
               norm_r <= norm_r0 * min(norm_r0**theta, kappa)):
                # Residual is small enough to quit
                if kappa < norm_r0 ** theta:
                    stop_tCG = self.REACHED_TARGET_LINEAR
                else:
                    stop_tCG = self.REACHED_TARGET_SUPERLINEAR
                break

            # Precondition the residual.
            if not self.use_rand:
                z = precon(x, r)
            else:
                z = r

            # Save the old z'*r.
            zold_rold = z_r
            # Compute new z'*r.
            z_r = inner(x, z, r)

            # Compute new search direction
            beta = z_r / zold_rold
            delta = -z + beta * delta

            # Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
            e_Pd = beta * (e_Pd + alpha * d_Pd)
            d_Pd = z_r + beta * beta * d_Pd

>       return eta, Heta, j, stop_tCG
E       UnboundLocalError: local variable 'j' referenced before assignment

/Users/louis/anaconda3/envs/timeflux_rasr-env/lib/python3.7/site-packages/pymanopt/solvers/trust_regions.py:552: UnboundLocalError
----------------------------- Captured stdout call -----------------------------
Compiling cost function...
Optimizing...
                                            f: +3.750000e+00   |grad|: 1.033499e-15
=============================== warnings summary ===============================
/Volumes/Ext/git/timeflux_rasr/test/../utils/pymanopt.py:253
  /Volumes/Ext/git/timeflux_rasr/test/../utils/pymanopt.py:253: DeprecationWarning: invalid escape sequence \(
    """

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======================== 1 failed, 1 warnings in 2.78s =========================
lkorczowski commented 4 years ago

@sylvchev: could you help me with this?

Please review the code on the pymanopt branch (https://github.com/bertrandlalo/timeflux_rasr/tree/pymanopt), specifically:

Thanks

lkorczowski commented 4 years ago

@sylvchev seems to have solved the problem: it was about the iterations.
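For the record, the UnboundLocalError in the traceback follows directly from the inner iteration limit: with maxinner = 0 the tCG for-loop body never executes, so j is never bound when it is returned. A minimal standalone illustration of that pattern (not repo or pymanopt code):

```python
# Standalone illustration of the failure mode seen in the traceback:
# when maxinner == 0, the loop body never runs, so `j` is never assigned
# and returning it raises UnboundLocalError.
def tcg_like(maxinner):
    for j in range(int(maxinner)):
        pass
    return j  # unbound if the loop never ran

try:
    tcg_like(0)
except UnboundLocalError as exc:
    print(exc)  # local variable 'j' referenced before assignment
```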

NOTE: We need to pin a specific pymanopt version in the requirements.
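Until the requirement is pinned, a runtime guard can at least surface version mismatches. A minimal sketch, assuming pkg_resources is available; the version string is a placeholder, not a verified compatible release:

```python
# Sketch of a runtime guard: warn when the installed pymanopt version differs
# from the one the rASR code was developed against. "0.2.4" is a placeholder,
# not a verified compatible release.
import warnings
import pkg_resources

EXPECTED_PYMANOPT = "0.2.4"  # placeholder; replace with the version to be pinned

installed = pkg_resources.get_distribution("pymanopt").version
if installed != EXPECTED_PYMANOPT:
    warnings.warn(
        "pymanopt %s is installed, but %s is expected; solver behaviour may differ."
        % (installed, EXPECTED_PYMANOPT)
    )
```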