korken89 / lbfgs-rs

Apache License 2.0
4 stars 2 forks source link

Updating LBFGS buffer #1

Closed alphaville closed 5 years ago

alphaville commented 5 years ago

Regarding function update_hessian, the LBFGS buffer should be updated only if (s,y) satisfies the condition in the paper of Li and Fukushima, otherwise, the pair should be discarded. This means that old_state and old_gradient shouldn't be updated.

That said, I think that the following should be moved below inside the if block

// TODO: Check if these should be inside the if statement
self.old_state.copy_from_slice(state);
self.old_gradient.copy_from_slice(gradient);

Additionally, since LBFGS can be used in applications other than gradient-based optimisation, we could rename gradient to something else. We could adopt the notation:

Often, g is the gradient of a cost function.

alphaville commented 5 years ago

Another related question: I see that in the following function:

fn new_s_and_y_valid(&self, gradient: &[f64]) -> bool {
    // TODO: Check if EPSILON should be changed
    const EPSILON: f64 = 1e-12;
    let sy = vec_ops::inner_product(&self.s.last().unwrap(), &self.y.last().unwrap())
            / vec_ops::inner_product(&self.y.last().unwrap(), &self.y.last().unwrap());

    // TODO: Check the alpha power to be used
    let ep = EPSILON * vec_ops::norm2(gradient);

    // Condition: (y^T * s) / ||s||^2 > epsilon * ||grad(x)||^alpha
    sy > ep && sy.is_finite() && ep.is_finite()
}

I see that the inner products are calculated on self.s.last() and self.y.last(), that is, on s and y variables which are already in the buffer. However, if the condition of Li and Fukushima is not satisfied, then these s and y should not be included in the buffer -- the buffer (that is, self.s and self.y) should, instead, not be updated.

korken89 commented 5 years ago

Thank you, the first one is fixed with 57a0238 !

On the second one, s.last() and y.last() are only temporary storage to calculate s and y from the inputs (this is why there is a +1 in the size of s and y), hence if the update is not valid they are not included and overwritten with the next input. Only when they are accepted, then they are rotated to the start of the s and y vectors to be used for estimating the hessian, see lines 146 - 152 in lib.rs:

        // Check that the s and y are valid to use
        if self.new_s_and_y_valid(g) {
            self.old_state.copy_from_slice(state);
            self.old_g.copy_from_slice(g);

            // Move the new s_0 and y_0 to the front
            self.s.rotate_right(1);
            self.y.rotate_right(1);
alphaville commented 5 years ago

I wrote the following test (I've changed the API a little, but you'll figure out what this does):

#[test]
    fn lbfgs_s_and_y() {
        let n = 2;
        let mem = 5;
        let mut lbfgs = Estimator::new(n, mem);
        lbfgs.with_alpha(0.0).with_epsilon(0.0);

        let mut g0 = [-0.28, 0.60];
        let x0 = [-0.65, 1.18];
        lbfgs.update_hessian(&g0, &x0);
        lbfgs.apply_hessian(&mut g0);

        let mut g1 = [1.7800, 1.7700];
        let x1 = [-0.7600, -1.1100];
        lbfgs.update_hessian(&g1, &x1);
        lbfgs.apply_hessian(&mut g1);

        assert_array_ae(&lbfgs.s[0], &[-0.1100, -2.2900], 1e-10);
        assert_array_ae(&lbfgs.y[0], &[2.0600, 1.1700], 1e-10);

        let mut g2 = [-1.8700, -1.0600];
        let x2 = [-0.8500, -0.5800];
        lbfgs.update_hessian(&g2, &x2);
        lbfgs.apply_hessian(&mut g2);

        println!("lbfgs = {:#?}", lbfgs);
        assert_array_ae(&lbfgs.s[0], &[-0.0900, 0.5300], 1e-10);
        assert_array_ae(&lbfgs.y[0], &[-3.6500, -2.8300], 1e-10);
        assert_array_ae(&lbfgs.s[1], &[-0.1100, -2.2900], 1e-10);
        assert_array_ae(&lbfgs.y[1], &[2.0600, 1.1700], 1e-10);
    }

Note: assert_array_ae is the following function used for testing:

    fn assert_array_ae(x: &[f64], y: &[f64], tol: f64) {
        x.iter()
            .zip(y.iter())
            .for_each(|(&xi, &yi)| assert!((xi - yi).abs() < tol));
    }

This prints the following:

lbfgs = Estimator {
    active_size: 3,
    gamma: -0.05491435161311496,
    s: [
        [
            -0.08999999999999997,
            0.5300000000000001
        ],
        [
            -0.10999999999999999,
            -2.29
        ],
        [
            -0.65,
            1.18
        ],
        [
            1.0,
            1.0
        ],
        [
            1.0,
            1.0
        ],
        [
            1.0,
            1.0
        ]
    ],
    y: [
        [
            -3.6500000000000004,
            -2.83
        ],
        [
            2.06,
            1.17
        ],
        [
            -0.28,
            0.6
        ],
        [
            0.0,
            0.0
        ],
        [
            0.0,
            0.0
        ],
        [
            0.0,
            0.0
        ]
    ],
    alpha: [
        0.3359228273860338,
        -0.11053784279012983,
        0.3304518607411522,
        0.0,
        0.0
    ],
    rho: [
        -0.8536793580331222,
        -0.34412746481296674,
        1.1235955056179776,
        0.0,
        0.0
    ],
    old_state: [
        -0.85,
        -0.58
    ],
    old_g: [
        -1.87,
        -1.06
    ],
    cbfgs_alpha: 0.0,
    cbfgs_epsilon: 0.0
}

Shouldn't active_size be equal to 2? Moreover, s and y seem to store a third value - in fact y[2] seems to be equal to g0 and s[2] is equal to x0. Is that necessary?

korken89 commented 5 years ago

Thanks for the comments, here are the answers:

active_size should be 3, when there are 3 active buffers (which there are), so this is fine. If you look at line 72-73, active_size is used non-inclusive.

s and y are storing the s and y under test by the C-BFGS criteria in the last element, and must be there if it is rejected, else we would overwrite old good gradients with the temporary, sadly it is necessary when having the C-BFGS criteria.

alphaville commented 5 years ago

I think we resolved all related issues, so I'll close it.