test cholesky_solve ... FAILED

termoshtt commented 7 years ago

Sometimes this test fails https://app.wercker.com/termoshtt/ndarray-linalg/runs/test-openblas/59f195128e888f0001cf3ef5?step=59f19584791109000146172c

termoshtt commented 6 years ago

Error in #98 https://app.wercker.com/termoshtt/ndarray-linalg/runs/test-openblas/5a576c74f591740001562d94?step=5a576c9dda96a800019a4d42

thread 'rcond' panicked at 'called `Result::unwrap()` on an `Err` value: 0.23086353780922303'

``` ---- rcond stdout ---- thread 'rcond' panicked at 'called `Result::unwrap()` on an `Err` value: 0.23086353780922303', /checkout/src/libcore/result.rs:906:4 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. stack backtrace: 0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49 1: std::sys_common::backtrace::_print at /checkout/src/libstd/sys_common/backtrace.rs:69 2: std::panicking::default_hook::{{closure}} at /checkout/src/libstd/sys_common/backtrace.rs:58 at /checkout/src/libstd/panicking.rs:381 3: std::panicking::default_hook at /checkout/src/libstd/panicking.rs:391 4: std::panicking::rust_panic_with_hook at /checkout/src/libstd/panicking.rs:577 5: std::panicking::begin_panic at /checkout/src/libstd/panicking.rs:538 6: std::panicking::begin_panic_fmt at /checkout/src/libstd/panicking.rs:522 7: rust_begin_unwind at /checkout/src/libstd/panicking.rs:498 8: core::panicking::panic_fmt at /checkout/src/libcore/panicking.rs:71 9: core::result::unwrap_failed at /checkout/src/libcore/macros.rs:23 10: >::unwrap at /checkout/src/libcore/result.rs:772 11: solve::rcond at tests/solve.rs:40 12: >::call_box at /checkout/src/libtest/lib.rs:1480 at /checkout/src/libcore/ops/function.rs:223 at /checkout/src/libtest/lib.rs:141 13: __rust_maybe_catch_panic at /checkout/src/libpanic_unwind/lib.rs:99 ```

davenza commented 6 years ago

If it helps, this is an instance problem that always fails. Copied from the cholesky test, but without randomness:


use ndarray::{arr1,arr2};
#[test]
fn cholesky_solve_fail() {

    let a: Array2<f64> = arr2(&[[2.7518138512052546, 0.0, 0.0],
                                    [-3.0030534432813334, 0.5503895871967794, 0.0],
                                    [0.0, 0.0, 0.4472135954999579]]);
    let x: Array1<f64> = arr1(&[-0.3633966736383765772089, -9.2503560214093027980198, 0.0]);
    let b = a.dot(&x);
    println!("a = \n{:#?}", a);
    println!("x = \n{:#?}", x);
    println!("b = \n{:#?}", b);
    println!("Solution = \n{:#?}", &a.solvec(&b).unwrap());
    assert_close_l2!(&a.solvec(&b).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.solvec_into(b.clone()).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.solvec_inplace(&mut b.clone()).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Upper).unwrap().solvec(&b).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Lower).unwrap().solvec(&b).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Upper).unwrap().solvec_into(b.clone()).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Lower).unwrap().solvec_into(b.clone()).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Upper).unwrap().solvec_inplace(&mut b.clone()).unwrap(), &x, 1e-9);
    assert_close_l2!(&a.factorizec(UPLO::Lower).unwrap().solvec_inplace(&mut b.clone()).unwrap(), &x, 1e-9);
}

Output:

---- cholesky_solve_fail stdout ----
    a = 
[[2.7518138512052546, 0.0, 0.0],
 [-3.0030534432813334, 0.5503895871967794, 0.0],
 [0.0, 0.0, 0.4472135954999579]] shape=[3, 3], strides=[3, 1], layout=C (0x1)
x = 
[-0.3633966736383766, -9.250356021409303, 0.0] shape=[3], strides=[1], layout=C | F (0x3)
b = 
[-1.0, -4.0, 0.0] shape=[3], strides=[1], layout=C | F (0x3)
Solution = 
[-0.36339667363837663, -7.26757935296819, 0.0] shape=[3], strides=[1], layout=C | F (0x3)
thread 'cholesky_solve_fail' panicked at 'called `Result::unwrap()` on an `Err` value: 0.21418077766308938', libcore/result.rs:945:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

First (and third) component are OK though.

jturner314 commented 6 years ago

@davenza Thanks for taking a look at this. This particular example fails because a is asymmetric. (Solving with Cholesky decomposition requires a Hermitian (or real symmetric) positive definite coefficient matrix.) Did you get that a matrix from running the cholesky_solve test (which calls random_hpd to generate a random Hermitian positive-definite matrix), or did you create it yourself? In other words, is random_hpd the real problem here?

This does bring up another question: In this particular example, a bad value is returned since a is asymmetric. Should we explicitly check if a is Hermitian and positive definite, and if not return an Err?

davenza commented 6 years ago

@jturner314 Thanks, I didn't notice the symmetric requirement. I was trying to do a different thing, and I think I misunderstood the documentation.

I had a covariance matrix (not created with random_hpd), let's call U:

[7.572479471685095, -8.263844061131207, 0.0] [-8.263844061131206, 9.321258680898515, 0.0] [0.0, 0.0, 0.2]

I found the lower cholesky decomposition U = LL^T. L is the matrix that I used in the code above:

[2.7518138512052546, 0.0, 0.0] [-3.0030534432813334, 0.5503895871967794, 0.0] [0.0, 0.0, 0.4472135954999579]

Then, I tried to solve Lx = [1 4 0] using solvec.

However, related with the problem: while I was doing some cargo test runs, I sometimes had errors. I could try to reproduce the errors again, printing a failing matrix if it is interesting to you.

jturner314 commented 6 years ago

Fwiw, if you need both the triangular factor (L matrix) and a solution (with .solvec()), you can factor once with .factorizec() or .factorizec_into() to get a CholeskyFactorized instance. You can then solve, take the inverse, take the determinant, get the lower/upper triangular factor, etc., using the CholeskyFactorized instance without having to factor the matrix multiple times.

Regarding the test failures: I think they're due to floating-point errors. (In the cases I've observed the tests failing, the errors have always been pretty small. If you see a case where the error is large, please post it here.) One idea to minimize issues associated with floating-point errors in the tests is to throw out randomly-generated matrices that are ill-conditioned (and generate new ones that are well-conditioned), but I haven't had a chance to try this.

davenza commented 6 years ago

In two of my executions of the cholesky_solve() test, the test failed with these matrices:

Fail 1

---- cholesky_solve stdout ---- a = [[0.33597532169190203, -0.8317558045767911, -0.33804679493344636], [-0.8317558045767911, 7.026626194373293, 5.091368342750945], [-0.33804679493344636, 5.091368342750945, 6.387675256291089]] shape=[3, 3], strides=[3, 1], layout=C (0x1) x = [-1.3640677969149184, 1.4365643322080148, -0.2864128482272794] shape=[3], strides=[1], layout=C | F (0x3) a = [[3.147686, -0.6140107, -1.1712251], [-0.6140107, 3.554442, -0.27945665], [-1.1712251, -0.27945665, 0.51092684]] shape=[3, 3], strides=[3, 1], layout=C (0x1) x = [0.2852047, -0.2701597, -0.70091474] shape=[3], strides=[1], layout=C | F (0x3) thread 'cholesky_solve' panicked at 'called `Result::unwrap()` on an `Err` value: 0.0027820447', libcore/result.rs:945:5``

Fail 2 (with backtrace)

---- cholesky_solve stdout ---- a = [[1.2219888536170944, -1.3044421701103923, 1.0844884489891133], [-1.3044421701103923, 1.622040998971165, -0.25782460724863565], [1.0844884489891133, -0.25782460724863565, 4.503214907761472]] shape=[3, 3], strides=[3, 1], layout=C (0x1) x = [2.3896768802521486, -0.5822467373409934, 1.1002035853153536] shape=[3], strides=[1], layout=C | F (0x3) a = [[5.918905, 4.401211, 1.5132039], [4.401211, 4.052539, -0.07915869], [1.5132039, -0.07915869, 2.2470446]] shape=[3, 3], strides=[3, 1], layout=C (0x1) x = [-0.01866009, -0.6332641, -0.04560638] shape=[3], strides=[1], layout=C | F (0x3) thread 'cholesky_solve' panicked at 'called `Result::unwrap()` on an `Err` value: 0.00379905', libcore/result.rs:945:5 note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace. stack backtrace: 0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace at libstd/sys/unix/backtrace/tracing/gcc_s.rs:49 1: std::sys_common::backtrace::_print at libstd/sys_common/backtrace.rs:71 2: std::panicking::default_hook::{{closure}} at libstd/sys_common/backtrace.rs:59 at libstd/panicking.rs:380 3: std::panicking::default_hook at libstd/panicking.rs:390 4: std::panicking::rust_panic_with_hook at libstd/panicking.rs:576 5: std::panicking::begin_panic at libstd/panicking.rs:537 6: std::panicking::begin_panic_fmt at libstd/panicking.rs:521 7: rust_begin_unwind at libstd/panicking.rs:497 8: core::panicking::panic_fmt at libcore/panicking.rs:71 9: core::result::unwrap_failed at /checkout/src/libcore/macros.rs:23 10: >::unwrap at /checkout/src/libcore/result.rs:782 11: cholesky::cholesky_solve at tests/cholesky.rs:146 12: >::call_box at libtest/lib.rs:1462 at /checkout/src/libcore/ops/function.rs:223 at /checkout/src/liballoc/boxed.rs:788 13: __rust_maybe_catch_panic at libpanic_unwind/lib.rs:102

It seems that both failures happen with the f32 type and due to floating-point errors, as you said.

termoshtt commented 6 years ago

One idea to minimize issues associated with floating-point errors in the tests is to throw out randomly-generated matrices that are ill-conditioned

It will be a right way. Since the implementations of LAPACK are trusted, we do not have to check the singular cases.

rust-ndarray / ndarray-linalg

test cholesky_solve ... FAILED #91