hosseinmoein / DataFrame

C++ DataFrame for statistical, financial, and ML analysis in modern C++, using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.53k stars, 313 forks

VarVisitor isn't numerically stable #334

Closed: adrian17 closed this issue 3 weeks ago

adrian17 commented 1 month ago

Given the following example code:

    std::size_t   SIZE = 1000;
    auto data = gen_normal_dist<double, ALIGNMENT>(SIZE);

    // Uncomment to shift every sample by 1e8 (the variance should not change):
    //for (auto &value : data) {
    //    value += 100000000.0;
    //}

    MyDataFrame df;
    df.load_data(
        MyDataFrame::gen_sequence_index(0, SIZE, 1),
        std::make_pair("value1", data));

    VarVisitor<double, time_t>  ln_vv;
    df.visit<double>("value1", ln_vv);
    std::cout << ln_vv.get_result() << std::endl;

The result is correct: close to 1.0. When I uncomment the commented lines, I expect the result to stay around 1.0, since adding the same constant to every sample does not change the variance. Instead, the program prints results such as:

-8.2002
-41.001
10.2503
53.3013
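For reference, the expectation above follows directly from the definition of the sample variance, with \(c = 10^8\) the shift from the commented-out loop and \(\bar{x}\) the sample mean. The second line is the algebraically equivalent one-pass form; the issue does not show how VarVisitor computed its result, so it is only an assumption that a formula of this shape was involved:

```latex
s^2(x + c) = \frac{1}{n-1}\sum_{i=1}^{n}\bigl((x_i + c) - (\bar{x} + c)\bigr)^2
           = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = s^2(x)

s^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\Bigl(\sum_{i=1}^{n} x_i\Bigr)^2\right)
```

With \(x_i \approx 10^8\) and \(n = 1000\), both terms inside the bracket are about \(10^{19}\), while their difference is only about \(n - 1 \approx 999\). A double carries roughly 16 significant decimal digits, so the rounding error in each term is at least on the order of \(10^3\), comparable to the quantity being computed. That is enough to produce the kind of wildly varying, even negative, values shown above.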

This degree of error isn't present in most other statistical packages I've seen.

(I didn't test other functions, but I'm assuming they could also use an audit for correctness.)

hosseinmoein commented 1 month ago

I will take a look, thanks

hosseinmoein commented 4 weeks ago

This has been fixed in the master branch.