hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.54k stars 313 forks source link

get_view_by_idx(slicing) with last element fails due to an assertion error in Debug mode. #165

Closed rhanai closed 2 years ago

rhanai commented 2 years ago

Environment

MSVC 2019, Debug build

Phenomenon

When the user specifies the last element of index in the second elemnt of Index2D<IndexType>, get_view_by_idx(slicing) causes an assertion error can't dereference out of range vector iterator. Release build, however, causes no error in real run. (But I felt some unsafety here.)

Reproduce

Here is an arranged version of dataframe_tester.cc/test_get_view_by_idx_slicing. Only three differences from the original is the second elements of Index2D in get_data_by_idx and get_view_by_idx, and the last assertion.

static void test_get_view_by_idx_slicing()  {

    std::cout << "\nTesting get_view_by_idx()/slicing ..." << std::endl;

    std::vector<unsigned long>  idx =
        { 123450, 123451, 123452, 123453, 123454, 123455, 123456,
          123457, 123458, 123459, 123460, 123461, 123462, 123466 };
    std::vector<double> d1 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 };
    std::vector<double> d2 = { 8, 9, 10, 11, 12, 13, 14, 20, 22, 23,
                               30, 31, 32, 1.89 };
    std::vector<double> d3 = { 15, 16, 17, 18, 19, 20, 21,
                               0.34, 1.56, 0.34, 2.3, 0.1, 0.89, 0.45 };
    std::vector<int>    i1 = { 22, 23, 24, 25, 99, 100, 101, 3, 2 };
    MyDataFrame         df;

    df.load_data(std::move(idx),
                 std::make_pair("col_1", d1),
                 std::make_pair("col_2", d2),
                 std::make_pair("col_3", d3),
                 std::make_pair("col_4", i1));

    typedef DataFrameView<unsigned long> MyDataFrameView;

    MyDataFrame     df2 =
        df.get_data_by_idx<double, int>(
            Index2D<MyDataFrame::IndexType> { 123452, 123466 });  // original: {123452, 123460}, succeeds
    MyDataFrameView dfv =
        df.get_view_by_idx<double, int>(
            Index2D<MyDataFrame::IndexType> { 123452, 123466 });  // original: {123452, 123460}, assertion error in debug mode

    df.write<std::ostream, double, int>(std::cout);
    df2.write<std::ostream, double, int>(std::cout);
    dfv.write<std::ostream, double, int>(std::cout);

    dfv.get_column<double>("col_3")[0] = 88.0;
    assert(dfv.get_column<double>("col_3")[0] ==
           df.get_column<double>("col_3")[2]);
    assert(dfv.get_column<double>("col_3")[0] == 88.0);
    assert(dfv.shape().first == 12);  // added
}

Process to reproduce

Run code above with assertion enabled. (In my case, Debug build in MSVC 2019)

Expected behavior

dfv returns a view with 12 rows without any assertion errors.

Current behavior:

Cause

The assertion occurs in the following code in DataFrame_get.tcc's line 366:

    if (lower != indices_.end())  {
        dfv.indices_ =
            typename DataFrameView<IndexType>::IndexVecType(&*lower, &*upper);

When upper==indices_.end(), derefereincing it by *upper is not allowed, I think. this link says

This element acts as a placeholder; attempting to access it results in undefined behavior.

Thanks in advance.

hosseinmoein commented 2 years ago

Thanks, I will take a look

hosseinmoein commented 2 years ago

@rhanai , I think MS compilers follow the rules more closely. I use gcc in MacOS and Ubuntu. Neither of them complained about this. But I fixed it in master, based on the specs. Please check it out

rhanai commented 2 years ago

@hosseinmoein I pulled commit f7b397f2 and checked the modified test and my real case, neither of them caused assertion error in MSVC 2019. Thank you for your quick response!