hosseinmoein / DataFrame

C++ DataFrame for statistical, financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License
2.44k stars, 310 forks

do get_data_by_isel on the view #294

Status: closed. YingHREN closed this issue 5 months ago.

YingHREN commented 5 months ago

I wrote the following code:

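// View over the rows in positional range {start, end}, with double and int columns
auto res1_part = res1.get_view_by_loc<double, int>(hmdf::Index2D<long> {start, end});
// Keep rows whose "a" value is in filter_cs_orders_set (the index value is ignored)
auto functor =
    [&filter_cs_orders_set](const unsigned long &, const int &val) -> bool {
        return filter_cs_orders_set.count(val) > 0;
    };
// Select on the view: this is the call that fails to compile
auto df_res =
    res1_part.get_data_by_sel<int, decltype(functor), double, int>("a", functor);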

It fails to compile with this error:

/usr/local/include/DataFrame/Internals/DataFrame_get.tcc: In instantiation of ‘hmdf::DataFrame<I, H> hmdf::DataFrame<I, H>::get_data_by_sel(const char*, F&) const [with T = int; F = main()::<lambda()>::<lambda(const long unsigned int&, const int&)>; Ts = {double, int}; I = long unsigned int; H = hmdf::HeteroView<0>]’:
/home/ubuntu/Project/*.cpp :226:79:   required from here
/usr/local/include/DataFrame/Internals/DataFrame_get.tcc:876:19: error: ‘using IndexVecType = using type = class hmdf::VectorView<long unsigned int, 0>’ {aka ‘class hmdf::VectorView<long unsigned int, 0>’} has no member named ‘push_back’
  876 |         new_index.push_back(indices_[citer]);
      |         ~~~~~~~~~~^~~~~~~~~

First, can the select operation work on a view?

hosseinmoein commented 5 months ago

Right, currently you cannot do a select on a View (you can do it on a PtrView). It is on my to-do list to fix.

hosseinmoein commented 5 months ago

@YingHREN, this was easier to fix than I thought. It is now fixed in master.

YingHREN commented 5 months ago

Thanks, I will try it.

YingHREN commented 5 months ago

Also, is get_data_by_sel safe to call from multiple threads if I do something like this:

threads2.emplace_back([log, i, &res1, start, end]() {
    // Every thread reads the same res1 concurrently
    auto res1_part = res1.get_data_by_loc<double, int>(hmdf::Index2D<long> {start, end});
});

It sometimes causes errors.

hosseinmoein commented 5 months ago

You should definitely read the multithreading section in the documentation. DataFrame, in general, is not thread-safe.

YingHREN commented 5 months ago

OK, thanks.