Closed wujinghe closed 2 years ago
I can add get_data_by_sel() and get_view_by_sel() for more columns in the next few days.
Assigning a view to a DataFrame is not simple; it is a bit tricky. But there are ways to get around it. For example, if you ultimately need a DataFrame, why do you get a view in the first place?
I use a pipeline to process data. For example, I filter in stage 1, calculate in stage 2, and return the result to the client in stage 3. Every stage runs in a different thread. New data is inserted into the DataFrame at the same time, so the memory of the vector that backs the DataFrame may be reallocated, which means I can't use a view. On the other hand, I don't want to use locks.
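The reallocation concern can be illustrated with a minimal sketch that uses only std::vector, not the DataFrame API. The function names push_back_moves_storage and copy_survives_growth are hypothetical and exist only for this illustration. The first shows that appending past capacity moves the storage, so any pointer or view into the old buffer becomes dangling; the second shows that a copied subset is unaffected by later growth.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration: returns true if appending one element beyond
// the current capacity moved the vector's storage to a new address.
bool push_back_moves_storage()
{
    std::vector<double> column = {0.11, 0.12, 0.21};

    // Fill the vector exactly to its capacity so the next push_back
    // is forced to reallocate.
    while (column.size() < column.capacity())
        column.push_back(0.0);

    // Store the address as an integer so the old pointer is never
    // used after it has been invalidated.
    const auto before = reinterpret_cast<std::uintptr_t>(column.data());

    column.push_back(0.22);  // size == capacity, so this reallocates

    const auto after = reinterpret_cast<std::uintptr_t>(column.data());
    return before != after;
}

// Hypothetical illustration: a copy of selected values owns its own
// storage, so growth of the source vector does not affect it.
bool copy_survives_growth()
{
    std::vector<double> column = {0.11, 0.12, 0.21};
    const std::vector<double> copy(column.begin(), column.begin() + 2);

    for (int i = 0; i < 1000; ++i)
        column.push_back(static_cast<double>(i));

    return copy.size() == 2 && copy[1] == 0.12;
}
```

Both functions return true. This is the reason a copying call such as get_data_by_sel() fits the pipeline described above better than a view does.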
@wujinghe, I just added get_data_by_sel() and get_view_by_sel() for up to 5 columns in master.
Thank you so much.
At the same time, I use std::tuple to support more columns. The following is the test code, so you can check the interfaces.
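#include <cstdint>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>
#include <DataFrame/DataFrame.h>

// Declares a column tag type carrying the column's name and value type.
#define DECL_COL(COLUMN_NAME, C_TYPE) \
    struct COLUMN_NAME \
    { \
        constexpr static const char* name = {#COLUMN_NAME}; \
        using type = C_TYPE; \
    };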
struct MyDf
{
    DECL_COL(Col0, std::string);
    DECL_COL(Col1, int32_t);
    DECL_COL(Col2, int64_t);
    DECL_COL(Col3, double);
    DECL_COL(Col4, std::string);
};
void TestSchema()
{
    // Prepare initial data
    std::vector<unsigned long> col_idx = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    std::vector<MyDf::Col0::type> col0 =
        {"01", "02", "03", "04", "05", "06", "07", "08", "09", "10"};
    std::vector<MyDf::Col1::type> col1 = {1, 1, 2, 2, 3, 4, 5, 5, 5, 6};
    std::vector<MyDf::Col2::type> col2 =
        {11, 11, 21, 21, 31, 41, 51, 51, 51, 61};
    std::vector<MyDf::Col3::type> col3 =
        {0.11, 0.12, 0.21, 0.22, 0.30, 0.40, 0.51, 0.52, 0.53, 0.60};
    std::vector<MyDf::Col4::type> col4 =
        {"11", "11", "21", "21", "31", "41", "51", "51", "511", "61"};

    // Construct df
    hmdf::StdDataFrame<unsigned long> df;
    df.load_index(std::move(col_idx));
    df.load_column<MyDf::Col0::type>(MyDf::Col0::name, std::move(col0));
    df.load_column<MyDf::Col1::type>(MyDf::Col1::name, std::move(col1));
    df.load_column<MyDf::Col2::type>(MyDf::Col2::name, std::move(col2));
    df.load_column<MyDf::Col3::type>(MyDf::Col3::name, std::move(col3));
    df.load_column<MyDf::Col4::type>(MyDf::Col4::name, std::move(col4));

    // Output the result
    df.write<std::ostream,
             MyDf::Col0::type,
             MyDf::Col1::type,
             MyDf::Col2::type,
             MyDf::Col3::type,
             MyDf::Col4::type>(std::cout, hmdf::io_format::csv2);
    std::cout << "=============================" << std::endl;

    // Test to get data by filter
    auto filter_functor = [](const unsigned long& index,
                             const std::tuple<
                                 MyDf::Col1::type,
                                 MyDf::Col2::type,
                                 MyDf::Col3::type,
                                 MyDf::Col4::type>& t)-> bool {
        return std::get<0>(t) > 2 and
               std::get<1>(t) > 31 and
               std::get<2>(t) > 0.5 and
               std::get<3>(t) != std::string("511");
    };
    std::tuple cols_for_filter {
        MyDf::Col1(), MyDf::Col2(), MyDf::Col3(), MyDf::Col4()};
    auto df_result = df.get_data_by_sel<
        decltype(cols_for_filter), decltype(filter_functor),
        MyDf::Col0::type,
        MyDf::Col1::type,
        MyDf::Col2::type,
        MyDf::Col3::type,
        MyDf::Col4::type>(cols_for_filter, filter_functor);

    // Output the result
    df_result.write<std::ostream,
                    MyDf::Col0::type,
                    MyDf::Col1::type,
                    MyDf::Col2::type,
                    MyDf::Col3::type,
                    MyDf::Col4::type>(std::cout, hmdf::io_format::csv2);
}
Do you think it is OK, or do you have any suggestions?
Looks good to me
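For readers who want to see the tuple-based idea in isolation, here is a minimal sketch that applies one predicate to any number of parallel columns using only the standard library. It is not the DataFrame implementation: select_rows and demo_select are hypothetical names, the columns are plain std::vector objects assumed to have equal length, and the result is a list of row positions rather than a new DataFrame. The data and the filter condition are the ones from the test code above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: calls pred with a tuple of references to the i-th
// element of every column and collects the positions where it returns true.
// Assumes all columns have the same length as the first one.
template <typename Pred, typename T0, typename... Ts>
std::vector<std::size_t>
select_rows(Pred pred,
            const std::vector<T0>& first,
            const std::vector<Ts>&... rest)
{
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < first.size(); ++i)
    {
        // std::tie builds a tuple of const references, so no values are copied.
        if (pred(std::tie(first[i], rest[i]...)))
            rows.push_back(i);
    }
    return rows;
}

// Hypothetical demo using the four filter columns from the test code.
std::vector<std::size_t> demo_select()
{
    const std::vector<std::int32_t> col1 = {1, 1, 2, 2, 3, 4, 5, 5, 5, 6};
    const std::vector<std::int64_t> col2 =
        {11, 11, 21, 21, 31, 41, 51, 51, 51, 61};
    const std::vector<double> col3 =
        {0.11, 0.12, 0.21, 0.22, 0.30, 0.40, 0.51, 0.52, 0.53, 0.60};
    const std::vector<std::string> col4 =
        {"11", "11", "21", "21", "31", "41", "51", "51", "511", "61"};

    auto pred = [](const auto& row) -> bool {
        return std::get<0>(row) > 2 &&
               std::get<1>(row) > 31 &&
               std::get<2>(row) > 0.5 &&
               std::get<3>(row) != "511";
    };
    return select_rows(pred, col1, col2, col3, col4);
}
```

With this data, demo_select() returns the zero-based positions 6, 7 and 9, which correspond to index values 7, 8 and 10 in the test above. Because the column list is a parameter pack, the number of filter columns is not fixed at three or five.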
How can I call get_data_by_sel with more than three columns? get_data_by_sel only supports passing three columns as arguments. I want to call get_view_by_sel several times to implement my requirement and then convert the view to a DataFrame. But I found that a view can't be converted to a DataFrame directly.