YingHREN closed 3 months ago
There are a few ways of doing this:
If you already have the whole data in memory in one DataFrame, you can use slicing to get another DataFrame, or a view if you don't want a copy. In the documentation, look at get_[data|view]_by_...().
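To illustrate what a view buys you here, below is a minimal sketch in plain standard C++. It is not the DataFrame API: `ColumnView` and `split_into_views` are hypothetical names made up for this example. It only shows the idea that a slice can point into the original storage instead of copying it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view over a contiguous slice of a column.
// It stores a pointer and a length, so no element is copied.
template<typename T>
struct ColumnView {
    T* data;
    std::size_t size;

    T& operator[](std::size_t i) const {
        assert(i < size);
        return data[i];
    }
};

// Hypothetical helper: split one column into consecutive views of at most
// chunk_size elements each. The views stay valid only while the column
// is alive and is not resized.
template<typename T>
std::vector<ColumnView<T>> split_into_views(std::vector<T>& column,
                                            std::size_t chunk_size) {
    assert(chunk_size > 0);
    std::vector<ColumnView<T>> views;
    for (std::size_t start = 0; start < column.size(); start += chunk_size) {
        const std::size_t len = std::min(chunk_size, column.size() - start);
        views.push_back(ColumnView<T>{ column.data() + start, len });
    }
    return views;
}
```

Writing through a view changes the original column, which is the behavior to expect from any non-copying slice.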
If you have the data in a file, you can read the file in chunks into multiple DataFrames. In the documentation, look at read().
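The chunked-read idea can be sketched in plain standard C++ as follows. This does not use the library's read(); `read_chunk` is a hypothetical helper that just pulls a bounded number of lines from a stream, so that each chunk could then be loaded into its own container.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read up to max_rows lines from the stream.
// Returns an empty vector once the input is exhausted.
inline std::vector<std::string> read_chunk(std::istream& in,
                                           std::size_t max_rows) {
    assert(max_rows > 0);
    std::vector<std::string> rows;
    std::string line;
    // Check the row count first so no extra line is consumed and dropped.
    while (rows.size() < max_rows && std::getline(in, line))
        rows.push_back(line);
    return rows;
}
```

Calling it repeatedly on the same stream yields successive chunks until it returns an empty vector.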
Thanks. If I get a view, could I then call something like get_data_by_...() on the view?
Yes
Another question: I want to first do a group-by and then compute unique_value_count on the grouped result. Should I write my own visitor?
template<typename T, typename I = unsigned long>
struct Unique_Value_Visitor {
    using value_type = T;
    using index_type = I;
    using size_type = std::size_t;
    using result_type = std::size_t;

    explicit Unique_Value_Visitor(bool skipnan = true) : skip_nan_(skipnan) {}

    // Called once per row; the set keeps only distinct values.
    inline void operator()(const index_type&, const value_type& val) {
        unique_values_.insert(val);
    }
    PASS_DATA_ONE_BY_ONE

    // Reset state before a new pass over the data.
    inline void pre() {
        result_ = result_type{};
        unique_values_.clear();
    }
    // Publish the number of distinct values seen.
    inline void post() { result_ = unique_values_.size(); }
    inline result_type get_result() const { return result_; }

private:
    result_type result_;
    const bool skip_nan_;  // stored but not yet used to filter NaN values
    std::unordered_set<value_type> unique_values_;
};
I am not sure why you need a visitor. You can call the group-by and then call the unique column value on the result of group-by.
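As a library-independent illustration of "group first, then count unique values per group", here is a minimal sketch in plain standard C++. The function name and the column types (string keys, int values) are assumptions chosen for the example; it does not show the DataFrame group-by call itself.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: for each distinct key, count how many distinct
// values appear in the rows that carry that key.
inline std::map<std::string, std::size_t>
unique_count_per_group(const std::vector<std::string>& keys,
                       const std::vector<int>& values) {
    assert(keys.size() == values.size());

    // Step 1: group. Each key collects the set of values seen with it.
    std::map<std::string, std::unordered_set<int>> seen;
    for (std::size_t i = 0; i < keys.size(); ++i)
        seen[keys[i]].insert(values[i]);

    // Step 2: reduce each group to the size of its set.
    std::map<std::string, std::size_t> counts;
    for (const auto& kv : seen)
        counts[kv.first] = kv.second.size();
    return counts;
}
```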
I do a group-by on column "a", and I would like the group-by to store the column "b" values as a vector for every "a". How should I do that?
Read the group-by documentation, including its code samples.
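For reference, the shape of the result being asked for (one vector of "b" values per distinct "a") can be sketched in plain standard C++ like this. It is only a conceptual sketch with assumed column types (int for "a", double for "b") and a hypothetical function name, not the library's group-by interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: collect the "b" values of all rows sharing the
// same "a" value, preserving the original row order within each group.
inline std::map<int, std::vector<double>>
collect_by_key(const std::vector<int>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::map<int, std::vector<double>> groups;
    for (std::size_t i = 0; i < a.size(); ++i)
        groups[a[i]].push_back(b[i]);
    return groups;
}
```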
Thanks, I will try something new on my side first.
I ran into a problem: I need to split a big DataFrame into several smaller ones without copying memory.