hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License

Why call std::vector<T>::reserve() every time in append() #178

Closed (wujinghe closed 2 years ago)

wujinghe commented 2 years ago

If I append more than 100,000 rows, the program runs slowly. After profiling, I found that the problem is in the append() method, which calls reserve() every time. When I comment out reserve(), performance looks fine.

So I want to check whether there are other considerations. Was reserve() perhaps called to limit how much extra memory gets allocated?

hosseinmoein commented 2 years ago

I personally don't use append that often, so there might be something I overlooked. I will take a look when I get a chance.

hosseinmoein commented 2 years ago

@wujinghe,

The reserve() call was indeed unnecessary in append_column, and I have removed it in master. FYI, the append operation is inefficient by its nature, especially if done very frequently. But I also understand that sometimes your data pattern leaves you no other choice.

wujinghe commented 2 years ago

In my scenario, my program receives messages in real time and inserts them into the DataFrame, so I need to call append() for each message. Or do you think DataFrame is not suitable for this scenario?

I guess you usually call load() in your program.

hosseinmoein commented 2 years ago

It is definitely suitable. But in any system, some operations are less efficient than others. With reserve() removed, append should become a lot more efficient. Before, it was moving the data on each call.
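
To illustrate why reserving before every append can move the data on each call, here is a minimal sketch using a plain std::vector. It is not DataFrame's code: count_reallocations is a hypothetical helper, and it assumes the removed call asked for just enough capacity for the new element (reserve(size() + 1)), which the thread does not confirm. On common standard library implementations, reserve() allocates exactly the requested capacity, so this pattern bypasses the geometric growth that push_back() would otherwise use.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of DataFrame: appends n values one at a time
// and counts how often the vector's capacity changes, which means the
// elements were moved to a new buffer.
std::size_t count_reallocations(std::size_t n, bool reserve_each_time) {
    std::vector<double> column;
    std::size_t reallocations = 0;
    std::size_t last_capacity = column.capacity();

    for (std::size_t i = 0; i < n; ++i) {
        if (reserve_each_time) {
            // Assumed pattern: request exactly one more slot before every
            // append. Once the buffer is full, this reallocates each time.
            column.reserve(column.size() + 1);
        }
        column.push_back(static_cast<double>(i));

        if (column.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = column.capacity();
        }
    }
    assert(column.size() == n);
    return reallocations;
}
```

Under these assumptions, count_reallocations(n, true) typically grows linearly with n, so the total copying cost is quadratic, while count_reallocations(n, false) grows only logarithmically because push_back() enlarges the buffer by a constant factor. If the final row count is known ahead of time, a single reserve() before the loop avoids both.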

wujinghe commented 2 years ago

Ok, thanks a lot for your reply.