vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

Statistics on N-d grids--Calculation speed weird #2168

Closed lkcao closed 2 years ago

lkcao commented 2 years ago

Hi there, I have a large dataset of about 20 million rows. The task is simple: I want to get the sum of a column.

The weird thing is this: when I run the code below, the estimated time is 19 s.

[screenshot: 1660456076379]

But when I run the sum command on its own, it takes about 1 s.

[screenshot: 1660456146677]

Yet the steps before the sum calculation take only 0.3 s, so I don't know what is happening or what accounts for the difference in speed between the two screenshots. Any thoughts or solutions?
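Since the code in the screenshots is not recoverable, here is a minimal sketch of how the steps could be timed one at a time. It uses only the Python standard library, not vaex, and the `timed` helper and the `values` data are hypothetical stand-ins. It also illustrates one possible reason, not confirmed in this thread, for cheap-looking preparation steps: when a step is only defined lazily, its cost is charged to whichever later step consumes it, such as the sum.

```python
import time

def timed(label, fn):
    """Run fn(), print the elapsed wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.4f} s")
    return result, elapsed

# Hypothetical stand-in data; this is not the poster's dataset.
values = range(2_000_000)

# Step 1: define a derived "column" lazily. Creating a generator does no work yet.
derived, t_define = timed("define", lambda: (v * 2 + 1 for v in values))

# Step 2: the aggregation consumes the generator, so it also pays for step 1.
total, t_sum = timed("sum", lambda: sum(derived))
```

Timing each step in isolation like this makes it easier to see which call actually triggers the computation, rather than relying on a single combined estimate.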