vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.25k stars, 590 forks

Fix: histogram sometimes does not obey selections #2196

Closed: JovanVeljanoski closed this 1 year ago

JovanVeljanoski commented 2 years ago

This issue could be partly due to count being buggy when handling Arrow data that has missing values. I'm not sure, though, so for safety there are now tests that cover both sides (asserting the histogram output and the count output).
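To make the "both sides" check concrete, here is a minimal sketch of the invariant in plain NumPy; it is not the test code from this PR. It assumes a masked array as a stand-in for an Arrow column with missing values, and the data, the `y == 1` selection, and the bin settings are all made up for illustration. The idea is that a histogram computed under a selection should add up to the count computed under the same selection.

```python
import numpy as np

# Masked array standing in for an Arrow column with missing values
# (illustrative data, not taken from the pull request).
x = np.ma.masked_array(
    [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
    mask=[False, True, False, False, True, False],
)
y = np.array([1, 1, 0, 1, 1, 1])

# Hypothetical selection: keep rows where y == 1.
selection = y == 1

# Rows that are both selected and not missing in x.
valid = selection & ~np.ma.getmaskarray(x)

# Histogram side: bin only the selected, non-missing values.
hist, edges = np.histogram(x.data[valid], bins=3, range=(0.0, 6.0))

# Count side: number of selected, non-missing values.
count = int(valid.sum())

# If the histogram obeys the selection, the bin totals match the count.
assert hist.sum() == count
```

If the histogram ignored the selection or the missing-value mask, the bin totals would exceed the count and the final assertion would fail.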

Workaround: If anybody is struggling with this issue, there is a simple workaround. Say column x is an Arrow column with missing values. All you need to do is:

df['x'] = df.x.as_numpy()

and everything should work as expected.

maartenbreddels commented 1 year ago

Party 🎉

JovanVeljanoski commented 1 year ago

Party!! 🎉

Awesome, thank you!