vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.27k stars, 590 forks

[BUG-REPORT] `.unique` is much slower than `.to_numpy()` and taking a set #2371

Status: Open. Ben-Epstein opened this issue 1 year ago.

Ben-Epstein commented 1 year ago

Description: It's much faster to bring the data into NumPy with `.to_numpy()` and take a Python `set` of it than to call vaex's `.unique()` to get the distinct values.

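The reporter's original timing screenshot is not preserved here. The script below is a minimal sketch of how the comparison could be reproduced. It assumes NumPy is installed and treats vaex as optional: without vaex, only the pure-NumPy baselines run. The helper names `best_time` and `compare_unique` are hypothetical, and the script does not reproduce the reporter's actual numbers.

```python
import time

import numpy as np

try:
    import vaex  # optional: needed for the vaex side of the comparison
except ImportError:
    vaex = None


def best_time(fn, repeats=3):
    """Hypothetical helper: best wall-clock seconds over `repeats` calls, plus the last result."""
    best, result = float("inf"), None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


def compare_unique(values, repeats=3):
    """Hypothetical helper: time several ways of collecting distinct values.

    Returns {method name: (best seconds, number of distinct values)}.
    """
    strategies = {
        "set(array)": lambda: set(values),  # Python set over a NumPy array
        "np.unique": lambda: np.unique(values),  # sort-based NumPy unique
    }
    if vaex is not None:
        df = vaex.from_arrays(x=values)
        # The two approaches contrasted in the issue title.
        strategies["vaex .unique()"] = lambda: df.x.unique()
        strategies["set(.to_numpy())"] = lambda: set(df.x.to_numpy())
    results = {}
    for name, fn in strategies.items():
        seconds, uniques = best_time(fn, repeats)
        results[name] = (seconds, len(uniques))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.integers(0, 1_000, size=1_000_000)
    for name, (seconds, n) in compare_unique(data).items():
        print(f"{name:20s} {seconds:.4f}s  {n} distinct values")
```

Taking the best of several runs reduces noise from caching and warm-up. Checking that every method reports the same number of distinct values confirms the timings compare equivalent work.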

Software information