Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
[BUG-REPORT] `.unique` is much slower than `.to_numpy()` and taking a set #2371
Open
Ben-Epstein opened 1 year ago
Description
It's much faster to bring data into numpy for a `unique` check than in vaex.

Software information
Vaex version (`import vaex; vaex.__version__`): {'vaex-core': '4.16.0', 'vaex-hdf5': '0.12.3'}
Vaex was installed via (pip / conda-forge / from source): pip
OS: Mac
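
The comparison described above can be sketched roughly as follows. This is a minimal, hypothetical reproduction, not the reporter's original script: the synthetic data, the column name `x`, and the `time_call` helper are all assumptions made for illustration. The vaex part only runs if vaex is installed, and the timings will vary with data size, dtype, and cardinality.

```python
import time

import numpy as np


def time_call(fn, repeats=3):
    """Run fn() `repeats` times; return (best wall-clock seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


# Synthetic data (assumption): 1M integers drawn from 1,000 distinct values.
rng = np.random.default_rng(0)
values = rng.integers(0, 1_000, size=1_000_000)

try:
    import vaex
except ImportError:  # vaex is optional for this sketch
    vaex = None

if vaex is not None:
    df = vaex.from_arrays(x=values)

    # Path 1: let vaex compute the unique values.
    t_vaex, u_vaex = time_call(lambda: df.x.unique())

    # Path 2: materialize the column as a NumPy array and take a Python set.
    t_set, u_set = time_call(lambda: set(df.x.to_numpy()))

    print(f"vaex .unique():          {t_vaex:.4f}s, {len(u_vaex)} unique values")
    print(f"set(.to_numpy()):        {t_set:.4f}s, {len(u_set)} unique values")
```

Taking the best of a few runs is meant to reduce noise from one-off costs such as lazy initialization, so the comparison reflects steady-state behaviour rather than a cold first call.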