Ben-Epstein opened this issue 1 year ago (status: open).
Interesting. It seems that this is due to Arrow's .to_pylist(). Can you check whether you can reproduce this using Arrow alone? If so, this is an Arrow performance issue.
@maartenbreddels Yes, it happens in Arrow as well.
When the column is a NumPy array within vaex, it is fast.
Maybe vaex could detect when a column can be represented as a NumPy array and convert it automatically? I will also open an issue in pyarrow.
Description
Please provide a clear and concise description of the problem. This should contain all the steps needed to reproduce the problem. A minimal code example that exposes the problem is very appreciated.
Software information
Vaex version (import vaex; vaex.__version__): {'vaex-core': '4.16.0', 'vaex-hdf5': '0.12.2'}
Additional information