vignesh-bungee opened this issue 1 year ago (status: Open)
@vignesh-bungee I believe the first access is simply applying the sort, and the second one has it cached, but I'm not 100% certain.
Try running the entire thing inside a vaex.cache.off() block to see the results without the cache:

    with vaex.cache.off():
        ...
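To make the "first access applies the sort, later accesses hit the cache" hypothesis concrete, here is a toy model in plain Python. It is an illustration only and is not how vaex implements sort; the class `LazySortedView`, its `cache` flag, and the `sort_runs` counter are hypothetical names. The sort permutation is computed lazily on the first slice, and reused afterwards only when caching is enabled, which loosely mirrors running with and without vaex.cache.off().

```python
from typing import List, Optional, Sequence


class LazySortedView:
    """Hypothetical toy model (not vaex's implementation): the sort order is
    computed on the first slice access and, if caching is on, reused later."""

    def __init__(self, values: Sequence[float], descending: bool = False, cache: bool = True):
        self._values = values
        self._descending = descending
        self._cache = cache
        self._order: Optional[List[int]] = None  # cached permutation, set on first access
        self.sort_runs = 0  # counts how many times the (expensive) sort actually ran

    def _permutation(self) -> List[int]:
        if self._order is not None:
            return self._order  # cache hit: no sorting work
        self.sort_runs += 1
        order = sorted(
            range(len(self._values)),
            key=self._values.__getitem__,
            reverse=self._descending,
        )
        if self._cache:
            self._order = order
        return order

    def __getitem__(self, s: slice) -> List[float]:
        # Whichever slice comes first pays for the sort, regardless of its offset.
        order = self._permutation()
        return [self._values[i] for i in order[s]]


if __name__ == "__main__":
    view = LazySortedView([3.0, 1.0, 2.0, 5.0], descending=True)
    print(view[0:2], view[2:4], "sort runs:", view.sort_runs)
```

In this model, the first slice is slow no matter which end of the frame it reads, and with `cache=False` every slice pays the full sort cost again.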
Oddly enough, I'm seeing something different from you: the sort itself takes some time, but then (because it's cached) both the bottom and top accessors are extremely fast.
I am using a newer version of vaex, but I don't think what I'm seeing is particularly expected.
@Ben-Epstein Our dataset is very large and we perform many operations on it, including slicing and dicing, so turning the cache off with vaex.cache.off() can lead to degraded performance. We tried it in our Jupyter notebook anyway and got similar results with the cache on and off.

Column information
Hi Vaex Team,
We are experiencing an issue with the sort function in Vaex. When we sort our dataset (shape: (13160951, 77)) by the estimaterevenue column (dtype: float64) in descending order, slicing at lower offsets is slow, while slicing at higher offsets is relatively fast. Conversely, when we sort the same column in ascending order, slicing at lower offsets is relatively fast, while slicing at higher offsets is slow. I would appreciate it if you could look into this issue. It seems to be a bug; can you please confirm?
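One way to tell whether the slowdown depends on the offset itself or only on which slice is read first is to time each slice twice: once in the given order and once in reverse. The sketch below is a generic, standard-library timing harness; `time_slices` and its parameters are hypothetical, and it is demonstrated on a plain list. Applying it to vaex would mean passing the sorted frame (for example `df.sort('estimaterevenue', ascending=False)`) together with a `materialize` callable that forces the lazy slice to evaluate, such as `lambda d: d.to_pandas_df()`; that vaex usage is an untested assumption.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple


def time_slices(
    frame: Any,
    offsets: Sequence[int],
    width: int,
    materialize: Callable[[Any], Any] = list,
) -> List[Tuple[str, int, float]]:
    """Time frame[offset:offset + width] for each offset, first in the given
    order and then in reverse, returning (pass_label, offset, seconds)."""
    results: List[Tuple[str, int, float]] = []
    passes = (("forward", list(offsets)), ("reverse", list(reversed(offsets))))
    for label, order in passes:
        for offset in order:
            start = time.perf_counter()
            # Materialize so that lazily evaluated slices actually do their work.
            materialize(frame[offset:offset + width])
            results.append((label, offset, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    data = list(range(1_000_000))
    for label, offset, secs in time_slices(data, [0, 999_000], 1_000):
        print(f"{label:8s} offset={offset:>9,d} {secs * 1e3:.3f} ms")
```

If only the first slice of the forward pass is slow and every slice in the reverse pass is fast, that points to a one-off cost such as the sort being applied on first access. If the same offset is slow in both passes, the cost is tied to the position being read.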
Software information
• Vaex version: 4.14.0
• Vaex was installed via: pip
• OS: Linux
Additional information
Jupyter notebook screenshot image attached