vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] Using vaex without restarting the kernel breaks progress bar #2282

Open Ben-Epstein opened 1 year ago

Ben-Epstein commented 1 year ago

Description

I looked through the progress bar code and the registry code, but I don't quite understand how progress bars actually get registered into the registry.

Note: This only happens the first time you install vaex in a Jupyter notebook. After restarting the kernel, it works as expected, which confuses me further because I don't fully understand how this flows.

In the cell of a notebook without vaex installed, run:

!pip install vaex-core

import vaex

df = vaex.from_arrays(id=list(range(100)))
with vaex.progress.tree("vaex", title="test"):
    df.export("file.arrow")

You get

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[85], line 3
      1 # _progressbar_registry["vaex"] = partial(simple,title="logging data")
----> 3 vaex.progress.bar("vaex")
      4 # del _progressbar_registry.registry["vaex"]

File .venv/lib/python3.8/site-packages/vaex/progress.py:181, in bar(type_name, title, max_value)
    179     if type_name is None:
    180         type_name = vaex.settings.main.progress.type
--> 181 return _progressbar_registry[type_name](title=title)

File .venv/lib/python3.8/site-packages/vaex/utils.py:75, in RegistryCallable.__getitem__(self, name)
     72         self.registry[entry.name] = entry.load()
     74 if name not in self.registry:
---> 75     raise NameError(f'No {self.typename} registered with name {name!r} under entry_point {self.entry_points!r}')
     76 return self.registry[name]

NameError: No progressbar registered with name 'vaex' under entry_point 'vaex.progressbar'

If you look at the registry, you see nothing in there:

from vaex.progress import _progressbar_registry

_progressbar_registry.registry  # {}
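
One way to check whether the running kernel can see any entry points for this group is to query the installed package metadata directly. This is only a diagnostic sketch: the group name `vaex.progressbar` comes from the error message above, `list_entry_points` is a hypothetical helper, and whether vaex discovers its progress bars through `importlib.metadata` or through another mechanism is an assumption here.

```python
from importlib import metadata


def list_entry_points(group):
    """Return {name: target} for every entry point visible in `group`.

    Hypothetical helper for debugging; not part of vaex.
    """
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns an object with a select() method.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


# Group name taken from the NameError message.
print(list_entry_points("vaex.progressbar"))
```

If this prints an empty dict in the kernel where the error occurs but a populated one after a restart, that would suggest the discovery step is not seeing the freshly installed package's metadata.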

If you do a full kernel restart, then you see something different

import vaex
from vaex.progress import _progressbar_registry

print("before", _progressbar_registry.registry)  # {}

df = vaex.from_arrays(id=list(range(100)))
with vaex.progress.tree("vaex", title="test"):
    df.export("file.arrow")

print("after", _progressbar_registry.registry)  # {'rich': <function rich at 0x126014310>, 'simple': <function simple at 0x1260141f0>, 'vaex': <function simple at 0x1260141f0>, 'widget': <function widget at 0x126014280>}

I don't see where or how these entries are being registered in the code. If I add this to my original code (before the kernel restart):

from vaex.progress import _progressbar_registry, simple, rich, widget

_progressbar_registry["vaex"] = simple
_progressbar_registry["simple"] = simple
_progressbar_registry["rich"] = rich
_progressbar_registry["widget"] = widget

then run the progress bar code, everything works as expected. I don't see where that registration happens in the code, or why it works after the restart but not before, but I thought I'd document it here.
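
The traceback frames above (`self.registry[entry.name] = entry.load()` inside `RegistryCallable.__getitem__`) suggest the registry is filled lazily on the first lookup, not at import time. The toy class below is a sketch of that pattern, not vaex's actual code: `LazyRegistry`, `discover`, and this `simple` function are all hypothetical stand-ins. It shows why the dict can be empty before the first lookup, populated after it, and why manual assignment works even when discovery finds nothing.

```python
class LazyRegistry:
    """Toy registry that fills itself on first lookup (hypothetical, not vaex code)."""

    def __init__(self, typename, discover):
        self.typename = typename
        self.discover = discover  # callable returning {name: factory}
        self.registry = {}

    def __getitem__(self, name):
        if name not in self.registry:
            # Discovery runs only on a miss, so the dict stays empty until first use.
            self.registry.update(self.discover())
        if name not in self.registry:
            raise NameError(f"No {self.typename} registered with name {name!r}")
        return self.registry[name]

    def __setitem__(self, name, value):
        # Manual registration bypasses discovery entirely.
        self.registry[name] = value


def simple(title=None):
    # Stand-in for a progress bar factory.
    return f"simple bar: {title}"


# Discovery that finds nothing, as in the freshly installed kernel.
empty = LazyRegistry("progressbar", discover=lambda: {})
# Discovery that finds entries, as after the restart.
working = LazyRegistry("progressbar", discover=lambda: {"vaex": simple, "simple": simple})
```

With `working`, the `registry` attribute is `{}` until `working["vaex"]` is looked up. With `empty`, the lookup raises `NameError` until an entry is assigned by hand, which matches the workaround above.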

Software information