vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] mysterious `NameError: name '__filter__' is not defined` #2115

Closed Ben-Epstein closed 2 years ago

Ben-Epstein commented 2 years ago

Additional information

Unfortunately, I cannot reliably reproduce this error.

It happens in a FastAPI server when the server is being hit with many requests. Here is the stack trace (it comes from a filter on the data followed by a `df.extract()`):

  df_copy = df_copy.extract()
  File "/usr/local/lib/python3.9/site-packages/vaex/dataframe.py", line 4468, in extract
    df._push_down_filter()
  File "/usr/local/lib/python3.9/site-packages/vaex/dataframe.py", line 4474, in _push_down_filter
    self._fill_filter_mask()  # make sure the mask is filled
  File "/usr/local/lib/python3.9/site-packages/vaex/dataframe.py", line 5688, in _fill_filter_mask
    self.execute()
  File "/usr/local/lib/python3.9/site-packages/vaex/dataframe.py", line 417, in execute
    self.executor.execute()
  File "/usr/local/lib/python3.9/site-packages/vaex/execution.py", line 308, in execute
    for _ in self.execute_generator():
NameError: name '__filter__' is not defined
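
A minimal sketch of how one might try to provoke this outside the server, assuming the failing path is "copy a shared DataFrame, filter it, call `extract()`" running in many threads at once. The column name `x`, the helper names `filter_and_extract` and `hammer`, and the thread counts are all hypothetical; nothing here is confirmed to trigger the error.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def filter_and_extract(df, threshold):
    """Mimic the request handler: copy, filter, then extract().

    `df` is expected to be a vaex DataFrame with a numeric column `x`
    (an assumption made for illustration only).
    """
    df_copy = df.copy()
    df_copy = df_copy[df_copy.x > threshold]
    df_copy = df_copy.extract()
    return len(df_copy)


def hammer(work, n_calls=200, n_threads=16):
    """Call work(i) for i in range(n_calls) from a thread pool.

    Returns (results, errors) so that a rare exception such as the
    NameError above is collected instead of aborting the run.
    """
    results, errors = [], []
    lock = threading.Lock()

    def run(i):
        try:
            value = work(i)
        except Exception as exc:  # keep going; we want to count failures
            with lock:
                errors.append(exc)
        else:
            with lock:
                results.append(value)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(run, range(n_calls)))  # wait for every call to finish
    return results, errors
```

With vaex installed, this could be driven by something like `df = vaex.from_arrays(x=numpy.arange(1_000_000))` followed by `results, errors = hammer(lambda i: filter_and_extract(df, i))`, then inspecting `errors` for a `NameError`.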

@maartenbreddels mentioned:

I did a benchmark with ab (ApacheBench) a while ago, triggering a LOT of requests per second, and that triggered a bug. Maybe I can trigger this bug as well using the same method.
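
For reference, the same kind of load can be approximated from Python without ab. This is only a rough sketch using the standard library: `fetch_status` and `load_test` are hypothetical helper names, and the URL of the FastAPI endpoint is an assumption to be filled in. It is roughly what `ab -n 1000 -c 50 <url>` does, minus the timing statistics.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for one GET, or None if the connection failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses are raised as exceptions; keep the code.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, ...
        return None


def load_test(url, n_requests=1000, concurrency=50):
    """Send n_requests GETs with `concurrency` worker threads.

    Returns a Counter mapping status code -> number of responses, so a
    handful of 500s among many 200s is easy to spot.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        statuses = pool.map(fetch_status, [url] * n_requests)
        return Counter(statuses)
```

Calling `load_test("http://localhost:8000/some-endpoint")` (a placeholder URL) and checking the result for 500 responses would show whether the server-side `NameError` appeared under load.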