vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License
8.28k stars, 590 forks

[BUG-REPORT] Python application won't shut down after calling Vaex df.sum or df.unique #2125

Open abf7d opened 2 years ago

abf7d commented 2 years ago

Description

I am building a Python FastAPI application that uses Vaex. I noticed that once either df.sum or df.unique is called, I can no longer terminate the application by pressing Ctrl+C. Here are examples of how I use them:

df.unique(axis,
          return_inverse=False,
          dropna=True,
          dropnan=True,
          dropmissing=True,
          progress=False,
          selection=None,
          axis=None,
          delay=False,
          array_type="python",
          )

and

x = int(df.sum(f"sid__{s}"))

How do I fix this so I am able to manually terminate the application without having to kill the console?

Software information

maartenbreddels commented 2 years ago

Hi Aaron,

This probably isn't a vaex-only issue; I guess it's a Windows + threads issue. Does it happen only while the computation is still running, or also after it has run once? And can you reproduce this without using vaex, for instance by starting a thread?

Regards,

Maarten

abf7d commented 2 years ago

Hi Maarten,

I am unable to shut down the application after either method has executed (and finished) one or more times. This is reproducible and happens every time. Other methods I call work fine. For example, calling df.count doesn't cause the problem:

hist = df.count(
        binby=list(axes_val.values()),
        limits=limits,
        shape=num_bins,
        delay=True,
        selection=True,
    )

Also getting the columns doesn't cause an issue:

df.get_column_names()

I experimented with threading. I created a thread, and after it completed I was still able to kill the app with Ctrl+C (I was unable to kill it while the thread was still executing):

from threading import Thread
from time import sleep

def threaded_function(arg):
    for i in range(arg):
        print("running")
        sleep(1)

thread = Thread(target=threaded_function, args=(10,))
thread.start()
thread.join()

Thanks for your reply!

abf7d commented 2 years ago

Hi @maartenbreddels, I just wanted to check in with you one more time. Do you have any recommendations on how to handle this?
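
One way to follow up on the thread hypothesis raised above is to check which threads are still alive after a call returns. The sketch below uses only the standard library. `run_workload` and `lingering_threads` are hypothetical helpers, and the `ThreadPoolExecutor` is only a stand-in for whatever thread pool the library may use internally (an assumption, not something confirmed in this thread).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a call such as df.sum(...): the work runs on a
# pool whose worker threads stay alive after the result has been returned.
_pool = ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker")


def run_workload():
    # Submit a trivial job and block until it finishes.
    return _pool.submit(sum, range(1000)).result()


def lingering_threads():
    """Return (name, daemon) for every live thread except the main thread."""
    main = threading.main_thread()
    return [(t.name, t.daemon) for t in threading.enumerate() if t is not main]


before = lingering_threads()
result = run_workload()
after = lingering_threads()

print("result:", result)
print("threads before:", before)
print("threads after:", after)
```

Replacing `run_workload` with the real call and comparing the two lists would show whether extra threads remain once the call has finished. Extra non-daemon threads in the second list would be consistent with the thread explanation; an unchanged list would suggest looking elsewhere.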