pentschev closed this issue.
cc @kkraus14, who may have some insight on this issue.
@quasiben FYI
This should be resolved in Arrow 12.0, thanks to @wence- in https://github.com/apache/arrow/issues/34118.
Seems to be related to long import times https://github.com/rapidsai/cudf/issues/627, in case we decide to tackle that at some point.
FWIW, importing pyarrow didn't show up as the order-1 effect in my profiles of the import time.
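For reference, CPython's `-X importtime` option (Python 3.7+) writes a per-module breakdown of import time to stderr. The sketch below runs it in a fresh interpreter and ranks modules by cumulative time. `slowest_imports` is a hypothetical helper; for this issue the argument would be `cudf` or `pyarrow`.

```python
from __future__ import annotations

import subprocess
import sys


def slowest_imports(module: str, top: int = 10) -> list[tuple[int, str]]:
    """Return (cumulative_us, module_name) pairs for the slowest imports.

    Runs `python -X importtime -c "import <module>"` in a fresh interpreter
    so that modules already imported in this process don't hide any cost.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        # Each row looks like: "import time:  self | cumulative | package"
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        try:
            cumulative = int(fields[1])
        except ValueError:
            continue  # header row: "self [us] | cumulative | imported package"
        # Nested imports are indented; strip that for a flat listing.
        rows.append((cumulative, fields[2].strip()))
    rows.sort(reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for cumulative_us, name in slowest_imports("json"):
        print(f"{cumulative_us:>10} us  {name}")
```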
Now that PyArrow doesn't initialize S3 at import time, no AWS threads will be spawned when cuDF is imported. Closing.
**Describe the bug**
While debugging an unrelated hang with Dask, I've noticed several pyarrow threads (40 to be precise, one for each CPU core in a DGX-1, I assume) waiting in `aws_event_loop_thread`, even though those particular tests have no direct dependency on pyarrow or AWS.

List of all threads in the process:
```gdb
(gdb) info th
...
  4    Thread 0x7f062ffff700 (LWP 16502) "jemalloc_bg_thd" 0x00007f0956a89ad3 in futex_wait_cancelable (private=
```

Backtrace of one of the threads:

```gdb
(gdb) t a 4 bt
Thread 4 (Thread 0x7f062ffff700 (LWP 16502) "jemalloc_bg_thd"):
#0  0x00007f0956a89ad3 in futex_wait_cancelable (private=
```

This looks to me very similar to OpenBLAS spawning multiple threads, which leads to thread oversubscription with Dask and can have various unintended consequences (e.g., being very slow); plus, it adds a ton of noise when debugging.
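On Linux, the per-thread names GDB shows (such as `jemalloc_bg_thd`) are also exposed by the kernel under `/proc/<pid>/task/<tid>/comm`, so a per-name thread count can be taken without attaching a debugger. A minimal, Linux-only sketch; `thread_names` is a hypothetical helper, and a burst of identically named threads, one per core, is the pattern described above.

```python
from __future__ import annotations

import os
import threading
from collections import Counter


def thread_names(pid: int | str = "self") -> Counter:
    """Count the threads of a Linux process, grouped by their kernel name."""
    task_dir = f"/proc/{pid}/task"
    names: Counter = Counter()
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                names[f.read().strip()] += 1
        except FileNotFoundError:
            continue  # the thread exited while we were iterating
    return names


if __name__ == "__main__":
    # Start a few extra threads; they are counted under whatever OS-level
    # name they inherit or are given.
    stop = threading.Event()
    workers = [threading.Thread(target=stop.wait) for _ in range(4)]
    for w in workers:
        w.start()
    print(thread_names().most_common())
    stop.set()
    for w in workers:
        w.join()
```

Passing another process's PID (e.g. the one printed by the reproducer) gives the same count for that process.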
**Steps/Code to reproduce bug**
In a conda environment with the latest cuDF nightly builds, run the following command:
The above will print the PID of the process (e.g., `18224`) and sleep for one hour. In a separate shell, attach GDB to the process and run `info threads` to list all threads and confirm the behavior above, e.g.:

Note: this is not reproducible with RAPIDS 22.12 stable.
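The exact command is not preserved here. As a stand-in, assuming the reproducer simply imports cuDF, prints its PID and sleeps, a script like the following does the same job. The `module` argument and `--seconds` flag are hypothetical additions so the same script can also be pointed at `pyarrow` alone to narrow things down.

```python
import argparse
import importlib
import os
import time


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(
        description="Import a module, print this process's PID, then sleep "
        "so a debugger (e.g. `gdb -p <pid>`) can be attached."
    )
    parser.add_argument("module", nargs="?", default="cudf")
    parser.add_argument("--seconds", type=float, default=3600.0)
    args = parser.parse_args(argv)

    importlib.import_module(args.module)  # the import under investigation
    pid = os.getpid()
    print(pid, flush=True)  # flush so the PID is visible right away
    time.sleep(args.seconds)
    return pid


if __name__ == "__main__":
    main()
```

While it sleeps, `gdb -p <pid>` followed by `info threads` lists the threads the import left behind.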
**Expected behavior**
AWS threads are not spawned unless the application truly requires them.
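The expected behavior amounts to lazy initialization: background threads are started on first use rather than at import time. The following is a generic, hypothetical Python sketch of that pattern. `LazyWorkers` is an illustrative name, not Arrow's implementation.

```python
import queue
import threading
import traceback


class LazyWorkers:
    """Start background worker threads only when work is first submitted."""

    def __init__(self, num_workers=4):
        self._num_workers = num_workers
        self._tasks = queue.Queue()
        self._threads = []
        self._lock = threading.Lock()

    def _ensure_started(self):
        # Threads are created here, on first submit, never in __init__.
        with self._lock:
            if not self._threads:
                for _ in range(self._num_workers):
                    t = threading.Thread(target=self._run, daemon=True)
                    t.start()
                    self._threads.append(t)

    def _run(self):
        while True:
            fn = self._tasks.get()
            if fn is None:  # shutdown sentinel
                return
            try:
                fn()
            except Exception:
                traceback.print_exc()  # keep the worker alive

    def submit(self, fn):
        self._ensure_started()
        self._tasks.put(fn)

    def shutdown(self):
        with self._lock:
            for _ in self._threads:
                self._tasks.put(None)  # one sentinel per worker
            for t in self._threads:
                t.join()
            self._threads.clear()
```

Constructing `LazyWorkers()` at module import costs nothing; the threads appear only once `submit` is called, and `shutdown` stops them again.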
**Environment overview**

**Environment details**