tylerjthomas9 / RAPIDS.jl

An unofficial Julia wrapper for the RAPIDS.ai ecosystem using PythonCall.jl
MIT License

Segmentation fault when reading CSV files using multiple threads #37

Open ymtoo opened 1 year ago

ymtoo commented 1 year ago

MWE (test_rapids.jl):

using CSV
using DataFrames
using RAPIDS

csvpath = "./metadata.csv"

@info "Generate dummy data"
X = randn(100000, 5)
df = DataFrame(X, :auto)
CSV.write(csvpath, df)

@info "Read the data"
df = CSV.read(csvpath, DataFrame, ntasks=2)

@info "Done"

Running the script:

$ julia --project -t 2 test_rapids.jl 
[ Info: Generate dummy data
[ Info: Read the data
[arl-X570-AORUS-MASTER:16308:0:16311] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x7f9f98bdf008)
==== backtrace (tid:  16311) ====
 0  /home/arl/Projects/test/test-rapids/.CondaPkg/env/lib/python3.10/site-packages/raft_dask/common/../../../.././libucs.so.0(ucs_handle_error+0x2fd) [0x7f9dc901cc2d]
 1  /home/arl/Projects/test/test-rapids/.CondaPkg/env/lib/python3.10/site-packages/raft_dask/common/../../../.././libucs.so.0(+0x29e34) [0x7f9dc901ce34]
 2  /home/arl/Projects/test/test-rapids/.CondaPkg/env/lib/python3.10/site-packages/raft_dask/common/../../../.././libucs.so.0(+0x29ffa) [0x7f9dc901cffa]
 3  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-internal.so.1(_jl_mutex_wait+0x91) [0x7f9f97e2c151]
 4  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-internal.so.1(_jl_mutex_lock+0x30) [0x7f9f97e2c210]
 5  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-codegen.so.1(jl_generate_fptr_impl+0x83) [0x7f9f9786b6a3]
 6  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-internal.so.1(jl_compile_method_internal+0xa0) [0x7f9f97ddf2e0]
 7  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-internal.so.1(ijl_apply_generic+0x43e) [0x7f9f97de00ee]
 8  /home/arl/julia-1.9.1/bin/../lib/julia/libjulia-internal.so.1(jl_f__call_latest+0x39) [0x7f9f97deef19]
 9  /home/arl/.julia/compiled/v1.9/CSV/HHBkp_QmxWY.so(+0x9c488) [0x7f9f1358d488]
=================================

[16308] signal (11.-6): Segmentation fault
in expression starting at /home/arl/Projects/test/test-rapids/test_rapids.jl:13
_jl_mutex_wait at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/threading.c:707
_jl_mutex_lock at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/threading.c:745
jl_mutex_lock at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia_locks.h:66 [inlined]
jl_generate_fptr_impl at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/jitlayers.cpp:424
jl_compile_method_internal at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2348 [inlined]
jl_compile_method_internal at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2237
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2750 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
jl_f__call_latest at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:816 [inlined]
invokelatest at ./essentials.jl:813 [inlined]
defaultsentinel at /home/arl/.julia/packages/SentinelArrays/cav7N/src/SentinelArrays.jl:72 [inlined]
SentinelArray at /home/arl/.julia/packages/SentinelArrays/cav7N/src/SentinelArrays.jl:80
SentinelArray at /home/arl/.julia/packages/SentinelArrays/cav7N/src/SentinelArrays.jl:98 [inlined]
SentinelArray at /home/arl/.julia/packages/SentinelArrays/cav7N/src/SentinelArrays.jl:98
jfptr_SentinelArray_2394 at /home/arl/.julia/compiled/v1.9/CSV/HHBkp_QmxWY.so (unknown line)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
allocate at /home/arl/.julia/packages/CSV/OnldF/src/utils.jl:141
allocate! at /home/arl/.julia/packages/CSV/OnldF/src/utils.jl:116
multithreadparse at /home/arl/.julia/packages/CSV/OnldF/src/file.jl:354
macro expansion at /home/arl/.julia/packages/WorkerUtilities/ey0fP/src/WorkerUtilities.jl:384 [inlined]
#34 at ./threadingconstructs.jl:373
unknown function (ip: 0x7f9f17db0d7f)
_jl_invoke at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/julia.h:1879 [inlined]
start_task at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-9/src/task.c:1092
Allocations: 22186290 (Pool: 22173035; Big: 13255); GC: 31
Segmentation fault (core dumped)

It works fine when the file is read with a single task:

df = CSV.read(csvpath, DataFrame, ntasks=1)

Julia and package versions:

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 24 × AMD Ryzen 9 5900X 12-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver3)
  Threads: 1 on 24 virtual cores

(test-rapids) pkg> st
Status `~/Projects/test/test-rapids/Project.toml`
  [336ed68f] CSV v0.10.11
  [a93c6f00] DataFrames v1.5.0
  [2764e59e] RAPIDS v0.3.3

tylerjthomas9 commented 1 year ago

The crash occurs whenever cugraph is imported via PythonCall, even without loading RAPIDS.jl. I am looking into it further. The following script produces the same error:

using CSV
using DataFrames
using PythonCall
const cugraph = PythonCall.pynew()
PythonCall.pycopy!(cugraph, pyimport("cugraph"))

csvpath = "./metadata.csv"

@info "Generate dummy data"
X = randn(100000, 5)
df = DataFrame(X, :auto)
CSV.write(csvpath, df)

@info "Read the data"
df = CSV.read(csvpath, DataFrame, ntasks=2)

@info "Done"
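
A note on the backtrace above: its first frames are in libucs.so (ucs_handle_error), the UCX library that sits in the same environment as raft_dask. One possible explanation, not confirmed in this thread, is that UCX's error handler intercepts the SIGSEGV that Julia's runtime raises internally to synchronize threads. The MPI.jl documentation describes this interaction, with the same "invalid permissions for mapped object" message, and works around it by restricting the signals UCX handles. The sketch below applies that setting. Whether it resolves this particular crash is an assumption, and the helper name restrict_ucx_signals! is made up for illustration.

```julia
# Hypothetical helper (not part of RAPIDS.jl or PythonCall).
# Asks UCX not to handle SIGSEGV, which Julia's runtime uses
# internally when running with multiple threads.
# A value already set by the user is left untouched.
function restrict_ucx_signals!()
    if !haskey(ENV, "UCX_ERROR_SIGNALS")
        ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"
    end
    return ENV["UCX_ERROR_SIGNALS"]
end

# Call this before `using RAPIDS` or any `pyimport` that loads UCX,
# so the variable is in place when the library starts up.
restrict_ucx_signals!()
```

If this hypothesis holds, the setting would need to be applied at the very top of the script, before cugraph is imported.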

tylerjthomas9 commented 1 year ago

I have removed the automatic cugraph pyimport on the main branch. Let me know if this fixes your issue. I will keep investigating the root cause. If it works for you too, I will probably publish a release with this change to avoid potential cugraph issues in my own processes.
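
For readers who still need cugraph, one common pattern is to import a Python module lazily on first use instead of at package load time. The sketch below shows that pattern with PythonCall's pynew, pyisnull, and pycopy!. The helper ensure_imported! is hypothetical and is not necessarily how RAPIDS.jl implements the change. It only postpones the import: once cugraph is loaded, the crash described above could presumably still occur.

```julia
using PythonCall

# Placeholder that stays NULL until the module is actually needed.
const cugraph = PythonCall.pynew()

# Hypothetical helper: import `modname` into `placeholder` on first call,
# and do nothing on later calls.
function ensure_imported!(placeholder::Py, modname::AbstractString)
    if PythonCall.pyisnull(placeholder)
        PythonCall.pycopy!(placeholder, pyimport(modname))
    end
    return placeholder
end

# Usage (assumes cugraph is installed in the Python environment):
# ensure_imported!(cugraph, "cugraph")
```

With this layout, scripts that never touch cugraph never load it, which matches the behavior reported as working on the main branch.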

ymtoo commented 1 year ago

This fixes the issue. The main branch works fine with multi-threaded CSV reading. Thanks!

tylerjthomas9 commented 1 year ago

I pushed the update to the registry. Over the next few weeks, I will look at directly interfacing with the C++/CUDA functions instead of wrapping Python, so hopefully, weird bugs like this will go away when the extra layer is removed.