ludvigak / FINUFFT.jl

Julia interface to the nonuniform FFT library FINUFFT
Other
33 stars 9 forks source link

Segfault: Calling finufft and fftw in threaded loop #62

Closed jkrimmer closed 1 week ago

jkrimmer commented 3 weeks ago

Unfortunately, I stumbled another segmentation fault caused by multithreading in julia: Calling both finufft and fftw in a threaded loop yields a segmentation fault (Windows and Linux). Here, the order in which finufft and fftw are called does not matter.

MWE

Make sure to initialize julia with multiple threads, e.g., run julia --threads=auto.

using FFTW
using FINUFFT

function _inner(N)
    y = randn(ComplexF64, N)
    x = randn(N)
    nufft1d1(x, y, 1, 1e-8, N, nthreads=1)
    # fft(x)
end

function issue(iters=1000, N=1024; threaded=false)
    if threaded
        Threads.@threads for k in 1:iters
            _inner(N)
        end
    else
        for k in 1:iters
            _inner(N)
        end
    end
end

@info "single-threaded loop" 
issue()
@info "multi-threaded loop"
issue(threaded=true)

Note: Commenting the fft-call or the nufft1d1-call makes the issue disappear.

General information: julia v1.10.4 FINUFFT v3.2.0 finufft_jll v2.2.0+2

Potential but suboptimal solution

The only mitigation approaches I have found so far is going back to libfinufft_jll v2.1.0 (where https://github.com/flatironinstitute/finufft/pull/354 has not been merged) or building a more recent version where fftw_make_planner_thread_safe is re-introduced which is probably not desired. Alternatively, building finufft with DUCC instead of FFTW works as well.

ahbarnett commented 3 weeks ago

I just ran your example on 5700U ryzen 2, 8-core (16 logical threads, which julia chooses), ubuntu 22.04, Julia 1.10.0, FINUFFT v3.2.0, finufft_jll v2.2.0+2, FFTW v1.8.0, FFTW_jll v3.3.10+0, and, yes, only when I uncomment the fft(x) line, it crashes. Thanks for the MWE. The implication that nthreads=1 julia nufft calls are not thread-safe.

Output:

julia> for i=1:100; issue(threaded=true); end
double free or corruption (fasttop)

[48264] signal (6.-6): Aborted
in expression starting at REPL[22]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x77568f689675)
unknown function (ip: 0x77568f6a0cfb)
unknown function (ip: 0x77568f6a29c9)
free at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
forget at
/home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so
(unknown line)
mkplan at
/home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so
(unknown line)
fftw_mkapiplan at
/home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so
(unknown line)
fftw_plan_guru64_dft at
/home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so
(unknown line)
macro expansion at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:652
[inlined]
cFFTWPlan at /home/alex/.julia/packages/FFTW/6nZei/src/FFTW.jl:49
#plan_fft#10 at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:782
[inlined]
plan_fft at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:772 [inlined]
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:67
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:214
[inlined]
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:66
[inlined]
_inner at ./REPL[20]:5
macro expansion at ./REPL[4]:4 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x77568e51c8b2)
_jl_invoke at
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894
[inlined]
ijl_apply_generic at
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982
[inlined]
start_task at
/cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 13790939 (Pool: 12665074; Big: 1125865); GC: 273
Aborted (core dumped)

Of course, you could try the DUCC option in v2.3.0-rc1, but I don't know if Ludvig has that yet.

The PR #354 is supposed to prevent rather than cause segfaults, of course, so this is rather upsetting - I hope Ludvig has some insight.

Thanks, Alex

On Thu, Aug 22, 2024 at 7:51 AM Jonas Krimmer @.***> wrote:

Unfortunately, I stumbled another segmentation fault caused by multithreading in julia: Calling both finufft and fftw in a threaded loop yields a segmentation fault (Windows and Linux). Here, the order in which finufft and fftw are called does not matter. MWE

Make sure to initialize julia with multiple threads, e.g., run julia --threads=auto.

using FFTW using FINUFFT

function _inner(N) y = randn(ComplexF64, N) x = randn(N) nufft1d1(x, y, 1, 1e-8, N, nthreads=1)

fft(x)

end

function issue(iters=1000, N=1024; threaded=false) if threaded @.*** for k in 1:iters _inner(N) end else for k in 1:iters _inner(N) end end end

@info "single-threaded loop" issue() @info "multi-threaded loop" issue(threaded=true)

julia v1.10.4 FINUFFT v3.2.0 finufft_jll v2.2.0+2 Potential but suboptimal solution

The only mitigation approaches I have found so far is going back to libfinufft_jll v2.1.0 (where flatironinstitute/finufft#354 https://github.com/flatironinstitute/finufft/pull/354 has not been merged) or building a more recent version where fftw_make_planner_thread_safe is re-introduced which is probably not desired.

— Reply to this email directly, view it on GitHub https://github.com/ludvigak/FINUFFT.jl/issues/62, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQEVXWR6FW2A6KL6L3ZSXGBVAVCNFSM6AAAAABM57CGWCVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ4DANJYHE3DSMY . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

jkrimmer commented 3 weeks ago

Hi Alex,

Thank you for looking into this issue. Fortunately, I have already been able to test the DUCC option which resolves this issue. Maybe there is some clash between the FFTW-planner calls originating from julia and the ones from within finufft?

Thanks Jonas

ludvigak commented 3 weeks ago

Well, I guess the good news is that thanks to @jkrimmer we now have working crash examples for Windows, Linux, and MacOS!

Not sure if I have any insight though, seems to me like the fundamental issue is the FFTW/FINUFFT interplay with threading active, but then we should be able to reproduce this directly in C++ code.

jkrimmer commented 3 weeks ago

Finally, I have been able to reproduce the issue in C++. It seems like, the finufft-internal fftw-lock does not prevent other calls to fftw from building an FFTW-plan. Please find an MWE (on Ubuntu, this MWE builds with g++ test_finufft_crash.cpp -o test_finufft_crash -I[include] -L[lib] -lfinufft -lfftw3_threads -lfftw3 -fopenmp -Wl,-rpath=[lib] -std=c++11) in the collapsed section below. Commenting out the lock around the finufft-planner-stage prevents the segfault.

MWE ```c++ #include #include #include #include #include #include #include #include using namespace std; #define N 65384 int main() { int64_t Ns[3]; // guru describes mode array by vector [N1,N2..] Ns[0] = N; // set lock recursive_mutex lck; finufft_opts *opts = new finufft_opts; // opts is pointer to struct finufft_default_opts(opts); opts->nthreads = 1; opts->debug = 0; // random nonuniform points (x) and complex strengths (c)... vector> c(N); // init FFTW threads fftw_init_threads(); // FFTW and FINUFFT execution using OpenMP parallelization #pragma omp parallel for for (int j = 0; j < 100; ++j) { // allocate output array for FFTW... vector> F1(N); // FFTW plan lck.lock(); fftw_plan_with_nthreads(1); fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast(c.data()), reinterpret_cast(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE); fftw_destroy_plan(plan); lck.unlock(); // FINUFFT plan // lck.lock(); finufft_plan nufftplan; finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, opts); finufft_destroy(nufftplan); // lck.unlock(); } return 0; } ```

Similarly, we observe the same behavior in julia. Running the following code with multiple threads yields a segmentation fault on my machine

using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)

Threads.@threads for k = 1:1000
    plan_fft(tmp)
    finufft_makeplan(1, [1024], 1, 1, 1e-6)
end

However, adding an additional lock to the finufft-planning avoids the segfault:

using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)
lk = ReentrantLock()

Threads.@threads for k = 1:1000
    plan_fft(tmp)

    lock(lk)
    try
        finufft_makeplan(1, [1024], 1, 1, 1e-6)
    finally
        unlock(lk)
    end
end
ahbarnett commented 3 weeks ago

Robert @blackwer since you are our expert on fftw global state and thread-safety, would you be able to take a look at the above C++ MWE which causes crash apparently? (I have only verified the julia-version crash) Thanks, Alex

blackwer commented 2 weeks ago

I don't think there is a finufft side solution to this. There's no way for the finufft side to detect what the application side is doing. This is actually in the discussion at https://www.fftw.org/doc/Thread-safety.html, a document I have had to look at way too many times.

One proper solution is for the application to call fftw_make_planner_thread_safe() rather than finufft. finufft can do this, and in practice it's usually sufficient, but there are cases where it could still crash even with the older version that did have this call in it. E.g. if you call finufft_make_plan before you make fftw_plan_dft_1d in your MWE, this solution works. It doesn't when you call fftw_plan_dft_1d directly first. It might work, but it's technically a race condition.

Another solution would be to supply a mechanism to acquire a pointer to the finufft lock on the application side, but I don't think this is adequate for the julia case, since that'd have to end up in the fftw code, which doesn't make much sense.

Is it possible on the julia side to have the FINUFFT import make the fftw_make_planner_thread_safe() call? Otherwise I think the only solution is a custom build either with an alternative FFT implementation or with a patch to the FFTW_INIT routine (though this does have the race condition).

jkrimmer commented 2 weeks ago

Thank you very much for the detailed explanation! I assume that making the call to fftw_make_planner_thread_safe during the FINUFFT initialization in julia should work and provide a quick fix.

Lately, I have been considering another approach: The FFTW-interface FFTW.jl uses julia's ReentrantLock() to protect the FFTW-planning stage from race conditions. If we could pass an application defined lock/locking function to the finufft library, e.g., based on this ReentrantLock, the race conditions would also be avoided in my understanding. Do you think this approach could work as well?

blackwer commented 2 weeks ago

Lately, I have been considering another approach: The FFTW-interface FFTW.jl uses julia's ReentrantLock() to protect the FFTW-planning stage from race conditions. If we could pass an application defined lock/locking function to the finufft library, e.g., based on this ReentrantLock, the race conditions would also be avoided in my understanding. Do you think this approach could work as well?

This definitely works, and can be done straightforwardly without having to extend the API (by having two void (func*)(void *) function pointers on the plan and an optional void * for the lock data). This was very straightforward to implement on the finufft side, so could feasibly make it into 2.3 (maybe?). I've written an example implementation at https://github.com/blackwer/finufft/tree/user_fftw_lock with new driver code

MWE ```c++ #include #include #include #include #include #include using namespace std; #define N 65384 void locker(void *lck) { reinterpret_cast(lck)->lock(); } void unlocker(void *lck) { reinterpret_cast(lck)->unlock(); } int main() { int64_t Ns[3]; // guru describes mode array by vector [N1,N2..] Ns[0] = N; recursive_mutex lck; finufft_opts opts; finufft_default_opts(&opts); opts.nthreads = 1; opts.debug = 0; opts.fftw_lock_fun = locker; opts.fftw_unlock_fun = unlocker; opts.fftw_lock_data = reinterpret_cast(&lck); // random nonuniform points (x) and complex strengths (c)... vector> c(N); // init FFTW threads fftw_init_threads(); // FFTW and FINUFFT execution using OpenMP parallelization #pragma omp parallel for for (int j = 0; j < 100; ++j) { // allocate output array for FFTW... vector> F1(N); // FFTW plan lck.lock(); fftw_plan_with_nthreads(1); fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast(c.data()), reinterpret_cast(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE); fftw_destroy_plan(plan); lck.unlock(); // FINUFFT plan finufft_plan nufftplan; finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts); finufft_destroy(nufftplan); } return 0; } ```

@ahbarnett thoughts?

DiamonDinoia commented 2 weeks ago

This is interesting, having a global lock passed in c++ is not an issue. But what about having a Julia lock passed to finufft?

Alternatively, one has to acquire the lock in Julia and then do the entire finufft_makeplan in a critical section. Which feels wasteful as the only non thread safe code is fftw_plan.

As for 2.3.0, I would not add this to it. I would do a 2.3.1 or put it in 2.4.0 if we release this soon. 2.3.0 has been on the working for too long we should release it asap :)

blackwer commented 2 weeks ago

This is interesting, having a global lock passed in c++ is not an issue. But what about having a Julia lock passed to finufft?

You should be able to pass from any language that lets you make a standard C-style callback function, which julia definitely lets you do, even with closures. Python does too, if you could make this be a problem there with its GIL limited threading. https://julialang.org/blog/2013/05/callback/ has details on how do do this in julia.

As for 2.3.0, I would not add this to it. I would do a 2.3.1 or put it in 2.4.0 if we release this soon. 2.3.0 has been on the working for too long we should release it asap :)

I'm in no rush, and if Jonas is happy with the stop-gap fftw_make_planner_thread_safe option, then I am. But the provided solution does already work (I think I just fixed the last issue) and is about as language agnostic as possible with a very small code footprint.

ahbarnett commented 2 weeks ago

Hi Robert,

I think this is a great idea, and should go in 2.3.1 or 2.4, since it changes the opts struct (which requires changing ctypes for py, fortran module, docs, etc). Thanks for the demonstration! It will only be used by power users, but there is no harm in having it, and gives flexibility in the future. I opened it as a PR to FINUFFT master. The jl wrapper could then provide the power jl used with ability to pass in a lock from FFTW.jl

It's amazing what a mess having multiple packages call the same library with non-thread-safe functions, is.

While DUCC gets rid of this problem, there will always be users who want FFTW due to faster in certain cases.

On Wed, Sep 4, 2024 at 3:47 PM Robert Blackwell @.***> wrote:

This is interesting, having a global lock passed in c++ is not an issue. But what about having a Julia lock passed to finufft?

You should be able to pass from any language that lets you make a standard C-style callback function, which julia definitely lets you do, even with closures. Python does too, if you could make this be a problem there with its GIL limited threading. https://julialang.org/blog/2013/05/callback/ has details on how do do this in julia.

As for 2.3.0, I would not add this to it. I would do a 2.3.1 or put it in 2.4.0 if we release this soon. 2.3.0 has been on the working for too long we should release it asap :)

I'm in no rush, and if Jonas is happy with the stop-gap fftw_make_planner_thread_safe option, then I am. But the provided solution does already work (I think I just fixed the last issue) and is about as language agnostic as possible with a very small code footprint.

— Reply to this email directly, view it on GitHub https://github.com/ludvigak/FINUFFT.jl/issues/62#issuecomment-2329845704, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNZRSQTRYS73P3KCZIQJ4DZU5PUDAVCNFSM6AAAAABM57CGWCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRZHA2DKNZQGQ . You are receiving this because you were mentioned.Message ID: @.***>

-- *-------------------------------------------------------------------~^`^~._.~' |\ Alex Barnett Center for Computational Mathematics, Flatiron Institute | \ http://users.flatironinstitute.org/~ahb 646-876-5942

jkrimmer commented 1 week ago

Dear all,

Thanks a lot for your work on this issue! I have successfully tested the user-locking approach by @blackwer in julia over at https://github.com/jkrimmer/FINUFFT.jl/tree/user_fftw_lock. However, until the required library adaptations are merged, we can use fftw_make_planner_thread_safe (see #66 ) to resolve this issue.

Cheers