fftwf_plan_dft is not threadsafe

mohabouje / eDSP

A cross-platform DSP library written in C++ 11/14. This library harnesses the power of C++ templates to implement a complete set of DSP algorithms.

https://mohabouje.github.io/edsp-docs/

GNU General Public License v3.0

188 stars 33 forks source link

fftwf_plan_dft is not threadsafe #18

Open kC0pter opened 4 years ago

kC0pter commented 4 years ago

Hey it's me again.

I found out during testing the xcorr that the fftw implementation is not threadsafe. More Specific it is the calls to the fftwf_plan_dft functions. The only functions that are threadsafe itself are the fftwf_execute calls. Found the info here and here.

I fixed it by passing a mutex to the xcorr call and passing it all the way down to the fftw impl as a member reference and wrapping a lock around the fftwf_plan_dft calls. It's a bit hacky but does the job for my project.

Example:

inline void dft(const value_type* src, complex_type* dst) {
            std::unique_lock<std::mutex> lock(mutex_);
            if (meta::is_null(plan_)) {
                plan_ = fftwf_plan_dft_r2c_1d(nfft_, internal::fftw_cast(src), internal::fftw_cast(dst),
                                              FFTW_ESTIMATE | FFTW_PRESERVE_INPUT);
            }
            lock.unlock();
            fftwf_execute_dft_r2c(plan_, internal::fftw_cast(src), internal::fftw_cast(dst));
        }

mohabouje commented 4 years ago

Sorry to hear that, but in the actual design eDSP is a single thread library, it does not have any support for multithreaded application mostly because it aims to be a base implementation for DSP/low latency applications and I did not want to introduce the penalty of having std::mutex, spin_lock or semaphores everywhere.

You can always build a wrapper on top of it or keep the lock in your implementation. In this scenario, I do normally have an FFT instance per thread.

kC0pter commented 4 years ago

Oh well okay didn't knew the single threaded approach. Then it actually is not really a bug.

On the other hand i use it exactly like you describe it. On two threads i call the xcorr function independently on independent vectors/datasets. Since the xcorr creates a FFT instance itself i have a independet FFT instance per thread and it still does crash.

The FFTW doc i mentioned on thread-safety states "The FFTW planner is intended to be called from a single thread. If you really must call it from multiple threads, you are expected to grab whatever lock makes sense for your application..." and since exactly that is happening it kinda is a bug. The same doc also mentions that "FFTW is designed for the situation where the only performance-sensitive code is the actual execution of the transform" so wrapping a mutex only around the planning is not really a performance impact. Maybe also including a multithreaded implementation like xcorr_mt would be a nice solution.

mohabouje commented 4 years ago

Good point! I completely forget about that.

I need to update the official documentation to make that point clear.

I see what you mean. So in my case, I'm using a factory provider. Just a simple: make_fft helper function with a locker. All threads are using this.

I wanted to do something similar to what the standard does with the std::make_tuple or std::make_shared, in my current implementation I have a simple std::mutex in a .cpp file, not the most elegant approach tho.

I'm gonna try to come with a better approach and provide a helper in the utils section in future releases.