mpip / pfft

Parallel fast Fourier transforms
GNU General Public License v3.0

Bench r2c #17

Open rainwoodman opened 9 years ago

rainwoodman commented 9 years ago

This is a brief version of an r2c/c2r benchmark program.

There are still issues, but I can already observe that with a 384x384x384 mesh, pfft is about 10% slower than fftw when running on a single rank.

I am not quite sure how to fix the large FFTW errors (error = 9.48e+02 in the output below). The program also needs a command-line flag to add PADDED support.

[yfeng1@waterfall tests]$ ./bench_r2c -pfft_n 384 384 384 -pfft_cmp_fftw -pfft_inplace -pfft_patience 0 -pfft_destroy_input
******************************************************************************************************
* Computation of loops=1 parallel forward and backward FFTs (change with -pfft_loops *)
* for n[0] x n[1] x n[2] = 384 x 384 x 384 Fourier coefficients (change with -pfft_n * * *)
* on  np[0] x np[1] x np[2] = 1 x 1 x 1 processes (change with -pfft_np * * *)
* with:
*      - non-transposed data layout (change with -pfft_transposed)
*      - non-verbose output (change with -pfft_verbose)
*      - in-place transforms (change with -pfft_inplace)
*      - disabled decomposition comparison (change with -pfft_cmp_decomp)
*      - enabled FFTW comparison (change with -pfft_cmp_fftw)
*      - disabled comparison of all planner flags (change with -pfft_cmp_flags)
*      - disabled output of internal PFFT timer (change with -pfft_timer)
*      - pfft_flags = PFFT_ESTIMATE | PFFT_NO_TUNE | PFFT_DESTROY_INPUT
*        (change with [-pfft_patience  0|1|2|3] [-pfft_tune] [-pfft_destroy_input])
*******************************************************************************************************

!!! Warning: inplace transforms do not support DESTROY_INPUT flag !!!
* PFFT runtimes (1d data decomposition):
Flags: PFFT_NO_TUNE, PFFT_ESTIMATE, PFFT_DESTROY_INPUT, 
tune_forw = 2.58e-03; tune_back = 2.56e-03, exec_forw/loops = 1.34e+00, exec_back/loops = 1.35e+00
error = 6.44e-14

* FFTW_MPI runtimes (1d data decomposition):
Flags: FFTW_ESTIMATE, FFTW_PRESERVE_INPUT
tune_forw = 2.89e-03; tune_back = 1.21e-04, exec_forw/loops = 1.21e+00, exec_back/loops = 1.21e+00
error = 9.48e+02
Flags: FFTW_MEASURE, FFTW_PRESERVE_INPUT
tune_forw = 1.34e+01; tune_back = 1.13e-04, exec_forw/loops = 9.63e-01, exec_back/loops = 9.61e-01
error = 9.48e+02
* serial FFTW runtimes (no data decomposition at all):
Flags: FFTW_ESTIMATE, FFTW_PRESERVE_INPUT
tune_forw = 1.26e-04; tune_back = 7.99e-05, exec_forw/loops = 9.62e-01, exec_back/loops = 9.62e-01
error = 9.48e+02
Flags: FFTW_MEASURE, FFTW_PRESERVE_INPUT
tune_forw = 1.29e-04; tune_back = 8.11e-05, exec_forw/loops = 9.61e-01, exec_back/loops = 9.64e-01
error = 9.48e+02