mpicbg-scicomp / gearshifft_publication

Publication Manuscript for results obtained with gearshifft

runtime validation #46

Closed tdd11235813 closed 7 years ago

tdd11235813 commented 7 years ago

This is for the comparison study of gearshifft. I have some issues with the fftw bench tool: it does not report time to solution (though you could probably just sum the setup and transform times).

Example with fftw_estimate.

# bench --print-precision -onthreads=16 -oestimate -s irf16384
setup: 988.00 us, time: 191.14 us
# now with gearshifft
Time_PlanInitFwd [ms]:          28.7235
Time_FFT [ms]:         0.148688
# now you wonder why the differences
# now again bench with full verbose
# bench --verbose=3 --print-precision -onthreads=16 -oestimate -s irf16384
double
Planning irf16384...
using plan_dft_r2c_1d
estimate-planner time: 0.027745 s
NTHREADS = 16
using plan_dft_r2c_1d
planner time: 0.000999 s
...

The fftw bench code shows that two plans are measured: the first gives the estimate-planner time, the second the planner time. With -oestimate, both plans use the same flags. The second plan apparently reuses information from the first (the plan is destroyed, but there is no fftw cleanup in between), so its planning time is much shorter. The first planning time (about 27.7 ms), however, is consistent with the gearshifft measurement (about 28.7 ms).

The fftw_measure case.

# bench --verbose=3 --print-precision -onthreads=16 -s irf1024
double
Planning irf1024...
using plan_dft_r2c_1d
estimate-planner time: 0.015532 s
NTHREADS = 16
using plan_dft_r2c_1d
planner time: 0.961 s # <---
..
time: 6.13 us

# gearshifft_fftw -e 1024 -r */double/*/Inplace_Real -v
Time_PlanInitFwd [ms]:          440.141 # <---
Time_FFT [ms]:         0.008739

I checked the number of threads used (16) and tested again with just 1 thread.

setup: 251.34 ms, time: 5.95 us
Time_PlanInitFwd [ms]:          250.276 [avg]
Time_FFT [ms]:         0.008193 [avg]

I can probably use this setup for the comparison now, but I have to check it again later. I also want to test a roundtrip cufft example, which I could upload to this repository.
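The "sum setup + transform time" idea from the first comment can be sketched in a few lines of Python. This is only an illustration, not part of bench or gearshifft: the helper name `bench_time_to_solution_ms` is made up, the regular expression assumes the summary line looks exactly like the `setup: ..., time: ...` lines quoted in this thread, and the `ns` and `s` units are an assumption (only `us` and `ms` appear above).

```python
import re

# Conversion factors to milliseconds. Only "us" and "ms" appear in the
# bench output quoted in this thread; "ns" and "s" are assumed here.
_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3}

_BENCH_RE = re.compile(
    r"setup:\s*([\d.]+)\s*(ns|us|ms|s),\s*time:\s*([\d.]+)\s*(ns|us|ms|s)"
)


def bench_time_to_solution_ms(line):
    """Return (setup_ms, transform_ms, total_ms) for one bench summary line."""
    m = _BENCH_RE.search(line)
    if m is None:
        raise ValueError(f"not a bench summary line: {line!r}")
    setup = float(m.group(1)) * _TO_MS[m.group(2)]
    transform = float(m.group(3)) * _TO_MS[m.group(4)]
    return setup, transform, setup + transform


# Single-thread values quoted above: bench line vs. gearshifft averages.
_, _, bench_total = bench_time_to_solution_ms("setup: 251.34 ms, time: 5.95 us")
gearshifft_total = 250.276 + 0.008193  # Time_PlanInitFwd + Time_FFT, in ms
print(f"bench: {bench_total:.3f} ms, gearshifft: {gearshifft_total:.3f} ms")
```

Converting everything to milliseconds first keeps the sum comparable with the gearshifft columns, which are already reported in ms.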

psteinb commented 7 years ago

Very good effort indeed. If all these measurements are in the microsecond range, what I'd do is select a couple of working points, e.g. "irf1024" and "irf16384", and perform some 20-30 gearshifft runs on each. Then run the same transform 20-30 times with the bench util. After that, plot the two distributions and check how far apart the mean values are given their variation. If these plots come out well, they would also offer a visual confirmation that is very hard to disprove.
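A minimal sketch of that check, using only the Python standard library. It summarises two samples of run times and computes Welch's t statistic, i.e. the difference of the means in units of its standard error; this is one possible way to quantify "how far apart given the variation", not something the thread prescribes. The function name `compare_runs` is made up, and the sample values are placeholders, not measurements.

```python
import math
import statistics


def compare_runs(a, b):
    """Summarise two samples of run times and return Welch's t statistic."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Standard error of the difference of the means (unequal variances).
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    diff = mean_a - mean_b
    if se > 0:
        t = diff / se
    else:
        # Degenerate case: no variation in either sample.
        t = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return {
        "mean_a": mean_a,
        "mean_b": mean_b,
        "stdev_a": math.sqrt(var_a),
        "stdev_b": math.sqrt(var_b),
        "t": t,
    }


# Placeholder numbers for illustration only; replace with the 20-30
# measured times per tool at one working point (e.g. irf1024).
bench_us = [6.1, 6.0, 6.2, 5.9, 6.1]
gearshifft_us = [8.7, 8.2, 8.5, 8.9, 8.4]
print(compare_runs(bench_us, gearshifft_us))
```

A |t| well above 2 or so suggests the two tools really report different times at that working point; a small |t| means the difference is within the run-to-run noise.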

Of course, we then run into questions like: is the timing code (in terms of OS support) the same in bench and gearshifft? Does one measure CPU time and the other wall-clock time? If we see a deviation between gearshifft and bench, would it compromise our interpretations?
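To illustrate why the CPU-time versus wall-clock question matters, here is a small Python sketch that measures the same call with both clocks. It says nothing about which clock bench or gearshifft actually use; the helper name `measure` is made up for this example.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent in one call of fn."""
    wall0 = time.perf_counter()   # monotonic wall-clock timer
    cpu0 = time.process_time()    # user + system CPU time of this process
    fn()
    cpu1 = time.process_time()
    wall1 = time.perf_counter()
    return wall1 - wall0, cpu1 - cpu0


# Sleeping costs wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall: {wall * 1e3:.1f} ms, cpu: {cpu * 1e3:.1f} ms")
```

The opposite effect shows up with multithreaded work: process CPU time is summed over all threads, so a run with 16 busy threads can report far more CPU time than wall-clock time. Two tools that use different clocks would therefore disagree even when timing the identical transform.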

tdd11235813 commented 7 years ago

Yeah, but I would rather show the long-running gearshifft use case (many extents as input to one run, instead of running gearshifft many times).

cufft roundtrip, averaged over 25 runs: complex-to-complex, single precision, 1 warmup.

# strong variation on interactive gpu node
1048576,7.99942,ms,
16777216,109.615,ms,
# gearshifft_cufft -e 1048576 -r */float/*/Inplace_Complex -v
Time_Total [ms]:          8.27397
# /gearshifft_cufft -e 16777216 -r */float/*/Inplace_Complex -v
Time_Total [ms]:          111.489

psteinb commented 7 years ago

I think this has been resolved by #47. I take the liberty to close it.