Closed tdd11235813 closed 7 years ago
Very good efforts indeed. If all these measurements are in the microsecond range, what I'd do is select a couple of working points, e.g. "irf1024" and "irf16384", and perform some 20-30 gearshifft runs on each. Then run the same transform 20-30 times with the bench util. After that, plot the two distributions and check how far apart the mean values are given their variation. If these plots come out well, they would also offer a visual confirmation that is very hard to disprove.
Of course, we then run into questions like: is the code used to measure (in terms of OS support) the same in bench and gearshifft? Does one measure CPU time and the other wall-clock time? If we see a deviation between gearshifft and bench, would it compromise our interpretations?
Yeah, but I'd rather show the long-term gearshifft use case (many extents as input, instead of running gearshifft many times).
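As a rough sketch of the "how far apart are the means given their variation" check, the two samples could be summarized like this. The function name `compare_runs` and the choice of Welch's t statistic are assumptions for illustration, not something gearshifft or the bench util provides.

```python
import statistics

def compare_runs(a, b):
    """Summarize two lists of timings (same unit) from repeated runs.

    Returns the sample means, their difference, and Welch's t statistic,
    i.e. the mean difference divided by its standard error.
    Each list needs at least two values.
    """
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    # Standard error of the difference of means (unequal variances allowed).
    se = (var_a / len(a) + var_b / len(b)) ** 0.5
    diff = mean_a - mean_b
    if se == 0:
        # No variation at all: the means either coincide or differ exactly.
        t = 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    else:
        t = diff / se
    return {"mean_a": mean_a, "mean_b": mean_b, "diff": diff, "welch_t": t}
```

Calling `compare_runs(gearshifft_times, bench_times)` with the 20-30 timings per tool gives a number to put next to the plots; a `welch_t` close to zero would suggest the means are not distinguishable at that sample size.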
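To illustrate why the CPU-time versus wall-clock question matters, here is a minimal Python sketch. It is not how bench or gearshifft measure (that would need to be checked in their sources); `measure` is a hypothetical helper.

```python
import time

def measure(fn, *args):
    """Run fn(*args) once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process only
    fn(*args)
    cpu_end = time.process_time()
    wall_end = time.perf_counter()
    return wall_end - wall_start, cpu_end - cpu_start

def demo():
    # Sleeping consumes wall-clock time but almost no CPU time,
    # so the two timers disagree strongly here.
    wall, cpu = measure(time.sleep, 0.05)
    print(f"wall: {wall:.4f} s, cpu: {cpu:.4f} s")
```

The same divergence can appear when a benchmark waits on a device or on other threads, so two tools using different timers are not directly comparable.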
cuFFT roundtrip, averaged over 25 runs, complex-to-complex, single precision, 1 warmup.
# strong variation on interactive gpu node
1048576,7.99942,ms,
16777216,109.615,ms,
# gearshifft_cufft -e 1048576 -r */float/*/Inplace_Complex -v
Time_Total [ms]: 8.27397
# /gearshifft_cufft -e 16777216 -r */float/*/Inplace_Complex -v
Time_Total [ms]: 111.489
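The two pairs of numbers above can be compared as a relative deviation. This is a small sketch, assuming the first pair of values (7.99942 ms and 109.615 ms) comes from the standalone cuFFT roundtrip and the `Time_Total` values from gearshifft; `relative_deviation` is a hypothetical helper.

```python
def relative_deviation(reference_ms, measured_ms):
    """Deviation of measured_ms from reference_ms as a fraction of the reference."""
    return (measured_ms - reference_ms) / reference_ms

# extent -> (standalone roundtrip [ms], gearshifft Time_Total [ms]), as posted above
results = {
    1048576: (7.99942, 8.27397),
    16777216: (109.615, 111.489),
}

for extent, (standalone_ms, gearshifft_ms) in results.items():
    dev = relative_deviation(standalone_ms, gearshifft_ms)
    print(f"{extent}: {dev:+.2%}")
# prints:
# 1048576: +3.43%
# 16777216: +1.71%
```

Given the strong variation noted on the interactive GPU node, deviations of this size may well be within run-to-run noise.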
I think this has been resolved by #47. I take the liberty to close it.
This is for a comparison study of gearshifft. I have some issues with the fftw bench: it does not provide time to solution (you could probably just sum setup and transform time).
Example with fftw_estimate.
The fftw code shows that two plans are measured: the first is the estimate-planner time and the second is the planner time. With -oestimate, both plans have the same flags. It looks like the second plan reuses some information (the plan is destroyed, but there is no fftw cleanup), so its planning time is much shorter. However, the first planning time correlates with the measurements in gearshifft.
The fftw_measure case.
I checked the number of threads used (=16) and tested again with just 1 thread.
I can probably use this style for the comparison now, but I have to check this later. I also want to test a roundtrip cufft example, which I could upload to this repository.