rainwoodman opened this pull request 6 years ago.
Dear Yu,
I will have a look at it as soon as possible. Hopefully I will find some time this weekend.
Best regards,
Michael
Thanks! I strongly suspect it is because I don't quite understand what the different 'transposed' flags really mean.
While the 2don2d decomposition is probably not useful for standalone 2d data, it can greatly simplify downstream applications when the 2d data is a projection of 3don2d data.
Dear Yu,
I rebased your branch on top of PFFT master (I skipped the FFTW include for the moment since I have to test it separately). Have a look at the new branch rebase_2don2d. The last commit fixes the order of the input and output arrays in the local transforms. This must differ from the 3dto2d remap, since we skip one global remap. You also had a copy-paste error that planned a serial trafo twice. I hope this fixes your issues. I only did some quick tests with odd, unequal block sizes. Feel free to test it more thoroughly.
We still have to check whether all the flags are supported correctly, e.g., DESTROY_INPUT, PRESERVE_INPUT, and so on. I also think we do not need two local transposes as in the 3dto2d case. I will think about a simplification: the redistribution n0/p0 x n1/p1 -> n0/(p0 x p1) x n1 should work directly with a single global remap.
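To make the proposed single remap concrete, here is a small sketch that computes how many elements each rank would send to each other rank when going from n0/p0 x n1/p1 to n0/(p0 x p1) x n1. It assumes contiguous blocks of size ceil(n/p) (similar to FFTW-MPI's default block size) and a row-major rank numbering r = r0*p1 + r1; both are assumptions for illustration rather than details taken from the PFFT sources, and all helper names are hypothetical.

```python
import math


def block_ranges(n, p):
    """Split range(n) into p contiguous blocks of size ceil(n/p); trailing blocks may be short or empty."""
    b = math.ceil(n / p)
    return [(min(r * b, n), min((r + 1) * b, n)) for r in range(p)]


def overlap(a, b):
    """Length of the intersection of two half-open ranges."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def layout_2don2d(n0, n1, p0, p1):
    """Return (before, after): maps from rank (r0, r1) to its (row_range, col_range)."""
    rows_before = block_ranges(n0, p0)
    cols_before = block_ranges(n1, p1)
    rows_after = block_ranges(n0, p0 * p1)  # one row slab per rank
    before, after = {}, {}
    for r0 in range(p0):
        for r1 in range(p1):
            before[(r0, r1)] = (rows_before[r0], cols_before[r1])
            # Assumed numbering r = r0 * p1 + r1; after the remap each rank owns all n1 columns.
            after[(r0, r1)] = (rows_after[r0 * p1 + r1], (0, n1))
    return before, after


def send_counts(n0, n1, p0, p1):
    """Number of elements rank src sends to rank dst in one all-to-all style remap."""
    before, after = layout_2don2d(n0, n1, p0, p1)
    return {
        (src, dst): overlap(rs, rd) * overlap(cs, cd)
        for src, (rs, cs) in before.items()
        for dst, (rd, cd) in after.items()
    }


if __name__ == "__main__":
    for (src, dst), k in sorted(send_counts(31, 17, 2, 2).items()):
        if k:
            print(f"{src} -> {dst}: {k} elements")
```

For Nmesh [31, 17] on a 2 x 2 grid, rank (0, 0) holds rows 0..15 and columns 0..8 and sends 72 elements each to the ranks that end up owning rows 0..7 and 8..15. Every element is sent exactly once, which is why one global exchange suffices under these assumptions.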
Thanks! That k += error was embarrassing!
I did suspect there must be a simpler way, but I am not sufficiently equipped to work it out.
In the coming days I'll rebuild and add this to the Python binding to test all the parameters; I believe the test script in the Python binding covers almost all flags.
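The Python-binding tests are roundtrip checks: transform forward, transform back, and compare with the original input. A serial NumPy sketch of that idea follows; the maximum-absolute-difference metric and the function name are assumptions, since the thread does not show how the test script computes its error. It also checks that the input array survives the forward transform, which is what PRESERVE_INPUT-style semantics require. Note that NumPy's inverse transforms already divide by the number of elements, whereas FFTW-style backward transforms are unnormalized, so a PFFT-based test would have to divide by n0*n1 itself.

```python
import numpy as np


def roundtrip_error(x, real=False):
    """Forward then backward transform of x; return (max abs error, input_preserved)."""
    x_before = x.copy()
    if real:
        y = np.fft.rfftn(x)
        x_back = np.fft.irfftn(y, s=x.shape)  # s= restores odd last-axis lengths
    else:
        y = np.fft.fftn(x)
        x_back = np.fft.ifftn(y)  # NumPy's inverse already divides by x.size
    preserved = np.array_equal(x, x_before)
    return float(np.max(np.abs(x_back - x_before))), preserved


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    err, ok = roundtrip_error(rng.standard_normal((31, 17)), real=True)
    print(f"r2c roundtrip error {err:.3g}, input preserved: {ok}")
```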
I added a minor fix. I can confirm that it currently writes all zeros if PFFT_DESTROY_INPUT is not set. My full test matrix still crashes with an FPE (floating-point exception) error.
Dear Yu, what is the status of this issue? Have you done more work on it? Do you need more help?
Sorry for being away so long. The FPE error has gone away, for whatever reason. Here is the matrix of passes and failures.
It appears that whenever the input is not destroyed, the output is wrong; we are very close.
[yfeng1@waterfall tests]$ mpirun -n 4 python -u roundtrip.py -Nmesh 31 17 -Nproc 2 2 -diag
PASS 28 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
FAIL 20 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 635.314
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 7896.43
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] c2r: 4.39369
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 139.693
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 74.3346
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 290.39
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 237.573
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 413.119
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 7.36372e+33
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 6978.2
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 837.545
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.40876e+33
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] c2r: 4.39369
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.52584e+21
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 61.8582
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 1627.92
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.85845e+38
It seems we are relying on a side effect on the first argument (in) of 'sertrafo' in local_transp[1], around line 283 of remap_2dto1d.c, and also around line 326. Does sertrafo modify its input?
If so, there is no way to preserve the input values without modifying sertrafo.
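If sertrafo does overwrite its input, one generic workaround, at the cost of extra memory, is to run the destructive transform on a scratch copy whenever the caller's input must be preserved. The NumPy sketch below only illustrates that idea; `destructive_fft` is a hypothetical stand-in for a serial transform that clobbers its input, not PFFT's sertrafo.

```python
import numpy as np


def destructive_fft(buf):
    """Hypothetical serial transform that uses its input as scratch space."""
    out = np.fft.fft(buf)
    buf[:] = 0  # simulate the input being clobbered
    return out


def preserving_fft(buf, scratch=None):
    """Run the destructive transform on a copy so the caller's input survives."""
    if scratch is None:
        scratch = np.empty_like(buf)
    np.copyto(scratch, buf)
    return destructive_fft(scratch)
```

The extra copy costs memory and bandwidth, which is the trade-off a DESTROY_INPUT flag lets a library skip.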
The number of failures changes from run to run. It looks like the only 'safe' combination is to set PFFT_DESTROY_INPUT and avoid PFFT_PADDED_*.
PASS 31 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
FAIL 17 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] c2r: 4.71394
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] c2r: 4.71394
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 6.28835e+35
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 4.11658e+27
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 1.27843e+36
We have made some pretty big progress.
The remaining trouble is the calculation of local_ni for padded r2c/c2r in 2don2d mode: it is not padded even when padded r2c is requested.
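For reference, the usual FFTW convention for a padded r2c transform stores the real input with its last axis padded to 2*(n//2 + 1) elements, so that the same buffer can hold the n//2 + 1 complex outputs. The sketch below shows how a block decomposition of that last axis changes once the padding is taken into account. Applying ceil(n/p) blocks to the padded length is an assumption for illustration, not necessarily the convention PFFT uses, and the helper names are hypothetical.

```python
import math


def padded_r2c_shape(n):
    """Real-array shape with the last axis padded FFTW-style to 2*(n//2 + 1)."""
    *head, last = n
    return [*head, 2 * (last // 2 + 1)]


def local_block(n, p, r):
    """(local_start, local_n) of rank r when n points are split into ceil(n/p) blocks."""
    b = math.ceil(n / p)
    start = min(r * b, n)
    return start, min(start + b, n) - start


if __name__ == "__main__":
    nmesh = [31, 33]
    padded = padded_r2c_shape(nmesh)
    p1 = 2
    for r1 in range(p1):
        print(r1, "unpadded", local_block(nmesh[1], p1, r1), "padded", local_block(padded[1], p1, r1))
```

With Nmesh [31, 33] and p1 = 2, the padded last axis has 34 elements and only the last rank's extent changes (16 vs. 17 real elements). A local_ni computed from the unpadded length would therefore miss the padding on that rank, which would be consistent with the symptom described above.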
PASS 48 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2] [31, 33] C2C INPL ESTIMATE
[2, 2] [31, 33] C2C OUTP ESTIMATE
[2, 2] [31, 33] C2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] R2C INPL ESTIMATE
[2, 2] [31, 33] R2C OUTP ESTIMATE
[2, 2] [31, 33] R2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL ESTIMATE
[2, 2] [31, 33] C2CF OUTP ESTIMATE
[2, 2] [31, 33] C2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] R2CF INPL ESTIMATE
[2, 2] [31, 33] R2CF OUTP ESTIMATE
[2, 2] [31, 33] R2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
FAIL 16 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2] [31, 33] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C forward: 3699.34
[2, 2] [31, 33] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C forward: 156.181
[2, 2] [31, 33] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 127.235
[2, 2] [31, 33] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 127.235
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 5620.84
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 171.904
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 184.189
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 279.732
[2, 2] [31, 33] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C forward: 6649.77
[2, 2] [31, 33] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C forward: 8.12206e+24
[2, 2] [31, 33] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 6.65086e+23
[2, 2] [31, 33] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 127.235
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 124.114
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 2.82665e+22
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 2.39818e+28
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 3.30412e+37
Actually, 3don3d currently fails on padded r2c as well.
PASS 48 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2, 1] [31, 33, 32] C2C INPL ESTIMATE
[2, 2, 1] [31, 33, 32] C2C OUTP ESTIMATE
[2, 2, 1] [31, 33, 32] C2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C INPL DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] C2C OUTP DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] C2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2C INPL ESTIMATE
[2, 2, 1] [31, 33, 32] R2C OUTP ESTIMATE
[2, 2, 1] [31, 33, 32] R2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2C INPL DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] R2C OUTP DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] R2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF INPL ESTIMATE
[2, 2, 1] [31, 33, 32] C2CF OUTP ESTIMATE
[2, 2, 1] [31, 33, 32] C2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF INPL DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] C2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] C2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2, 1] [31, 33, 32] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2CF INPL ESTIMATE
[2, 2, 1] [31, 33, 32] R2CF OUTP ESTIMATE
[2, 2, 1] [31, 33, 32] R2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2CF INPL DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] R2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2, 1] [31, 33, 32] R2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2, 1] [31, 33, 32] R2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
FAIL 16 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2, 1] [31, 33, 32] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C forward: 3.60909e+06
[2, 2, 1] [31, 33, 32] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C forward: 846.666
[2, 2, 1] [31, 33, 32] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 40697.2
[2, 2, 1] [31, 33, 32] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 4671.34
[2, 2, 1] [31, 33, 32] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 3.93931e+06
[2, 2, 1] [31, 33, 32] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 2581.97
[2, 2, 1] [31, 33, 32] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 3508.27
[2, 2, 1] [31, 33, 32] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 13096.7
[2, 2, 1] [31, 33, 32] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C forward: 3.9393e+06
[2, 2, 1] [31, 33, 32] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C forward: 846.666
[2, 2, 1] [31, 33, 32] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 8.45825e+36
[2, 2, 1] [31, 33, 32] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 1.00095e+35
[2, 2, 1] [31, 33, 32] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 3.68247e+06
[2, 2, 1] [31, 33, 32] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C forward: 4.65759e+36
[2, 2, 1] [31, 33, 32] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 7.46692e+36
[2, 2, 1] [31, 33, 32] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT forward: 2.51707e+36
OK. I think this PR is pretty much done. The 2don2d support is now as good as the 3don3d support and covers enough cases to be useful.
Here is the latest output of the roundtrip script.
@mpip do you want to run more extensive test cases before merging this?
[yfeng1@waterfall test]$ mpirun -n 4 python ../testenv/bin/pfft-roundtrip-matrix.py -Nmesh 31 33 -Nproc 2 2 -diag -rigor estimate
PASS 48 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2] [31, 33] C2C INPL ESTIMATE
[2, 2] [31, 33] C2C OUTP ESTIMATE
[2, 2] [31, 33] C2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] R2C INPL ESTIMATE
[2, 2] [31, 33] R2C OUTP ESTIMATE
[2, 2] [31, 33] R2C INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL ESTIMATE
[2, 2] [31, 33] C2CF OUTP ESTIMATE
[2, 2] [31, 33] C2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C
[2, 2] [31, 33] C2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] C2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT
[2, 2] [31, 33] R2CF INPL ESTIMATE
[2, 2] [31, 33] R2CF OUTP ESTIMATE
[2, 2] [31, 33] R2CF INPL ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF OUTP ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE TRANSPOSED_OUT
UNIMPL 16 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
[2, 2] [31, 33] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF INPL ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF OUTP ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF INPL DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF OUTP DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
FAIL 0 / 64
NP NMESH TYPE INPLACE FLAGS ERROR
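The 64 cases in this matrix can be reproduced by enumerating the combinations that appear in the log: four transform types (C2C and R2C, plus the single-precision variants C2CF and R2CF), in-place or out-of-place, and ESTIMATE combined with any subset of DESTROY_INPUT, the PADDED_C2R/PADDED_R2C pair, and TRANSPOSED_OUT. The sketch below models only the log, not the actual pfft-roundtrip-matrix.py script. It marks padded real-to-complex cases as unimplemented on 2don2d, which reproduces the 48 + 16 split above.

```python
from itertools import product

TYPES = ["C2C", "R2C", "C2CF", "R2CF"]  # the F suffix denotes single precision
PLACEMENTS = ["INPL", "OUTP"]
# Flag groups toggled on top of ESTIMATE, as they appear in the log.
OPTIONAL_GROUPS = [("DESTROY_INPUT",), ("PADDED_C2R", "PADDED_R2C"), ("TRANSPOSED_OUT",)]


def test_matrix():
    """All (type, placement, flags) combinations: 4 * 2 * 2**3 = 64 cases."""
    cases = []
    for kind, place in product(TYPES, PLACEMENTS):
        for switches in product([False, True], repeat=len(OPTIONAL_GROUPS)):
            flags = {"ESTIMATE"}
            for on, group in zip(switches, OPTIONAL_GROUPS):
                if on:
                    flags.update(group)
            cases.append((kind, place, frozenset(flags)))
    return cases


def unimplemented_on_2don2d(case):
    """Padded real-to-complex transforms are rejected when the grid and mesh ranks match."""
    kind, _, flags = case
    return kind.startswith("R2C") and "PADDED_R2C" in flags


if __name__ == "__main__":
    cases = test_matrix()
    unimpl = sum(unimplemented_on_2don2d(c) for c in cases)
    print(f"implemented {len(cases) - unimpl} / {len(cases)}, UNIMPL {unimpl} / {len(cases)}")
```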
This PR supersedes #30.
Some progress has been made, but I am currently stuck. @mpip, could you take a look at this?
The idea is to transpose n0 / p0 x n1 / p1 to n0 / (p0 * p1) x n1. I followed the 3dto2d example to do three steps:
It sounds easy enough, but currently the implementation is buggy, and I cannot locate the problem.
The main implementation is in remap_2dto1d.c. I added a simple, ugly interface in remap.c that dispatches to remap_3dto2d or remap_2dto1d depending on rnk_n. This can be improved later, once the code works correctly.
I experimented with the test simple_test_c2c_2don2d.c:
I initially suspected the new array interface, so I did some name cleanup to clarify the new array logic. Now I think it is unlikely to be related.
I checked that the 3dto2d results appear consistent when I change the number of ranks and use a variety of combinations, so it is likely correct (although I haven't compared against a single-rank transform).
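One way to do that comparison is to gather each rank's local output block, placed at its local offset, into a global array and compare it with a serial transform of the full input. The NumPy sketch below shows only the gather-and-compare step. `gather_blocks` and the (start, block) representation are hypothetical, and for a PFFT_TRANSPOSED_OUT plan the axes of the serial result would first have to be permuted to match the output layout.

```python
import numpy as np


def gather_blocks(blocks, shape, dtype=complex):
    """Assemble [(start_tuple, local_array), ...] into one global array."""
    out = np.zeros(shape, dtype=dtype)
    filled = np.zeros(shape, dtype=bool)
    for start, local in blocks:
        idx = tuple(slice(s, s + n) for s, n in zip(start, local.shape))
        out[idx] = local
        filled[idx] = True
    if not filled.all():
        raise ValueError("local blocks do not cover the global array")
    return out


def max_error_vs_serial(blocks, global_input):
    """Compare the gathered distributed output against a single-rank FFT of the input."""
    reference = np.fft.fftn(global_input)
    gathered = gather_blocks(blocks, reference.shape)
    return float(np.max(np.abs(gathered - reference)))
```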
PS: I was working off my own branch, where the first few commits bundle FFTW; these can be removed later. Bundling makes it easier to work on my workstation, where there is no system-wide MPI-enabled PFFT.