mpip / pfft

Parallel fast Fourier transforms
GNU General Public License v3.0

WIP: Another try at 2don2d #31

Open rainwoodman opened 6 years ago

rainwoodman commented 6 years ago

This PR supersedes #30.

Some progress has been made, but I am currently stuck. @mpip Could you take a look at this?

The idea is to transpose n0 / p0 x n1 / p1 to n0 / (p0 * p1) x n1. I followed the 3dto2d example to do three steps:

It sounds easy enough, but currently the implementation is buggy, and I cannot locate the problem.
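To make the two layouts concrete, here is a minimal sketch of the local array shapes before and after the remap. It assumes contiguous blocks of ceil(n / p) points per rank; that blocking rule and the helper names (`block`, `shapes_2d`, `shapes_1d`) are illustrative assumptions, not taken from the PFFT source.

```python
def block(n, p, r):
    """Return (start, length) of rank r's share when n points are split
    over p ranks in contiguous blocks of ceil(n / p) points (assumed rule)."""
    size = -(-n // p)              # ceil(n / p) using integer arithmetic
    start = min(r * size, n)
    stop = min(start + size, n)
    return start, stop - start


def shapes_2d(n0, n1, p0, p1):
    """Local shapes on a p0 x p1 process grid: roughly n0/p0 x n1/p1."""
    return {(r0, r1): (block(n0, p0, r0)[1], block(n1, p1, r1)[1])
            for r0 in range(p0) for r1 in range(p1)}


def shapes_1d(n0, n1, p0, p1):
    """Local shapes after the remap: roughly n0/(p0*p1) x n1 on each rank."""
    p = p0 * p1
    return {r: (block(n0, p, r)[1], n1) for r in range(p)}
```

Calling `shapes_2d(31, 17, 2, 2)` and `shapes_1d(31, 17, 2, 2)` shows how an uneven mesh is shared out in each layout; both layouts cover all n0 * n1 points.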

The main file that implements this is in remap_2dto1d.c. I added a simple/ugly interface in remap.c to dispatch to remap_3dto2d or remap_2dto1d depending on rnk_n. This can be improved later once we get the code working correctly.

I played with the tests simple_test_c2c_2don2d.c:

I initially suspected the new array interface, so I did some name cleanup to clarify the new array logic. Now I think it is unlikely to be related.

I checked that the 3dto2d results appear to be consistent when I change the number of ranks and use a variety of combinations. So it is likely correct (I haven't compared with a single-rank transform).

PS: I was working off my branch, where the first few commits bundle fftw; these can be removed later -- it is easier to work with on my workstation, where there is no MPI-enabled pfft installed system-wide.

mpip commented 6 years ago

Dear Yu,

I will have a look at it as soon as possible. Hopefully, I find some time at the weekend.

Best regards Michael

rainwoodman commented 6 years ago

Thanks! I strongly suspect it is because I don't quite know what the different 'transposed' flags really mean.

While the 2don2d decomposition is probably not useful when dealing with plain 2d data, if the 2d data is a projection of 3don2d data it can hugely simplify downstream applications.

mpip commented 6 years ago

Dear Yu,

I rebased your branch on top of PFFT master (I skipped the FFTW include for the moment, since I have to test it separately). Have a look at the new branch rebase_2don2d. The last commit fixes the order of the input and output arrays in the local transforms. This must differ from the 3dto2d remap, since we skip one global remap. You also had a copy-paste error and planned a serial trafo twice. I hope this fixes your issues. I only did some quick tests with odd, unequal block sizes. Feel free to test it more thoroughly.

mpip commented 6 years ago

We still have to check whether all the flags are supported correctly, e.g., DESTROY_INPUT, PRESERVE_INPUT, and so on. I also think that we do not need two local transposes as in the 3dto2d case. I will think about a simplification. n0/p0 x n1/p1 -> n0/(p0 x p1) x n1 should be possible directly with only one global remap.
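A serial simulation may help picture the single remap. The sketch below assumes that each process row re-splits its own n0 rows over its p1 ranks, so that data only moves inside a process row. This sub-splitting, the ceil(n / p) blocking, and all function names are assumptions made for illustration; the real PFFT block sizes may differ.

```python
def split(n, p):
    """Contiguous (start, stop) ranges splitting n points over p ranks in
    blocks of ceil(n / p) points (assumed blocking rule)."""
    size = -(-n // p)
    return [(min(r * size, n), min((r + 1) * size, n)) for r in range(p)]


def scatter_2d(a, p0, p1):
    """Cut the global array a (a list of rows) into p0 x p1 local blocks."""
    n0, n1 = len(a), len(a[0])
    return {(r0, r1): [row[c0:c1] for row in a[s0:s1]]
            for r0, (s0, s1) in enumerate(split(n0, p0))
            for r1, (c0, c1) in enumerate(split(n1, p1))}


def remap_2d_to_1d(blocks, p0, p1):
    """One exchange inside each process row: the rows held by process row r0
    are re-split over its p1 ranks, and every rank collects the matching
    column pieces from all p1 ranks of that row."""
    out = {}
    for r0 in range(p0):
        local_n0 = len(blocks[(r0, 0)])
        for r1, (s0, s1) in enumerate(split(local_n0, p1)):
            rows = []
            for i in range(s0, s1):
                row = []
                for q1 in range(p1):      # sender within the same process row
                    row.extend(blocks[(r0, q1)][i])
                rows.append(row)
            out[(r0, r1)] = rows
    return out
```

Stacking the resulting slabs in rank order (r0, then r1) should reproduce the global array, which gives a cheap correctness check for any candidate remap.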

rainwoodman commented 6 years ago

Thanks! That k += error was a shame!

I indeed suspected there must be a simpler way, but I am not sufficiently equipped to work it out..

I'll rebuild and add this to the python binding in the coming days to test all parameters -- I believe almost all flags are tested by the script in the python binding.

On Mon, Jan 29, 2018 at 3:02 PM, Michael Pippig notifications@github.com wrote:

We still have to check, whether all the flags are supported in the right way, e.g., DESTROY_INPUT, PRESERVE_INPUT and so on. I also think, that we do not have to use 2 local transposes like in the 3dto2d case. I will think about a simplification. n0/p0 x n1/p1 -> n0/(p0 x p1) x n1 should go directly with only one global remap.


rainwoodman commented 6 years ago

I added a minor fix. I can confirm that currently it writes all zeros if PFFT_DESTROY_INPUT is not set. My full test matrix is still crashing with an FPE error.

mpip commented 6 years ago

Dear Yu, what is the status of this issue? Did you do some more work on it? Do you need some more help?

rainwoodman commented 6 years ago

Sorry for being away so long. The FPE error is gone for whatever reason. Here is the matrix of fails and passes.

It appears that whenever the input is not destroyed, the output is wrong; we are very close.
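For readers unfamiliar with this kind of check, here is a minimal serial sketch of a roundtrip error. It is an assumption about the general idea, not a copy of roundtrip.py: a naive DFT stands in for the parallel transform, and because the transforms are unnormalized (as in FFTW), the backward result is divided by n before comparing.

```python
import cmath


def dft(x, sign):
    """Naive unnormalized DFT; sign=-1 is forward, sign=+1 is backward."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]


def roundtrip_error(x):
    """Largest absolute difference between x and backward(forward(x)) / n."""
    n = len(x)
    y = dft(x, -1)
    z = [v / n for v in dft(y, +1)]
    return max(abs(a - b) for a, b in zip(x, z))
```

A correct transform pair gives an error near machine precision, so values of order 10 or larger, as in the failing rows, point to wrong data rather than rounding.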

[yfeng1@waterfall tests]$ mpirun -n 4 python -u roundtrip.py -Nmesh 31 17 -Nproc 2 2 -diag
PASS 28 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [31, 17]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [31, 17]
FAIL 20 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 635.314
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 7896.43
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] c2r: 4.39369
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 139.693
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 74.3346
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 290.39
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 237.573
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 413.119
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 58.3842
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 7.36372e+33
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 6978.2
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 837.545
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.40876e+33
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] c2r: 4.39369
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.52584e+21
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [31, 17] r2c: 61.8582
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [31, 17] r2c: 1627.92
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [31, 17] r2c: 1.85845e+38
rainwoodman commented 6 years ago

It seems we are relying on a side effect on the first argument (in) of 'sertrafo' in local_transp[1] around line 283 of remap_2dto1d.c, and also around line 326?

(does sertrafo modify the input?)

If that's the case then there is no way we can preserve the input values without modifying sertrafo.
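One way to answer that question empirically is to compare the input buffer before and after an out-of-place step. The sketch below is a generic harness; `clean_step` and `scratch_step` are hypothetical stand-ins and do not model the actual sertrafo code.

```python
import copy


def preserves_input(transform, data):
    """Run an out-of-place transform(src, dst) on a copy of data and
    report whether src is left untouched."""
    src = copy.deepcopy(data)
    dst = [0] * len(data)
    transform(src, dst)
    return src == data


def clean_step(src, dst):
    """Hypothetical step that only reads src."""
    for i, v in enumerate(src):
        dst[i] = 2 * v


def scratch_step(src, dst):
    """Hypothetical step that uses src as scratch space, the kind of side
    effect suspected in the comment above."""
    for i in range(len(src)):
        src[i] *= 2
        dst[i] = src[i]
```

Both steps produce the same output, but only `clean_step` passes `preserves_input`, which is the property a PRESERVE_INPUT plan would need from every stage.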

rainwoodman commented 6 years ago

The number of failures changes from run to run. It looks like the only 'safe' combination is to set PFFT_DESTROY_INPUT and avoid the PFFT_PADDED flags.

PASS 31 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace True Nmesh [8, 8]
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_DESTROY_INPUT InPlace False Nmesh [8, 8]
FAIL 17 / 48
NP [2, 2] PFFT_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] c2r: 4.71394
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFT_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] c2r: 4.71394
NP [2, 2] PFFT_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_C2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 6.28835e+35
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_DESTROY_INPUT|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 19.5054
NP [2, 2] PFFTF_R2C PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 15.3301
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace True Nmesh [8, 8] r2c: 4.11658e+27
NP [2, 2] PFFTF_R2C PFFT_TRANSPOSED_OUT|PFFT_ESTIMATE|PFFT_PADDED_R2C|PFFT_PADDED_C2R InPlace False Nmesh [8, 8] r2c: 1.27843e+36
rainwoodman commented 6 years ago

Some pretty big progress has been made.

Now the trouble is in the calculation of local_ni of padded r2c / c2r in 2d on 2d mode: it is not padded even when padded r2c is requested.
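As a reminder of what padding means for a real-to-complex transform, here is a small sketch of the global array sizes, following the usual FFTW convention: the last dimension holds n/2 + 1 complex values, and a padded real array reserves 2 * (n/2 + 1) reals there. The function name is illustrative, and how the padding is shared among ranks in 2d on 2d mode is not modeled.

```python
def r2c_sizes(nmesh):
    """Global sizes for a real-to-complex transform of shape nmesh,
    following the FFTW convention for the last dimension."""
    *lead, n_last = nmesh
    n_complex = n_last // 2 + 1          # complex points kept in last dim
    return {
        "real": list(nmesh),
        "complex": lead + [n_complex],
        "padded_real": lead + [2 * n_complex],
    }
```

For Nmesh [31, 33] this gives a complex shape of [31, 17] and a padded real shape of [31, 34], so an unpadded local size would be one real value short per row.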

PASS 48 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2] [31, 33] C2C    INPL                 ESTIMATE                                                           
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] C2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    INPL                 ESTIMATE                                                           
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] R2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE                                                           
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE                                                           
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
FAIL 16 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2] [31, 33] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 3699.34
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 156.181
[2, 2] [31, 33] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 127.235
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 127.235
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 5620.84
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 171.904
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 184.189
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 279.732
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 6649.77
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 8.12206e+24
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 6.65086e+23
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 127.235
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 124.114
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 2.82665e+22
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 2.39818e+28
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 3.30412e+37
rainwoodman commented 6 years ago

Actually, 3don3d currently fails on padded r2c as well.
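The 64 cases in these runs are consistent with a full product of four transform types, two placement modes, and three optional flag groups. The sketch below rebuilds such a matrix; it is inferred from the printed tables and is an assumption about how the script enumerates cases, with illustrative names throughout.

```python
from itertools import product

TYPES = ["C2C", "R2C", "C2CF", "R2CF"]
PLACEMENT = ["INPL", "OUTP"]
# Optional flag groups; the two padded flags always appear together.
OPTIONAL = [("DESTROY_INPUT",), ("PADDED_C2R", "PADDED_R2C"), ("TRANSPOSED_OUT",)]


def build_matrix():
    """Enumerate every (type, placement, flags) combination."""
    cases = []
    switch_sets = product((False, True), repeat=len(OPTIONAL))
    for ttype, place, switches in product(TYPES, PLACEMENT, list(switch_sets)):
        flags = ["ESTIMATE"]
        for group, enabled in zip(OPTIONAL, switches):
            if enabled:
                flags.extend(group)
        cases.append((ttype, place, flags))
    return cases


def padded_r2c(cases):
    """Select the real-to-complex cases that request padding."""
    return [c for c in cases if c[0].startswith("R2C") and "PADDED_R2C" in c[2]]
```

`build_matrix()` yields 64 cases and `padded_r2c` selects 16 of them, matching the FAIL 16 / 64 count: every failing row is a padded R2C or R2CF case.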

PASS 48 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2, 1] [31, 33, 32] C2C    INPL                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2C    OUTP                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2C    INPL                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2C    OUTP                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   INPL                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2CF   OUTP                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] C2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2, 1] [31, 33, 32] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2CF   INPL                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2CF   OUTP                 ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2, 1] [31, 33, 32] R2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2, 1] [31, 33, 32] R2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
FAIL 16 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2, 1] [31, 33, 32] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 3.60909e+06
[2, 2, 1] [31, 33, 32] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 846.666
[2, 2, 1] [31, 33, 32] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 40697.2
[2, 2, 1] [31, 33, 32] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 4671.34
[2, 2, 1] [31, 33, 32] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 3.93931e+06
[2, 2, 1] [31, 33, 32] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 2581.97
[2, 2, 1] [31, 33, 32] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 3508.27
[2, 2, 1] [31, 33, 32] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 13096.7
[2, 2, 1] [31, 33, 32] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 3.9393e+06
[2, 2, 1] [31, 33, 32] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 846.666
[2, 2, 1] [31, 33, 32] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 8.45825e+36
[2, 2, 1] [31, 33, 32] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 1.00095e+35
[2, 2, 1] [31, 33, 32] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 3.68247e+06
[2, 2, 1] [31, 33, 32] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     forward: 4.65759e+36
[2, 2, 1] [31, 33, 32] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 7.46692e+36
[2, 2, 1] [31, 33, 32] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      forward: 2.51707e+36
rainwoodman commented 6 years ago

OK. I think this PR is pretty much done. The 2don2d support is now as good as the 3don3d support and covers a sufficient number of cases to make it useful.

Here is the latest output of the roundtrip script.

@mpip do you want to run more extensive test cases before merging this?

[yfeng1@waterfall test]$ mpirun -n 4 python ../testenv/bin/pfft-roundtrip-matrix.py -Nmesh 31 33 -Nproc 2 2 -diag -rigor estimate
PASS 48 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2] [31, 33] C2C    INPL                 ESTIMATE                                                           
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] C2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    INPL                 ESTIMATE                                                           
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] R2C    INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE                                                           
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     
[2, 2] [31, 33] C2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] C2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE                                                           
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE                                                           
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE                                                           
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE                       TRANSPOSED_OUT                      
UNIMPL 16 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
[2, 2] [31, 33] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2C    OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   INPL                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   OUTP                 ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C                                     Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   INPL   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
[2, 2] [31, 33] R2CF   OUTP   DESTROY_INPUT ESTIMATE PADDED_C2R PADDED_R2C TRANSPOSED_OUT                      Currently using the same ProcMesh (2) dimentions with Mesh (2) is not supported on padded transforms.
FAIL 0 / 64
NP     NMESH    TYPE   INPLACE FLAGS                                                                            ERROR
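
As a sanity check on the totals above, here is a minimal sketch that enumerates the flag combinations visible in the log and counts them. It is not the code of `pfft-roundtrip-matrix.py`; the list names and the `classify` helper are hypothetical, and the rule "padded real-input transforms are unimplemented on 2don2d" is simply read off the UNIMPL table.

```python
from itertools import product

# Hypothetical reconstruction of the test matrix; the values mirror the
# columns of the log, not the internals of the real script.
TYPES = ["C2C", "R2C", "C2CF", "R2CF"]
INPLACE = ["INPL", "OUTP"]
DESTROY_INPUT = [False, True]
PADDED = [False, True]          # PADDED_C2R and PADDED_R2C always appear together
TRANSPOSED_OUT = [False, True]


def classify(kind, padded):
    """Status expected for a 2D mesh on a 2D process mesh, as read off the log."""
    real_input = kind.startswith("R2C")  # matches R2C and R2CF
    return "UNIMPL" if (real_input and padded) else "PASS"


cases = list(product(TYPES, INPLACE, DESTROY_INPUT, PADDED, TRANSPOSED_OUT))
counts = {"PASS": 0, "UNIMPL": 0}
for kind, inplace, destroy, padded, transposed in cases:
    counts[classify(kind, padded)] += 1

print(len(cases), counts)  # 64 {'PASS': 48, 'UNIMPL': 16}
```

The counts agree with the report: 4 types x 2 x 2 x 2 x 2 = 64 cases, of which the 16 padded R2C/R2CF cases are the unimplemented ones.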