Enable support for BQCD

maddyscientist commented 12 years ago

This is an issue to track the integration of QUDA with the BQCD library. BQCD is written in Fortran and so requires that QUDA be extended to support being called from Fortran.

maddyscientist commented 12 years ago

The field ordering is described as following

all fields are ordered in the same way: internal indices (fast running) site index even/odd

All fields are Fortran77 kind of arrays. The best way to figure out what is going on is to look at the .f90 files that are generated by 'make'. For example in d/d21_xf.f90 you find

4-component spinor: complex(8), dimension (4, 3, )
2-component spinor: complex(8), dimension (2, 3, ) gauge field : complex(8), dimension (3, 3, *)

and a clover array in clover/clover_mult_a:

complex(8), dimension(18, 2, *) :: a

The internal clover indices are packed. The clover matrices (per site) are transformed into 2 6x6 hermitian matrices. The '2' appears above and the '18' represents the packed 6x6 hermitian matrix. Packing is defined in clover/clover.h. The #defines tell you where to find Aij's in the array clover%a

define A11 Re(a(1,J,i)) A11 is (real and) stored in the real part of a(1,...)

define A22 Im(a(1,J,i)) A22 is (real and) stored in the imaginary part of a(1,...)

define A33 Re(a(11,J,i))

define A44 Im(a(11,J,i))

define A55 Re(a(17,J,i))

define A66 Im(a(17,J,i))

define A12 a(2,J,i) A12 is (complex and) stored in a(2,...)

All fields are of type complex(8), so double precision complex numbers. In terms of QUDA field ordering we will need QUDA_QDP_DIRAC_ORDER for the spinors. It would appear that the mu index and the site index on the gauge fields are merged into a single "*" so this probably corresponds to QUDA_MILC_GAUGE_ORDER since if we are following standard Fortran notation, the column runs faster than the row (column-major ordering).

I have no idea if the clover field ordering matches QUDA_PACKED_CLOVER_ORDER, Ron does this match the packing order your are using in your clover interface?

rbabich commented 12 years ago

In brief, QUDA_PACKED_CLOVER_ORDER is even/odd, then spacetime, then the chiral block index (0 or 1), then the index labeling the 36 unique elements of the packed chiral block. Within a chiral block (corresponding to two of the four spin indices, with color varying fastest), we first store the 6 real numbers along the diagonal, followed by the lower triangular part of the matrix (columnwise, with real/imaginary varying fastest). This is made explicit by the #defines in the dslash_core files:

#define c00_00_re C0.x
#define c01_01_re C0.y
#define c02_02_re C0.z
#define c10_10_re C0.w
#define c11_11_re C1.x
#define c12_12_re C1.y
#define c01_00_re C1.z
#define c01_00_im C1.w
#define c02_00_re C2.x
#define c02_00_im C2.y
#define c10_00_re C2.z
#define c10_00_im C2.w
#define c11_00_re C3.x
#define c11_00_im C3.y
#define c12_00_re C3.z
#define c12_00_im C3.w
#define c02_01_re C4.x
#define c02_01_im C4.y
#define c10_01_re C4.z
#define c10_01_im C4.w
#define c11_01_re C5.x
#define c11_01_im C5.y
#define c12_01_re C5.z
#define c12_01_im C5.w
#define c10_02_re C6.x
#define c10_02_im C6.y
#define c11_02_re C6.z
#define c11_02_im C6.w
#define c12_02_re C7.x
#define c12_02_im C7.y
#define c11_10_re C7.z
#define c11_10_im C7.w
#define c12_10_re C8.x
#define c12_10_im C8.y
#define c12_11_re C8.z
#define c12_11_im C8.w

From the above description, it sounds like the BQCD ordering is similar except that they intersperse the diagonal elements. It also sounds like they're storing the upper-triangular part of the matrix row-wise, instead of the lower triangular part column-wise, so you'll probably have to flip the signs of the imaginary parts. In summary, I think your conversion is going to look like the following (for a single chiral block):

// lookup table for converting from QUDA index to BQCD index:
int bq[36] = { 0,  1, 20, 21, 32, 33,                  // diagonal
               2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  // column 1
              12, 13, 14, 15, 16, 17, 18, 19,          // column 2
              22, 23, 24, 25, 26, 27,                  // column 3
              28, 29, 30, 31,                          // column 4
              34, 35}                                  // column 5

// flip the sign of the imaginary components
int sign[36];
for (int i=0; i<6; i++) sign[i] = 1;
for (int i=6; i<36; i+=2) { sign[i] = 1; sign[i+1] = -1; }

for (int i=0; i<36; i++) clover_quda[i] = sign[i] * clover_bqcd[bq[i]];

maddyscientist commented 12 years ago

As of commit 81101dfdd0d23c7132429b4892cd0ad413f22bd3 we now have initial support for a single GPU Wilson solver from BQCD. This required adding a new gauge ordering in pack_gauge.h, since it turned out BQCD used a different ordering from what was supported in QUDA already.

maddyscientist commented 12 years ago

Clover support has now been added as of commit 237691e275e08666b781b59766fbf4d44c080a71. This is very much only initial support since BQCD uses a different basis for the clover matrices than QUDA does. Currently, the dslash kernels change from UKQCD basis to BQCD basis and back again when applying the clover term. This was achieve through changes in the code generator. This means that the BQCD QUDA build is not compatible with the regular QUDA build. The real solution is to change the BQCD clover matrix elements into the same basis that QUDA uses. Also note that the Hasenbusch mass term is absorbed into the Aee term (this is presently done on the BQCD side of things), e.g.

Aee - kappa^2 Deo Aoo^{-1} Doe + rho -> \hat{A}ee - kappa^2 Deo Aoo^{-1} Doe

maddyscientist commented 12 years ago

Multi-GPU support has been added as of commit afdc383593d6ec96cd5250bb5b488c187a40cd36. This looks to have been a trivial addition of BQCD gauge field order support in the packGauge() routine.

maddyscientist commented 12 years ago

One the last few remaining blockers from merging the BQCD branch into master has been resolved. As of commit bbbdadec4ed79198e5dfb76d5513835293ded0f6 the basis of the clover matrices are changed into QUDA native relativistic basis in the packing routine removing the need for BQCD-specific core kernels. One outstanding multi-GPU issue remains with gauge matrices (at least I think the problem is with the gauge matrices) and then I can merge and close this issue.

maddyscientist commented 12 years ago

The last missing place is now in place. The final bug was that the gauge field ghost zones were not being transposed before being packed. Since packQDPGauge is used for ghost zones regardless of ordering, we need to do the transposition in the packGhost function beforehand so it is in the order packQDPGauge expects. Fixed in commit 00e86c0ac99aba99f9e6d490be9eeed7c2bbae1d.

All that remains of this issue now is to cleanup and merge.

lattice / quda

Enable support for BQCD #73

define A11 Re(a(1,J,i)) A11 is (real and) stored in the real part of a(1,...)

define A22 Im(a(1,J,i)) A22 is (real and) stored in the imaginary part of a(1,...)

define A33 Re(a(11,J,i))

define A44 Im(a(11,J,i))

define A55 Re(a(17,J,i))

define A66 Im(a(17,J,i))

define A12 a(2,J,i) A12 is (complex and) stored in a(2,...)