lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
292 stars 99 forks source link

Multi-GPU codes and gauge radius #389

Closed nmrcardoso closed 9 years ago

nmrcardoso commented 9 years ago

We need to make some fixes about the multi-gpu code and the gauge radius, because in some codes it uses a fixed radius and others only reads the radius if the dimension is partitioned.

found in codes: 1) for(int dir=0; dir<4; ++dir){ if(comm_dim_partitioned(dir)) border[dir] = data.R()[dir]; else border[dir] = 0; X[dir] = data.X()[dir] - border[dir]_2; } and 2) for(int dir=0; dir<4; ++dir){ border[dir] = 2; X[dir] = data.X()[dir] - border[dir]_2; }

we should change all for: for(int dir=0; dir<4; ++dir){ border[dir] = data.R()[dir]; X[dir] = data.X()[dir] - border[dir]*2; } I used a lot the 1). At least I found the gauge_plaq.cu using the 2) and in the case of the codes that only adds borders to the partitioned dimensions this gives incorrect results.

Tomorrow if I have time I will be changing this.

maddyscientist commented 9 years ago

Agree this needs to be fixed, and 2.) will lead to bad performance since it adds unnecessary overhead. Why would you not want to keep the check for comm_dim_partitioned though, like in 1.)?

nmrcardoso commented 9 years ago

Because there are codes that puts the same radius in all dimensions even if some of them aren't partitioned, that's why it's better to remove the "if(comm_dim_partitioned(dir))" and since R() in cudaGaugeField already includes this information. Examples in the interface_quda.cpp: 1) int R[4] = {0,0,0,0}; for(int dir=0; dir<4; ++dir) if(comm_dim_partitioned(dir)) R[dir] = 2; for(int dir=0; dir<4; ++dir) y[dir] = cudaInGauge->X()[dir] + 2 * R[dir]; 2) int R[4] = {2,2,2,2}; for (int dir=0; dir<4; ++dir) y[dir] = gaugePrecise->X()[dir] + 4;

Although, its better to change the codes to the version 1) in order to avoid extra packing and transfers. Also the inner codes that uses exchangeExtendedGhost(const int *R, bool no_comms_fill=false); need to be used carefully to support both versions because of no_comms_fill needs to be true in 2) (in order to exchange data in non partitioned dimensions) and false in 1), if I'm not mistaken.

nmrcardoso commented 9 years ago

or adding R() check in cuda_gauge_field.cu, void cudaGaugeField::exchangeExtendedGhost(const int *R, bool no_comms_fill): if (!commDimPartitioned(d) && !no_comms_fill) continue; to jump if no_comms_fill=true and R()[d]=0

Also we can remove the "const int *R" and using inside the cudaGaugeField::R() instead, also in principle the bool no_comms_fill can go by using the R() checking instead.

nmrcardoso commented 9 years ago

Done for(int dir=0; dir<4; ++dir){ border[dir] = data.R()[dir]; X[dir] = data.X()[dir] - border[dir]2; } in 05e44fd1df73a119caa98d1b641c2eadd86eed59. I didn't make any changes in void cudaGaugeField::exchangeExtendedGhost(const int R, bool no_comms_fill):

mathiaswagner commented 9 years ago

Interesting that this comes up now as I noticed issues with the exchangeExtendedGhost (initcheck complains there) for the fat link code and there

    int R[4] = {2, 2, 2, 2}; 
    cudaInLinkEx->exchangeExtendedGhost(R,true);

Is used.

Not sure whether is causes the issues with the FL but maybe it's worth having a look.

nmrcardoso commented 9 years ago

I made some fixes already in 05e44fd1df73a119caa98d1b641c2eadd86eed59. @mathiaswagner , in the interface_quda.cpp there are lot of places where the cudaGaugeField::R() is not defined, don't know if all the inner codes uses the cudaGaugeField::R(). Also, I don't know if all the codes that adds a border radius to the non partitioned dimensions need to be done in this way or if we can change the this to only add border radius for the partitioned dimensions, doing this change we will gain in performance. @mathiaswagner , in principle as long as all dimensions get the same border radius, the no_comms_fill in exchangeExtendedGhost() should be true in order to update the links in the non partitioned dimensions. The problem here is if the inner codes only adds a border for the partitioned dimensions or if there are data updates only in the partitioned dimensions...

mathiaswagner commented 9 years ago

fixed with #390.