If the problem is the warning "WARNING: LatticeReduction ..." there are two possible solutions:

1. In the file tools/LatticeReduction.cpp, increase `const double epsilon=1e-14` by a couple of orders of magnitude (to 1e-12).
2. In the same file, search for "WARNING" (note that it appears twice; you should probably fix both occurrences) and add a break after the message, something like: `if(counter>=10000){ fprintf(stderr,"WARNING: LatticeReduction::reduce stuck after %u iterations\n",counter); break; }`

The first solution is perhaps a bit better, because it will still stop the simulation if the compiler is screwing something up in the optimization. But the second is also fine, provided users actually see the warning. Or perhaps you can implement both, so that the check is less strict and, when there is a problem, the user can at least continue the simulation (after being warned).
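Putting the two changes together, a minimal sketch of what the modified check might look like (the function, loop structure, and variable names here are illustrative, not copied from tools/LatticeReduction.cpp):

```c++
#include <cstdio>

// Illustrative sketch only; the real loop lives in tools/LatticeReduction.cpp.
const double epsilon=1e-12; // solution 1: relaxed from 1e-14

void reduceSketch() {
  unsigned counter=0;
  bool converged=false;
  while(!converged) {
    // ... one lattice-reduction step, which should eventually set converged ...
    counter++;
    if(counter>=10000) {
      // solution 2: warn and bail out instead of looping (or aborting) forever
      std::fprintf(stderr,"WARNING: LatticeReduction::reduce stuck after %u iterations\n",counter);
      break;
    }
  }
}
```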
I think this was the problem that you (@carlocamilloni) told me about a few weeks ago. I am not sure that the problem reported on the mailing list is the same one. If you have a chance to try on the machine showing the problem, let me know if this solves the issue. Thanks!
Giovanni
On Fri, Sep 27, 2019 at 9:44 AM Carlo Camilloni notifications@github.com wrote:

> I had problems with PGI and it was reported an issue with gcc 6 @GiovanniBussi how can we improve it?
About this: with PGI I tried to increase the constant, but that was not the problem; I think it is trickier than that. I also tried to disable all code optimisation, but that did not work either. I think the two issues are somehow connected, because the results pointed to some kind of compiler-dependent miscalculation of the PBCs.
I managed to check with nvc++ (formerly PGI). This compiler is heavily broken, I am afraid. The problematic part is here. There is a bug in optimizing this loop that makes two of the three vectors in `v` become identical, so the call to `reduce()` at the next iteration fails because `modulo2(b)` here is zero.
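To make the failure mode concrete, here is a hedged sketch of the kind of guard that would catch the degenerate input (the `Vector` type and `modulo2()` below are simplified stand-ins for the real PLUMED definitions):

```c++
#include <cassert>

// Simplified stand-ins for PLUMED's Vector and modulo2(); the real
// definitions live in the PLUMED tools library.
struct Vector { double d[3]; };
inline double modulo2(const Vector&v){ return v.d[0]*v.d[0]+v.d[1]*v.d[1]+v.d[2]*v.d[2]; }

// reduce() fails when modulo2(b) is zero, which is exactly what happens once
// the miscompiled loop has made two of the three lattice vectors identical.
inline void checkNotDegenerate(const Vector&b){
  assert(modulo2(b)>0.0 && "degenerate lattice vector passed to reduce()");
}
```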
I tried to rewrite the loop like this, and some tests pass:

```diff
+      double mtrial;
+      Vector trial;
+      auto best=v[2];
+      auto mbest=modulo2(best);
       for(int x1=x1min; x1<=x1max; x1++)
         for(int x2=x2min; x2<=x2max; x2++) {
           trial=v[2]+x2*v[1]+x1*v[0];
           mtrial=modulo2(trial);
-          if(first || mtrial<mbest) {
+          if(mtrial<mbest) {
             mbest=mtrial;
             best=trial;
-            first=false;
           }
         }
```
Basically, instead of relying on the variable `first` to make sure that `best` is assigned at least once, I pre-assign it to `v[2]`. This fixes some tests, but other tests still fail.
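For reference, this is how the full loop reads with the patch applied (a sketch: `Vector`, its `+`/`*` operators, `modulo2()`, and the loop bounds are all assumed from the surrounding PLUMED code):

```c++
// Sketch of the patched search: among trial combinations of the three
// lattice vectors, find the shortest replacement for v[2].
Vector bestCombination(const Vector v[3],
                       int x1min,int x1max,int x2min,int x2max) {
  Vector trial;
  double mtrial;
  auto best=v[2];             // pre-assigned, so best is always initialized
  auto mbest=modulo2(best);
  for(int x1=x1min; x1<=x1max; x1++)
    for(int x2=x2min; x2<=x2max; x2++) {
      trial=v[2]+x2*v[1]+x1*v[0];
      mtrial=modulo2(trial);
      if(mtrial<mbest) {      // the 'first' flag is no longer needed
        mbest=mtrial;
        best=trial;
      }
    }
  return best;
}
```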
EDIT: specifically, with the code above the problem is that for some cells the `best` vector at the end of the loop is identical to `v[0]`, which is not possible for the specific vectors I feed in. Adding a `std::cerr<<trial;` in the middle of the loop, to check which vectors are tried, makes the problem disappear, clearly pointing at some randomness in the result.
I don't have a solution, I am afraid this compiler is just not reliable enough.
A small improvement is to crash when the iterated vectors become NaN, instead of continuing forever. I will backport this to version 2.6.
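A minimal sketch of such a check, assuming `std::isnan` and a plain exception (the actual error-reporting mechanism in PLUMED, and where the check is placed, may differ):

```c++
#include <cmath>
#include <stdexcept>

// Sketch: stop as soon as a squared vector norm becomes NaN, instead of
// letting the reduction iterate forever.
inline void checkFinite(double m2) {
  if(std::isnan(m2)) {
    throw std::runtime_error("LatticeReduction::reduce: lattice vector became NaN");
  }
}
```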
Notice that, in addition to this (and probably unrelated), I see another set of failures, systematically in CVs using RMSD, plus a few others. I will check whether this is a signature of some undefined behavior in the RMSD code, but in any case, given the point above, this compiler will not be usable. I tried to report this to NVIDIA; let's see if they can fix it.
@carlocamilloni I am closing this, see #711.