compiler/optimizer bug in gfortran 9.4.0 for get_radial_expansion_coefficients_poly3

bernstei commented 1 year ago

There appears to be an optimizer bug in gfortran 9.4.0 where the values for radial_exp_coeff that soap_turbo.f90::get_soap gets back from get_radial_expansion_coefficients_poly3 are NaN. Attempts to print exp_coeff inside get_radial_expansion_coefficients_poly3 lead to apparently correct (not NaN, anyway) values being returned, which is why I think it's an optimizer bug.

Reducing the optimization level from -O2 to -O1 fixes it, but I don't want to do that. The only other way I've found to deal with this even vaguely cleanly is to make the compiler think that it might have to print exp_coeff. I can do that without having to change any other is is by making the if (.false.) then section at the very end, which optionally writes exp_coeff, dependent on a real variable, my_debug, and make that variable depend on a new optional input argument, debug, but default to .false.. This makes the compiler think (at compile time) that it might be printed, which disables whatever optimization breaks things. Even if that section is never actually run because debug = .true. is never passed in, it's enough to fix the optimization issue.

If you think this is worth doing, I can paste a patch here or open a PR.

bernstei commented 1 year ago

patch soap_turbo_radial_gnu_optimizer_bug_patch.txt

mcaroba commented 1 year ago

Nicely spotted @bernstei, thanks! I guess the bug could also be triggered in the printing of the singular values in the same routine. I will patch it ASAP (I'm in the middle of a job interview atm!). I think I will not make the debug variable an argument of the subroutine but just an internal variable, and apply it also to the SVD printing, just in case.

bernstei commented 1 year ago

I think if it's just an internal variable with a value fixed at compile time the compiler knows it's false and doesn't fix the issue, but I'll check.

bernstei commented 1 year ago

I checked, and I cannot get it to work with any variable that's purely internal with a value that's known at compile time. I think the compiler is clever enough to tell that the printing section will never run. If you have a specific patch you'd like me to try, let me know.

mcaroba commented 1 year ago

Yes, that makes sense. OK, I'll follow what you proposed initially, but could you check what happens if you put this:

      exp_coeff_temp1 = 0.d0
      exp_coeff_temp2 = 0.d0
      exp_coeff_der_temp = 0.d0

before the if immediately preceding it?

bernstei commented 1 year ago

I don't understand which if. What do you mean by "preceding it" ? Which "it"? Right after they're allocated?

mcaroba commented 1 year ago

Sorry for being so imprecise, I meant this:

    exp_coeff_temp1 = 0.d0
    exp_coeff_temp2 = 0.d0
    exp_coeff_der_temp = 0.d0
    if( rjs_in(k) < rcut_hard_in .and. mask(k) )then

instead of this:

    if( rjs_in(k) < rcut_hard_in .and. mask(k) )then
      exp_coeff_temp1 = 0.d0
      exp_coeff_temp2 = 0.d0
      exp_coeff_der_temp = 0.d0

bernstei commented 1 year ago

That does seem to be sufficient. I don't understand why, though, and I'm a little nervous given how random these optimizer bugs often seem to be - it's hard to tell what will or won't turn off the broken optimization. Did you have a specific reason to suggest this change of where those arrays are zeroed?

BTW, more evidence that this optimizer issue is subtle - changing from -O2 to -O3 also fixes the problem.

mcaroba commented 1 year ago

These variables were not initialized otherwise for some cases. Can I suggest a last thing to try (sorry, I am in a place where I can't test this myself)? What happens if you change the < in the condition to <= instead (leave the original code unchanged otherwise)? If that also fixes the issue, then I think it could be a code bug triggered when atoms are right at the cutoff and an uninitialized variable contains a combination of bits corresponding to a non-finite value (a NaN?). [if the value is finite, then the coefficients will go to zero at the cutoff, since 0 times a finite number is 0; if the value is undefined, then 0 times undefined is undefined - I think this could be happening]

bernstei commented 1 year ago

No - changing the < to <= in the same if statement still results in NaNs.

mcaroba commented 1 year ago

@bernstei sorry I left this hanging for so long. Could you please check if the latest commit fixes the issue?

bernstei commented 1 year ago

Ugh - I have no idea where I put my example inputs. Let me see if I can find that.

bernstei commented 1 year ago

Found them, testing it now (not trivial because LAMMPS wants to download a fresh copy tied to a specific git commit).

mcaroba commented 1 year ago

Great, thanks. If it's not fixed, please send me the files so that I can do the testing for potential fixes myself without having to bother you.

bernstei commented 1 year ago

No, the latest commit doesn't fix the problem. So far I've been unable to reproduce it except in LAMMPS, but there's one more thing I want to try.

mcaroba commented 1 year ago

@bernstei I have been doing some testing on this. It looks like the behavior of LAPACK routines used for the basis orthonormalization depends on the level of optimization. This is one of the W matrix coefficients (W = S^-1/2):

carom1@t410-du-21-01~$ gfortran -c module.f90 carom1@t410-du-21-01~$ gfortran -o calc.ex calc.f90 module.o -llapack -lblas carom1@t410-du-21-01~$ ./calc.ex | grep "alpha_max = 8" -B1 -485.62295892374391
alpha_max = 8 carom1@t410-du-21-01~$ gfortran -c module.f90 -O2 carom1@t410-du-21-01~$ gfortran -o calc.ex -O2 calc.f90 module.o -llapack -lblas carom1@t410-du-21-01~$ ./calc.ex | grep "alpha_max = 8" -B1 -485.62295892808970
alpha_max = 8

There seems to be a precision issue (internal truncation to single precision?) triggered by the optimization level. In my tests, O2 and O3 lead to the same values, which differ from the value w/o optimization flag. I wonder if this is at the root of the issue you're observing. The results of the GAP calculation can be quite sensitive to the value of these coefficients. I'll keep digging, maybe I can force the right behavior.

bernstei commented 1 year ago

That's basically not possible, as far as I understand, because optimization shouldn't affect linking, just compilation. I think the change must be in what's getting passed to the lapack routines. I can take a look if you can provide me with module.f90 and calc.f90.

mcaroba commented 1 year ago

Those codes are minimal interfaces to use the soap_turbo basis construction routine. I traced the issue back to the use of matmul() introducing a minimal difference in some intermediate matrix value. When I replace matmul() with dgemm() I get the same results regardless of optimization (and they coincided with the optimized matmul() results). I will make this replacement in the code, but I don't think this could be triggering the issue you experienced. I keep looking...

bernstei commented 1 year ago

@mcaroba any progress with this, out of curiosity? I'm recompiling LAMMPS, and noticed that I still have my patches floating around the copy of the source inside the LAMMPS build dir.

mcaroba commented 1 year ago

@bernstei Actually, this week one of my colleagues who's porting this to GPU reported opt flag dependent NaNs (he said only -O2 triggers it) and we're taking a deep look together probably on Friday. I'll let you know!

bernstei commented 1 year ago

Let me know if there's anything I can do to help.

mcaroba commented 1 year ago

@bernstei I think my colleague may have found the bug, but please check if it goes away when you add this pref_f initialization on top of the block we were discussing about:

  pref_f = 0.d0
  exp_coeff_temp1 = 0.d0
  exp_coeff_temp2 = 0.d0
  exp_coeff_der_temp = 0.d0

bernstei commented 1 year ago

Let me see if that fixes mine as well. Can you remind me exactly what file and line you mean?

mcaroba commented 1 year ago

In soap_turbo_radial.f90 (make sure you have the current version in the master branch), add pref_f = 0.d0 to lines 179 and 511. Please let me know if that fixes the issue for you, I'm very keen to get this sorted!

bernstei commented 1 year ago

Is there any particular reason to think this is a systematic fix, or is it just luck with some optimizer bug? At least for line 179, I don't see any evidence setting pred_f to 0 there should make a difference - from what I can tell its value is already set before it's used. Am I missing something with the flow of the code?

mcaroba commented 1 year ago

pref_f is set in line 287, but there is a condition in line 281 that does not always get triggered. When it doesn't, pref_f is not set and exp_coeff_temp2 is set to 0.d0. Later, in line 343 you have the term pref_f * exp_coeff_temp2 which is always computed. When the condition in 281 is not met, there's no guarantee that pref_f is a number. If it's NaN, even if exp_coeff_temp2 = 0.d0, the product pref_f * exp_coeff_temp2 is undefined. I suppose that different optimization levels affect how the compiler handles this situation.

So did it fix the bug for you?

bernstei commented 1 year ago

Aaah - that makes sense. I haven't had a chance to check yet - on vacation, so it might take another week.

bernstei commented 1 year ago

The two pref_f = 0 lines definitely fixes the immediate problem.

mcaroba commented 1 year ago

@bernstei thanks a lot for your help debugging this extremely opaque issue and your patience while we figured this out. I just pushed the changes to the soap_turbo library and updated the submodule on the GAP repo. Thanks again!

libAtoms / soap_turbo

compiler/optimizer bug in gfortran 9.4.0 for get_radial_expansion_coefficients_poly3 #3