Closed wsmorgan closed 5 years ago
I don't see this as a bug per se. The restrictions on the failsafe have been broadened, but as far as I know, this failsafe has never triggered. Was this change made because a bug was found that the failsafe failed to capture? (If so, the triggering bug should be fixed.) Or was it simply recognized that the failsafe could be strengthened?
While @JohnEdChristensen was running the latest version of the GRkgridgen code he ran into the following:
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x7f72f51087df in ???
#1 0x5637070a6804 in __rational_mathematics_MOD_smithnormalform_li
at /home/john/msg/temp/GRkgridgen/symlib/src/rational_mathematics.f90:93
#2 0x56370707fb44 in __kpointgeneration_MOD_symmetryreducekpointlist
at /home/john/msg/temp/GRkgridgen/kgridGen/src/kpointgeneration.f90:508
#3 0x56370708499d in __kpointgeneration_MOD_generateirredkpointlist
at /home/john/msg/temp/GRkgridgen/kgridGen/src/kpointgeneration.f90:194
#4 0x563706fe0462 in __grid_utils_MOD_grid_selection
at /home/john/msg/temp/GRkgridgen/src/grid_utils.f90:314
#5 0x5637070393f6 in __sp_hnfs_MOD_bct_6_7_15_18
at /home/john/msg/temp/GRkgridgen/src/sp_hnfs.f90:1600
#6 0x563707066330 in __find_kgrids_MOD_find_grid
at /home/john/msg/temp/GRkgridgen/src/find_kgrids.f90:411
#7 0x56370707e168 in lat_id_driver
at /home/john/msg/temp/GRkgridgen/src/driver.f90:25
#8 0x56370707ebb4 in main
at /home/john/msg/temp/GRkgridgen/src/driver.f90:2
[1] 6679 floating point exception (core dumped) kpoints.x
This occurred because in the process of converting an HNF into an SNF one of the diagonal of the matrix was set to zero during the operations. When the matrix then hit the check it attempted to take mod(4,0), i..e., it was dividing by zero, and caused a fault to trigger. So I see that as a bug but that's just semantics.
Important point is that without this change the code dies in some cases, with this check it does not.
This error was affecting quite a few of my test cases, they all pass with this fix.
But the failsafe is just catching a bug that is occurring elsewhere. It should never be the case that the SNF code returns a matrix that is not in SNF form. If the failsafe catches a case where something is not in SNF form, then the SNFer itself has a bug, or was given bad input and didn't catch it (also a bug, I would say).
The question we should ask is, why did the SNFer try to return a matrix that had zeros on the diagonal. That is wrong...
Gus Hart Professor, Physics and Astronomy http://msg.byu.edu Dean, Physical and Mathematical Sciences http://cpms.byu.edu
On Wed, Aug 21, 2019 at 10:59 AM John Christensen notifications@github.com wrote:
This error was affecting quite a few of my test cases, they all pass with this fix.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/msg-byu/symlib/pull/23?email_source=notifications&email_token=AB3UNGFKUBDSQWCMEXLD7JDQFVYAXA5CNFSM4IOBKUU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD42MH7Q#issuecomment-523551742, or mute the thread https://github.com/notifications/unsubscribe-auth/AB3UNGCJVAH26H3WP2ECXVLQFVYAXANCNFSM4IOBKUUQ .
This check is the exit condition for the loop that's converting the HNF into SNF, i.e., it's going to keep doing row/col operations until those conditions are true. It is expected that the matrix will not be SNF most of the time when these lines of code get run.
Changed:
to:
@JohnEdChristensen found this bug while testing GRkgridgen.