Test case key-EXCITED fails on 32bit Intel (ia32/i386)

Describe the bug

On Debian, the i386 (32bit Intel) architecture autobuilder fails the key-EXCITED test. I am not sure whether this matters to you on a practical level (32bit is surely dead for scientific research), but it might point to an instability in the code base. Besides, quite a few tests show warnings about differing numerical results even on regular 64bit Intel.

The problem seems to be specific to i386; 32bit ARM, for example, seems to be fine.

build log: https://buildd.debian.org/status/fetch.php?pkg=mopac&arch=i386&ver=22.1.0-1&stamp=1702330784&raw=0

The error is as follows:

test 84
        Start  84: key-EXCITED

84: Test command: /usr/bin/python3 "/<<PKGBUILDDIR>>/tests/run_test.py" "/<<PKGBUILDDIR>>/tests/keywords" "/<<PKGBUILDDIR>>/obj-i686-linux-gnu/mopac" "0.01" "EXCITED.mop"
84: Working Directory: /<<PKGBUILDDIR>>/obj-i686-linux-gnu/tests/keywords
84: Test timeout computed to be: 10000000
84: 
84: 
84:           MOPAC Job: "EXCITED.mop" ended normally on Dec 11, 2023, at 21:39.
84: 
84: Traceback (most recent call last):
84:   File "/<<PKGBUILDDIR>>/tests/run_test.py", line 33, in <module>
84:     compare_mopac_out_file(out_line, out_list, ref_line, ref_list, float(argv[3]))
84:   File "/<<PKGBUILDDIR>>/tests/compare_output.py", line 216, in compare_mopac_out_file
84:     assert len(ref_list) == len(out_list), f'ERROR: output file size mismatch, {len(ref_list)} vs. {len(out_list)}'
84:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
84: AssertionError: ERROR: output file size mismatch, 1735 vs. 245
 79/124 Test  #84: key-EXCITED ......................***Failed    0.74 sec

Diffing the reference output against the test output shows these changes:

- CYCLE:    13 TIME:   0.039 TIME LEFT:  2.00D  GRAD.:   245.805 HEAT:  99.37512
- NUMERICAL PROBLEMS IN BRACKETING LAMDA  -504148458.072983     
-  -504148458072.983       -504148458.073487        440308904426.073     
-   499010958.743647     
+ CYCLE:    13 TIME:   0.113 TIME LEFT:  2.00D  GRAD.:   245.805 HEAT:  99.37436
+ NUMERICAL PROBLEMS IN BRACKETING LAMBDA  -504303705.34135485       -504303705341.35486       -504303705.34185916        440308928601.34131        499165490.99536222     
   GOING FOR FIXED STEP SIZE....
+ TOO MANY ITERATIONS IN LAMBDA BISECT  -504641247.27416450       -504641247.27416444       -504641247.27416444       -1.0767515046766464E-020  -2.7351938644416280E-021
[...]
+          TOO MANY ITERATIONS IN LAMBDA BISECT IN EF
[...]
+ ******************************************************************************
+ *                                                                            *
+ *     Error and normal termination messages reported in this calculation     *
  *                    *
+ * TOO MANY ITERATIONS IN LAMBDA BISECT IN EF                                 *
+ * TOO MANY ITERATIONS IN LAMBDA BISECT IN EF                                 *
  * JOB ENDED NORMALLY *

To Reproduce

Run the test suite on 32bit Intel, e.g. in a 32bit-chroot on an otherwise 64bit system.

Expected behavior

The test suite runs fine.

Operating system

Debian unstable, i386 architecture

Thank you for pointing this out. I will adjust this test for future MOPAC releases. Electronic structure software such as MOPAC contains many nonlinear optimization steps, which are fundamentally erratic and chaotic in some cases and produce machine-dependent results because of slight differences in rounding and in the optimized order of operations. Well-written tests should avoid such behavior, but unfortunately this test seems to sit near an unstable point of a geometry optimization that fails in different cycles on different machines (12 vs. 13). This indicates a problem with the test, not with MOPAC itself.
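
As a rough illustration of the rounding argument (this is not MOPAC code; the chaotic map, the function name, and the step count are assumptions chosen only for the demonstration), the sketch below shows that reordering a floating-point sum changes the last bit of the result, and that a nonlinear iteration can amplify such a one-bit difference until the two runs no longer resemble each other:

```python
# Illustration only: none of this is taken from MOPAC.

# 1. Floating-point addition is not associative, so an optimizer that
#    reorders operations can change the last bits of a result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print("sums identical:", left == right)
print("difference:", left - right)


def logistic_trajectory(x0, steps):
    """Iterate the logistic map x -> 4x(1-x), a standard chaotic example.

    It stands in here for a nonlinear optimization near an unstable point.
    """
    xs = [x0]
    for _ in range(steps):
        xs.append(4.0 * xs[-1] * (1.0 - xs[-1]))
    return xs


# 2. Start two runs from values that differ only by the rounding above.
run_a = logistic_trajectory(left / 2.0, 200)
run_b = logistic_trajectory(right / 2.0, 200)

# Largest gap between the two runs over all iterations.
max_gap = max(abs(p - q) for p, q in zip(run_a, run_b))
print("initial gap:", abs(run_a[0] - run_b[0]))
print("largest gap over 200 steps:", max_gap)
```

The two starting values agree to roughly sixteen significant digits, yet the trajectories separate completely, which is the same kind of sensitivity that can make an optimization fail in cycle 12 on one machine and cycle 13 on another.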

Regarding the numerical warnings: MOPAC was not written in a "numerically tight" way that would allow results to be systematically refined to near double-precision accuracy. Rather, it is a mix of tunable accuracy parameters and hard-coded choices that set a fundamental floor on the numerical precision, and that floor varies with the inherent stability of each calculation. I decided that the best way to handle this in testing was to set numerical tolerances aggressive enough that tests exceed them a lot of the time, and to have these numerical failures flag a warning rather than an error. This is the simplest practical way to monitor MOPAC's precision floor. The warnings do not signify any growing instability in MOPAC's code base, but rather an increasing level of scrutiny. Ideally, I'd like these warnings to motivate improvements to MOPAC's numerical precision, but that would be a substantial undertaking and just isn't a high priority right now.
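
The warning-versus-error policy can be sketched roughly as follows. This is a minimal illustration, not the actual contents of tests/compare_output.py: the function name `compare_values`, the use of an absolute tolerance, and the return value are all assumptions. Only the length assertion mirrors the traceback in the report, and the sample numbers are the HEAT and GRAD. values from cycle 13 of the two outputs quoted there.

```python
import warnings


def compare_values(ref_list, out_list, tol):
    """Hypothetical comparison: structural mismatch is an error,
    numerical mismatch beyond `tol` is only a warning.

    Returns the number of entries that exceeded the tolerance.
    """
    # A different number of entries means the runs took different paths,
    # which is treated as a hard failure (as in the reported traceback).
    assert len(ref_list) == len(out_list), (
        f"ERROR: output file size mismatch, {len(ref_list)} vs. {len(out_list)}"
    )
    n_warn = 0
    for i, (ref, out) in enumerate(zip(ref_list, out_list)):
        # Assumption: absolute tolerance; the real test may compare differently.
        if abs(ref - out) > tol:
            warnings.warn(f"numerical mismatch at entry {i}: {ref} vs. {out}")
            n_warn += 1
    return n_warn


# HEAT and GRAD. from cycle 13 of the reference and i386 outputs.
ref = [99.37512, 245.805]
out = [99.37436, 245.805]

loose = compare_values(ref, out, tol=0.01)   # tolerance passed to run_test.py
tight = compare_values(ref, out, tol=1e-4)   # a more aggressive tolerance
print("mismatches at tol=0.01:", loose)
print("mismatches at tol=1e-4:", tight)
```

With the 0.01 tolerance from the test command the heats of formation agree, while the tighter tolerance flags the same pair as a warning without failing the run. A length mismatch such as 1735 vs. 245 would still raise an AssertionError.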

openmopac / mopac

Test case key-EXCITED fails on 32bit Intel (ia32/i386) #195