SIGFPE because of an accumulation of last value in DenseMatrix in ChemEquil

GoogleCodeExporter commented 9 years ago

For certain initial conditions, the ChemEquil routine would result in
a SIGFPE error. I think I have found the cause of the problem and I would like 
to propose a possible fix for it.

The working arrays "jac", "x" and "res_trial" are defined with size "nvar"
which is "m_mm + 1", i.e. number of elements + 1.

    DenseMatrix jac(nvar, nvar);       // jacobian
    vector_fp x(nvar, -102.0);         // solution vector
    vector_fp res_trial(nvar, 0.0);    // residual

The last position of the array "x" is used to store the logarithm of the 
temperature

x[m_mm] = log(s.temperature());

This value is then used in the routines "equilResiduals" and "equilJacobian" as 
follows:

    doublereal temp = exp(x[m_mm]);
    setToEquilState(s, x, temp);

The equilResiduals routine will return the residual array "res_trial" of size 
"nvar = m_mm + 1",
with the last value containing the logarithm of the temperature, and the array 
is then passed to the function "leftMult":

    jac.leftMult(DATA_PTR(res_trial), DATA_PTR(grad));

where the entire array size is used. The last value containing the logarithm of 
temperature can lead to floating point overflow.

A possible solution is to reduce the size of the working arrays "jac", "x" and 
"res_trial" to "m_mm", i.e. the number of elements, and pass the logarithm of T 
as a separate double value.

This worked in my application.

The issue is found in both 1.8 and 2.0.2. (Platform: Suse Linux 11.1, x64) 

Might I ask you to confirm whether this is a problem and if it is, incorporate 
the fix?

Thanks

Francesco

Original issue reported on code.google.com by shelli...@gmail.com on 4 Sep 2013 at 10:54

GoogleCodeExporter commented 9 years ago

Upon some consideration, the above removes temperature update. So I restored 
the size of the arrays to be "nvar = m_mm + 1".
On the other hand - I noticed that the call to the function "leftMult" returns 
the array "grad" which is not used anywhere. Therefore the line 855 in 
ChemEquil.cpp can
be safely removed. This fixes the overflow. 

An alternative option would be to redefine "grad" to be of size "m_mm", define 
another local array of size "m_mm" and copy a portion of "res_trial" to
the local array and then pass it to "leftMult".

Francesco

Original comment by shel...@gmail.com on 5 Sep 2013 at 10:46

GoogleCodeExporter commented 9 years ago

You're right that the "grad" variable is never actually used, so the call to 
jac.leftMult could simply be removed. However, I'm not sure how the presence of 
log(T) could cause an overflow, as I would expect that to always be a fairly 
normal value.

Can you provide an example where this floating point error occurs?

Original comment by yarmond on 6 Sep 2013 at 12:07

Changed state: Feedback

GoogleCodeExporter commented 9 years ago

OK, the overflow was not caused by log(T). It was basically, divergence caused 
by the numerical procedure used to calculate the jacobian in the routine 
"equilJacobian" (in ChemEquil.cpp).

for (n = 0; n < len; n++) {
      xsave = x[n];
      dx = atol;
      dx2 = fabs(xsave) * 1.0E-8; //reduced from 1.0e-7 to 1.0e-8 to improve stability
      if (dx2 > dx) dx = dx2;
      x[n] = xsave + dx;
      dx = x[n] - xsave;
      rdx = 1.0/dx;

      // calculate perturbed residual

      equilResidual(s, x, elmols, r1, xval, yval, loglevel-1);

      // compute nth column of Jacobian

      for (m = 0; m < len; m++) {
    jac(m, n) = (r1[m] - r0[m])*rdx;
      }
      x[n] = xsave;
    }
In the code above, I reduced the constant 1.0E-7 to 1.0E-8, which is used to 
vary the solution x[n]. This change removes the divergence.

Previous change of removing the call to leftMul simply moved the divergence to 
a different set of parameters further down the line in the simulation. (note I 
have also removed leftMul after that, but just to speed up the simulation as it 
is not needed.)

Original comment by shelli...@gmail.com on 26 Sep 2013 at 10:32

GoogleCodeExporter commented 9 years ago

r2550 and r2551 fix all of the cases that I've see where floating point 
exceptions occurred. Can you try the most recent version (on the 2.1 branch) to 
see if that resolves your issue?

If not, can you provide an example where the proposed change eliminates a 
convergence failure or FPE? I haven't found any cases where it makes any 
difference.

Original comment by yarmond on 30 Sep 2013 at 2:22

GoogleCodeExporter commented 9 years ago

Original comment by yarmond on 12 Nov 2013 at 9:03

Changed state: Fixed

jcaiuwyo / cantera

SIGFPE because of an accumulation of last value in DenseMatrix in ChemEquil #172