ukaea / PROCESS

PROCESS is a systems code at UKAEA that calculates in a self-consistent manner the parameters of a fusion power plant with a specified performance, ensuring that its operating limits are not violated, and with the option to optimise to a given function of these parameters.
https://ukaea.github.io/PROCESS/
MIT License

Diverging results across different Operating Systems for Kallenbach IN.DAT #873

Closed by jonmaddock 2 months ago

jonmaddock commented 5 years ago

In GitLab by @spicker on Jun 28, 2019, 08:31

Summary

It has been observed that running the Kallenbach input file on different platforms can produce slightly different results. The end result may be the same or similar, but the path taken and the number of iterations differ.

Changing the version of gfortran on freia (4.8.5 <-> 7.2.0) has no effect.

From analysis of the output, it has been found that some of the constraint equations produce slightly different values for some of the variables. They are:

Constraint 8)

Following the chain of variables and functions with values that differ across platforms

wallmw <- pneutmw <- vol <- f(xvol) <- vin <- thetai

thetai = atan(kap/denomi)

The values of kap and denomi are identical on both platforms. It is the result of the intrinsic function "atan" that causes the difference.

Variable  Cygwin               Linux GNU
kap       1.7806785714285713   1.7806785714285713
denomi    1.7806785714285713   1.7806785714285713
thetai    0.54748563700358688  0.54748563700358699

Unless a more accurate version of "atan" can be found, the only other solution would be to limit the accuracy to a smaller number of decimal places where the results are still identical.
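
To put the size of this discrepancy in context, here is a minimal Python sketch (not part of PROCESS; the helper name ulp_distance is hypothetical) that counts how many representable double-precision values separate the two thetai values reported above. It assumes both values were printed with enough digits to round-trip exactly.

```python
import struct

def ulp_distance(a: float, b: float) -> int:
    """Count the representable doubles separating a and b.

    Assumes both values are finite and have the same sign, in which case
    their IEEE 754 bit patterns, read as integers, are ordered like the
    values themselves.
    """
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)

# Values of thetai reported in the table above
thetai_cygwin = 0.54748563700358688
thetai_linux = 0.54748563700358699

steps = ulp_distance(thetai_cygwin, thetai_linux)
print(f"thetai differs by {steps} unit(s) in the last place")
```

IEEE 754 does not require functions such as atan to be correctly rounded, so a difference in the last bit or two between math libraries would not by itself indicate a bug in either of them.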

Constraint 15)

Following the chain of variables and functions with values that differ across platforms

pdivt <- pradmw <- vol (see constraint 8)
      <- pohmmw <- pohmpv <- vol (see constraint 8)
      <- pohmmw <- vol (see constraint 8)
      <- pchargemw <- vol (see constraint 8)
      <- palpmw <- vol (see constraint 8)

The variable "vol" is causing problems again from its use of "thetai". (see constraint 8)
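
The chains for constraints 8 and 15 can be treated as a small dependency graph. The Python sketch below is illustrative only: the dictionary is transcribed by hand from the chains listed above, "xvol" stands in for the f(xvol) step (an assumption), and the function name affected_by is hypothetical. It lists every variable that inherits a perturbation in thetai.

```python
# Direct dependencies, transcribed from the chains for constraints 8 and 15.
# "xvol" stands in for the f(xvol) step; that naming is an assumption.
DEPENDS_ON = {
    "wallmw": ["pneutmw"],
    "pneutmw": ["vol"],
    "vol": ["xvol"],
    "xvol": ["vin"],
    "vin": ["thetai"],
    "pdivt": ["pradmw", "pohmmw", "pchargemw", "palpmw"],
    "pradmw": ["vol"],
    "pohmmw": ["pohmpv", "vol"],
    "pohmpv": ["vol"],
    "pchargemw": ["vol"],
    "palpmw": ["vol"],
}

def affected_by(source):
    """Return all variables that depend, directly or indirectly, on source."""
    affected = set()
    changed = True
    # Keep sweeping the graph until no new dependent variable is found
    while changed:
        changed = False
        for var, deps in DEPENDS_ON.items():
            if var in affected:
                continue
            if any(d == source or d in affected for d in deps):
                affected.add(var)
                changed = True
    return affected

print(sorted(affected_by("thetai")))
```

Every variable in the two chains ends up in the result, which matches the observation that a single last-bit difference in thetai reaches both constraints through vol.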

Constraint 16)

Following the chain of variables and functions with values that differ across platforms

pnetelmw <- pgrossmw <- pthermmw <- pthermshld <- pnucshld <- nuc_pow_dep_tot <- pftnuc
                                                           <- powfmw <- pchargemw (see constraint 8)
                                                           <- pneutmw (see constraint 8)
                                 <- pthermfw_blkt <- pthermfw <- pradfw
                                                              <- palpfwmw
                                                  <- pthermblkt <- pnucblkt

Multiple variables have a dependency on "vol". (see constraint 8)

Constraint 69)

Following the chain of variables and functions with values that differ across platforms

psep_kallenbach <- powerup <- power <- y in ode

pdivt (see constraint 15)

Constraint 70)

Following the chain of variables and functions with values that differ across platforms

teomp <- al (see Issue #844)

Putting in a temporary fix for "al" solved the divergence of "teomp"

Constraint 71)

Following the chain of variables and functions with values that differ across platforms

neomp <- al (see Issue #844)

Putting in a temporary fix for "al" solved the divergence of "neomp"

Steps to reproduce

Run the Kallenbach input file on Cygwin and Freia using the new script

@jmorris-uk @hlux

jonmaddock commented 5 years ago

In GitLab by @spicker on Jun 28, 2019, 15:48

Initial tests with an approximation to atan produce identical values across different platforms (e.g. Cygwin, Linux GNU). Further tests show that sin(x) also produces different results across platforms, similar to atan(x).

Using the approximation for atan(x) does reduce the differences in the variables derived from the volume, which is calculated from atan(x). It doesn't resolve the major problem of the divergent results, though. Hard-wiring the result of atan instead of calling the function only changes the results slightly and doesn't fix the divergence. There is evidently another major factor that initiates the divergence.
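
The thread does not say which approximation to atan was tested. As one possible form, the Python sketch below (the function name atan_series and the number of terms are assumptions) uses only addition, multiplication, division and square root. IEEE 754 requires those operations to be correctly rounded, so such a routine should give bit-identical results across platforms provided the compiler does not use extended precision or fused multiply-add.

```python
import math

def atan_series(x: float, terms: int = 30) -> float:
    """Approximate atan(x) using only +, -, *, / and sqrt."""
    # Argument reduction: atan(x) = 2 * atan(x / (1 + sqrt(1 + x*x))).
    # Applying it three times shrinks x so that the series converges quickly.
    scale = 1.0
    for _ in range(3):
        x = x / (1.0 + math.sqrt(1.0 + x * x))
        scale *= 2.0

    # Taylor series: atan(x) = x - x^3/3 + x^5/5 - ...
    x2 = x * x
    power = x
    total = 0.0
    for k in range(terms):
        total += power / (2 * k + 1)
        power = -power * x2
    return scale * total

# Compare against the library routine for an arbitrary argument
x = 0.6
print(atan_series(x), math.atan(x))
```

The last digits may differ from those of the library atan; what this trades for is reproducibility, not extra accuracy.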

jonmaddock commented 5 years ago

In GitLab by @mkovari on Jun 28, 2019, 19:45

Can you explain why you want PROCESS to be compatible with Cygwin? A Windows user could, in theory, have several choices:

  1. Log on to freia.
  2. Run PROCESS in Windows using a suitable compiler.
  3. Use the Ubuntu option within Windows.
  4. Use Cygwin.

We certainly don't want to spend effort supporting all these options.
Comments @hlux @skahn @jmorris-uk ?

jonmaddock commented 5 years ago

In GitLab by @spicker on Jul 1, 2019, 07:29

Once the software is used more by external groups, we will need to have tested it on a wider range of operating systems and compilers. This will help uptake of the software as well as provide a broader testing base to identify issues. Internally, I agree everyone can use freia, but it is wise to test the stability on other systems, as this will highlight and remove systematic errors that may be present on an isolated system. Cygwin is free and well supported, and provides a simple Linux-like platform.

jonmaddock commented 5 years ago

In GitLab by @hlux on Jul 2, 2019, 08:35

Dear Steven,

for your background: We have only fairly recently (2-3 years ago) allowed anyone to take PROCESS off site and run it anywhere other than on freia. Before that, no one was allowed to port the code to any other platform. However, already when Manoj started in the group, there was a clear request that we should become more platform independent. This was made possible by GitLab being available externally, and it was important for UKAEA staff to be able to take a copy of PROCESS on their laptops when travelling (which were typically not Linux). In practice, we have relatively few external users (currently in Greifswald, Garching, Durham and Liverpool, plus possibly soon Princeton) and have not had any complaints from them about unsupported platforms or compilers. So for me the main purpose of the tests you have done was to familiarise yourself with the code, make it more robust in general and, most importantly, figure out why PROCESS (with Kallenbach on) takes a different number of iterations on different machines. We are still not planning to generically support a large number of compilers and operating systems, as we are not in that kind of business yet. However, using alternative compilers that are more stringent in error detection can of course make the code more robust in general, and this was a helpful exercise while familiarising you with the code. In general we do not have the manpower to support that many operating systems and compiler options and therefore should not strive to do so.

I hope this clarifies the situation.

Best regards,

Hanni

jonmaddock commented 5 years ago

In GitLab by @spicker on Jul 19, 2019, 10:54

Revisiting the atan problem. Out of the 24 constraints in the Kallenbach input file, 6 show signs of divergence between runs under Cygwin and Linux GNU. Analysis showed that most of the power-related variables depend on the volume, which was in turn traced back to thetai.

In subroutine xparam in plasma_geometry.f90

t = 1.0D0 - tri
denomi = (kap**2 - t**2)/(2.0D0*t)
thetai = atan(kap/denomi)

The value of thetai differs from approximately the 16th decimal place onwards between the diverging runs. This sounds small, but it has a significant impact on many other variables. Replacing the call to atan with an approximation produced identical output values for identical input values, but the accuracy of the approximation leaves a lot to be desired.

The next approach was to limit the numerical accuracy of the results of the call to atan such that the results were the same up to a certain level of accuracy but not to the point where the results diverge.

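integer(kind=8) :: itrunc   ! 64-bit integer holding the scaled value

t = 1.0D0 - tri
denomi = (kap**2 - t**2)/(2.0D0*t)
thetai = atan(kap/denomi)

! Keep only the first 8 decimal places of thetai so that
! platform-dependent differences in the last bits are discarded
itrunc = int(thetai*1.0D8, kind=8)
thetai = itrunc*1.0D-8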

This simple piece of code truncates the fractional part of the number to a fixed number of decimal places (eight here, set by the 1d8 scale factor). For example, two numbers that agree except in their last few decimal places

1.345678901234567890
1.345678901234567875

become

1.345678900000000000
1.345678900000000000

This removes the divergence in many of the variables altogether.
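
The truncation can be tried outside PROCESS with the Python sketch below. It is a rough equivalent of the Fortran above, not the code that was committed: the function name truncate is hypothetical, and it divides by the scale factor where the Fortran multiplies by 1d-8. It also shows a limitation worth keeping in mind: two values that sit on either side of a truncation boundary still end up different, and the gap between them grows.

```python
def truncate(x: float, places: int = 8) -> float:
    """Drop everything after the given number of decimal places."""
    scale = float(10 ** places)
    index = int(x * scale)  # int() truncates toward zero, like Fortran's int()
    return index / scale

# The two thetai values reported for Cygwin and Linux GNU
a = 0.54748563700358688
b = 0.54748563700358699
print(truncate(a) == truncate(b))  # same value after truncation

# Caveat: neighbouring doubles that straddle a multiple of 1e-8
just_below = 0.5 - 2.0 ** -54  # largest double smaller than 0.5
print(truncate(just_below), truncate(0.5))
```

For the thetai values in this issue the truncated results agree, so the caveat only matters if a value happens to land very close to a multiple of 1e-8.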

jonmaddock commented 5 years ago

In GitLab by @skahn on Jul 19, 2019, 11:10

Thanks for this nice finding!

How many decimal places can you push this rounding to while keeping the corrected behaviour?

Cheers

jonmaddock commented 5 years ago

In GitLab by @spicker on Jul 19, 2019, 11:31

The upper limit is probably just before the two numbers diverge, whatever decimal place that is for that situation. I did notice when using the approximation to atan that the results were more unstable probably because some of the approximations were so bad. Probably keeping the numbers accurate to the 12th decimal place is more than enough, I doubt it would have that much impact on the actual results. Anything less than 6 decimal places will start to produce inaccuracies in other parts of the code as you aren't really using all of the available accuracy of the floating point number.

jonmaddock commented 5 years ago

In GitLab by @skahn on Jul 19, 2019, 12:45

Thanks Steven. As atan is called in a loop in the Kallenbach model, the higher-order decimal places might have an effect.

Have you tried varying the number of decimal places used for the rounding in PROCESS to see the effect on the final result with the Kallenbach model?

Cheers

Seb

jonmaddock commented 2 months ago

As the Kallenbach model has been removed (#1886), it is unlikely that this can be reproduced. Regarding the wider issue of differing results in different environments, we are aware that some results do differ in certain environments. Most users/developers within the group use Ubuntu/WSL to run PROCESS, and for those that don't have this ability, we provide a Docker container (docker/dev/Dockerfile) to aid reproducibility. This is sufficient to reproduce identical results in different environments. Any future specific differences will be looked at individually.