Computation time increase in hsflsh / sath by factor of ~500 with refprop 10 Fortran vs Refprop 9 Fortran

AdeL22222 commented 2 years ago

Description

Performence decay when switching to Refprop 10 Fortran Sources: Increase of computation time by factor of >100 Performance decay due to performanc issue in routine hsflash

Steps to Reproduce

Composition (mole fractions) oxygen 0.206480E+00 nitrogen 0.770311E+00 co2 0.324970E-03 water 0.138509E-01 argon 0.903313E-02

Default mixing, default reference conditions. Iterative evaluation with parameters within the Temperature range T=300K-450K, and pressure range p=100-400kPa (expansion in starting at ~450K, 3.5bar to ambient). List of calling arguments of the 31717 calls to hsflash can be provided, if required.

Expected behavior:

Performance comparable to Refprop 9: Reference profiling results, extract from callgraph

index % time    self  children    called     name
-----------------------------------------------
                0.00   29.33   31717/31717       <calling application> [16]
[17]    15.4    0.00   29.33   31717         hsflsh_ [17]
                0.20   27.58   31717/82658       abfl1_ [7]
                0.00    1.27   31717/31717       sath_ [46]
                0.01    0.22   31717/165323      therm_ [48]
                0.00    0.05   31717/921308      critp_ [44]
                0.00    0.01   31717/133706      limitx_ [94]
                0.00    0.01   31717/127312      limits_ [109]
                0.00    0.00   31717/145068122     ispure_ [55]
                0.00    1.27   31717/31717       hsflsh_ [17]
--------------------------------------------------------------
[46]     0.7    0.00    1.27   31717         sath_ [46]
                0.00    1.14      53/54          satt_ <cycle 2> [50]
                0.00    0.08   31751/718091      enthal_ [41]
                0.00    0.05   31718/921308      critp_ [44]
                0.00    0.01   31718/127312      limits_ [109]
                0.00    0.00   31717/145068122     ispure_ [55]
                0.00    0.00       2/50943       energy_ [83]
                0.00    0.00   31718/31718       maxt_ [315]
                0.00    0.00   31717/32449       errmsg_ [313]

Actual behavior: [What actually happens]

Profiling results of Refprop 10 Code (gprof), extract from callgraph.

As can be seen, the time spent in hsflsh increases (with the same number of calls) from 29.33s to 13348.5s. That is, computation time increases by a factor of ~500. Computation time primarily due to computation time increase of children, thereof primarily the calls to sath, though the number of calls of sath is constant. Amongst the children of sath, the drasitically increasesed number of calls to satt from 54 to 1649286 is striking. But also the number of calls to ispure within hsflsh increased from 1.4 Million to 25Million by a factor >10.

index % time    self  children    called     name
-----------------------------------------------
                0.03 13348.51   31717/31717       <calling application> _ [5]
[6]     60.7    0.03 13348.51   31717             hsflsh_ [6] = same total number of calls as in refprop 9
                0.18 13331.76   31717/31717       sath_ [7]
                0.02   16.35   31717/82658       abfl1_ [46]
                0.00    0.15   31717/6108549     therm_ [53]
                0.00    0.03   31717/1006501     satmax_ [106]
                0.00    0.00   31717/1731945     checklimits_ [150]
                0.00    0.00   31717/25370144693     ispure_ [41] = >1 order of magnitude more calls than in refprop 9
-----------------------------------------------
                0.18 13331.76   31717/31717       hsflsh_ [6]
[7]     60.6    0.18 13331.76   31717         sath_ [7] = same total number of calls as in refprop 9
               12.60 13316.96 1649284/1649286     satt_ <cycle 2> [11] => subcalls to satt increased from 54 to 1649286
                0.01    2.10  761208/761208      hescalc_ [99]
                0.00    0.06   63434/95168561     critp_ [43]
                0.01    0.01   63434/20563016     limits_ [80]
                0.01    0.00  412321/2138530272     rootsearch_ [52]
                0.00    0.00   63434/14327643     maxp_ [104]
                0.00    0.00   63434/14327643     maxt_ [119]
                0.00    0.00   63434/25370144693     ispure_ [41]
                0.00    0.00   31717/145423100     errnum_ [107]

Gprof explanation of table Each entry in this table consists of several lines. The line with the index number at the left hand margin lists the current function. The lines above it list the functions that called this function, and the lines below it list the functions this one called. This line lists: index A unique number given to each element of the table. Index numbers are sorted numerically. The index number is printed next to every function name so it is easier to look up where the function is in the table.

 % time This is the percentage of the `total' time that was spent
    in this function and its children.  Note that due to
    different viewpoints, functions excluded by options, etc,
    these numbers will NOT add up to 100%.

 self   This is the total amount of time spent in this function.

 children   This is the total amount of time propagated into this
    function by its children.

 called This is the number of times the function was called.
    If the function called itself recursively, the number
    only includes non-recursive calls, and is followed by
    a `+' and the number of recursive calls.

 name   The name of the current function.  The index number is
    printed after it.  If the function is a member of a
    cycle, the cycle number is printed between the
    function's name and the index number.

For the function's parents, the fields have the following meanings:

 self   This is the amount of time that was propagated directly
    from the function into this parent.

 children   This is the amount of time that was propagated from
    the function's children into this parent.

 called This is the number of times this parent called the
    function `/' the total number of times the function
    was called.  Recursive calls to the function are not
    included in the number after the `/'.

 name   This is the name of the parent.  The parent's index
    number is printed after it.  If the parent is a
    member of a cycle, the cycle number is printed between
    the name and the index number.

If the parents of the function cannot be determined, the word <spontaneous>' is printed in thename' field, and all the other fields are blank.

For the function's children, the fields have the following meanings:

 self   This is the amount of time that was propagated directly
    from the child into the function.

 children   This is the amount of time that was propagated from the
    child's children to the function.

 called This is the number of times the function called
    this child `/' the total number of times the child
    was called.  Recursive calls by the child are not
    listed in the number after the `/'.

 name   This is the name of the child.  The child's index
    number is printed after it.  If the child is a
    member of a cycle, the cycle number is printed
    between the name and the index number.

If there are any cycles (circles) in the call graph, there is an entry for the cycle-as-a-whole. This entry shows who called the cycle (as parents) and the members of the cycle (as children.) The `+' recursive calls entry shows the number of function calls that were internal to the cycle, and the calls entry for each member shows, for that member, how many times it was called from other members of the cycle.

Versions

REFPROP Version: Refprop 10 Fortran Sources vs Refprop 9 Fortran Sources Operating System and Version: Linux 3.10.0-1160.53.1.el7.x86_64 #1 SMP Fri Jan 14 13:59:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux Access Method:
Compilation of Fortran source code and linking against proprietary code using intel ifort2021.

Additional Information

If possible, please post examples and/or screenshots of the issue.

ianhbell commented 2 years ago

Can you please make a standalone example that demonstrates this issue? Otherwise it is difficult to know where to look.

AdeL22222 commented 2 years ago

I provide 2 test packages (created on Linux), each containing the sources + a test program and + a shell script that builds the test program using intel ifort compiler. At the end of the build, the test program runs 900 cycles of hsflsh, and the program is timed using the Linux system command time. On my machine, test_for_NIST_Refprop9 is timed real 0m5.639s user 0m5.578s sys 0m0.008s test_for_NIST_Refprop10 is timed real 7m2.423s user 7m1.366s sys 0m0.019s I provide 2 test packages (created on Linux), each containing the sources + a test program and + a shell script that builds the test program using intel ifort compiler. At the end of the build, the test program runs 900 cycles of hsflsh, and the program is timed using the Linux system command time. On my machine, test_for_NIST_Refprop9 is timed real 0m5.639s user 0m5.578s sys 0m0.008s test_for_NIST_Refprop10 is timed real 7m2.423s user 7m1.366s sys 0m0.019s test_refprop9.zip test_refprop10.zip

ianhbell commented 2 years ago

Have you tried to enable the saturation spline?

ianhbell commented 2 years ago

Are the numerical results different?

ianhbell commented 2 years ago

Also, it seems you are not doing a release build?

AdeL22222 commented 2 years ago

Hello Ian, a) I did not try saturation spline; I used a (fortran) design program that worked well when being liked against the objects of the the refprop 9 sources and exchanged the refprop fortran codes for the version 10 fortran files I was provided with. Since all funcitons required were available, I did not change the calling modes or sequence of functions being called. b) I did not explicitly compare the results. But the results must be almost the same: if a complext turbomachinery iteration with internal convergence limits of 1E-8 requires with both sources precisely the same number of calls, 37thousands and something, the differences must be of the order of 1E-08. c) What are you referring to with "not doing a release build"? I recieved the sources from my collegues responsible for the commercial side (license fees). To generate Linux object files, and for consitency with our internal software development environments. I converted dos-to-unix (line endings) and may have switched the case of the filenames. Otherwise, the content of the files is as provided to me. I understand that the sources I am using are not release sources (hence likely some intermediate developments status?), so I'll double back with my colleagues that they should contact you / NIST to be provided with the latest official fortran sources. Regards Armin

AdeL22222 commented 2 years ago

Calling satspln ahead of hsflsh does not significantly change performance. New timing result from 7m2sec down to 6m34sec, while Refprop 9 has 5sec. real 6m34.564s user 6m33.569s sys 0m0.008s

ianhbell commented 2 years ago

You are not compiling the source files with optimization enabled. Without that, timing comparisons are meaningless.

AdeL22222 commented 2 years ago

Agreed - though I cannot imagine that full optimization makes a factor of 100 in the one case and nothing in the other; but I'll let you know. Anyhow, my fortran code is Ver 10.0000 and this seems not to be the latest - windows executable refers to 10.0097. Since I am only the "application developer": what is the correct way to obtain the 10.0097 fortran sources from NIST?

ianhbell commented 2 years ago

First let's try 10.0 with proper optimization, although I agree that is not likely to be the culprit. If that causes trouble, then we can look into the development versions.

AdeL22222 commented 2 years ago

I ran optimization with options -O3 -ip ("-ip" for ifort = unlimited interprocedural optimization) Refprop 9 w/o prior satspln call (within the limitations of the "time" command, the satspln call did not change the execution time) real 0m5.171s user 0m5.108s sys 0m0.010s Refprop 10 with prior satspln call real 6m34.626s user 6m33.490s sys 0m0.129s

ianhbell commented 2 years ago

And do you get the same results, and number of failures with refprop 9 and 10?

AdeL22222 commented 2 years ago

No calling errors within the 900 test samples, maximum relative deviations, Abs(refprop10- refprop9)/refprop10, of the 900 samples by property

t p d q e cv cp w Max D_rel 6.0412E-05 2.2272E-04 1.5875E-04 0.0000E+00 5.9300E-05 9.8168E-05 8.6504E-05 4.8152E-05

ianhbell commented 2 years ago

Which is more correct in terms of the h(T,d) values after the HS iteration? I noticed that there were some issues with iteration tolerances in REFPROP 10.

AdeL22222 commented 2 years ago

Procedure: HSFLSH(h,s) => T,d, with these TDFLSH(T,d) => h1 Refprop 9: max relative enthalpy error (h1-h)/h within test samples 0.00000000993398 Refprop 10: max relative enthalpy error (h1-h)/h within test samples 0.00000000000982

ianhbell commented 2 years ago

Please note that email responses to the thread have the message history. Either remove the history or response in the web interface.

AdeL22222 commented 2 years ago

Using Refprop 10.0098 sources did not change the calculation behavior. Same precision, same calculation time.

foxdrm commented 1 year ago

I am experiencing a similar issue with very long computational time on HSFLSH function calls. I am calling REFPRP64.dll from within a C# application to get the properties of various hydrocarbon gas blends. The HSFLSH call is embedded within an iterative loop, so the runtime penalty is very annoying.

Observations For the same gas mixture and physical conditions, I measure the following runtimes: TPFLSH = 0.11 seconds PHFLSH = 0.08 seconds HSFLSH = 47.831 seconds

Steps to Reproduce Gas components (mol frac) = 0.001 H2O, 0.002 N2, 0.02 CO2, 0.925 C1, 0.035 C2, 0.006 C3, 0.003 C4I, 0.001 C4, 0.001 C5I, 0.001 C5, 0.001 C6, 0.002 C7, 0.001 C8, 0.001 C9 Pressure = 700 PSIA Temperature = 112 degF Enthalpy = 360.18 BTU/lbm Entropy = 1.045 BTU/lbm-R

System Info Version: Refprop 10 Operating System: Windows 11 Access Method: dll import into C# .NET application

ianhbell commented 1 year ago

Is there any way you can rewrite your code to not use HS calls? They will always be the most computationally expensive because the shape of curves in H-S coordinates do not lend themselves to easy iteration, and neither is an independent variable of the iteration, and as a result, nested iterations are required.

AdeL22222 commented 1 year ago

Unfortunately, avoiding HS calls is sometimes not possible: E.g. in turbomachinery analysis, there are performance modeling concepts that estimate performance loss in terms of entropy production at a given flow condition; that is, the innermost thermal iteration loop is based upon evaluating properties as a function of enthalpy and (correlation based iteratively changing) entropy. Also, in fluid mechanics, conversion from static and stagnation properties requires using the HS functionality - and particularly in turbomachinery applications, these conversions occur whenever changing the frame of reference (stationary blade row to rotating blade row), that is, at least in the 2nd to innermost iteration loop.

However, one could potentially provide an estimate for where the search of the HS-Flash function should start iterating (either stemming from the previous design iteration or some other considerations). Could this be an option to speed up the search? From my background of turbomachinery analysis, providing an optional argument / optional arguments for "initial" p and/or T for the HS search could be possible.

ianhbell commented 1 year ago

If you know your inputs are single-phase, you could do the iteration yourself with Newton-Raphson. You would just need the analytical Jacobian (trivial to generate with REFPROP via CoolProp because CoolProp implements all single-phase derivatives). And then you would start close to the right answer too, so the iterations should be reliable.

foxdrm commented 1 year ago

Ian, thanks for that suggestion, I will give it a try. My use case is turbomachinery analysis like AdeL mentioned. The slow HS calls are occurring only at specific pressure, temperature, gas mixture combinations, which led me to believe there was likely something happening in an unstable region of the iterations. Other times, the HS calls are hardly any slower than TP or PH calls.

ianhbell commented 1 year ago

I looked a bit into HS calls, in a slightly different context: https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=3228&context=iracc . Ultimately we plan to add something similar to REFPROP, but don't hold your breath.

Can you explain the context in which you have H,S inputs? Would help me motivate work along those lines.

AdeL22222 commented 1 year ago

Hello Ian,

turbomachinery analysis frequently converts between so-called static properties (conditions of the fluid in motion with some flow velocity) and so-called stagnation properties (fictitious state of zero flow velocity). That is, the energy content of the flow velocity is isentropically converted into enthalpy. Calculation is being done in both directions, both requiring (typically) evaluation of pressure as a function of enthalpy and entropy:

a. static => total:

determine entropy as function of static pressure and static enthalpy
determine stagnation enthalpy which is static enthalpy plus velocity squared divided by 2 (dynamic head divided by density)
determine stagnation pressure as function of stagnation enthalpy & entropy b. total => static:
determine entropy as function of stagnation pressure and stagnation enthalpy
determine static enthalpy which is stagnation enthalpy minus velocity squared divided by 2 (dynamic head divided by density)
determine static pressure as function of static enthalpy & entropy

Background for entropy-based turbomachinery performance analysis is e.g. a paper by J.D.D. Denton, "Loss Mechanisms in Turbomachinery" ASME Paper 93-GT-435. It describes aerodynamic losses in terms of entropy production. The iterative calculation methodology is then based on conservation of stagnation enthalpy in the stationary frame, and of stagnation rothalpy in the rotating frame, and the change of state is calculated by

a. calculating the entropy increase due to losses (e.g. during expansion across a turbine blade row or compression in a compressor blade row). b. calculating the resulting thermodynamics change of state across the blade row as a function of conserved stagnation enthalpy/rothalpy and "loss-increased" entropy. Hence every calculation of the exit conditions of the row starts with determination of exhaust properties (typically pressure) as function of conserved stagnation enthalpy/rothalpy and inlet entropy + loss-driven entropy increase. c. Static / stagnation property conversions (point 1) are required on inlet and exhaust side as well.

Kind regards

Armin

ianhbell commented 1 year ago

Thanks for your response (it is on the web interface too, since you responded to the email). It helps to motivate why we really have to get the H,S inputs working properly and efficiently

usnistgov / REFPROP-issues