wacl-york / mcm-web

Code for the MCM web application
1 stars 2 forks source link

Export mechanisms to KPP format #210

Closed stulacy closed 9 months ago

stulacy commented 9 months ago

This is a much requested feature that is currently under development (#177 ) Initially it will aim to reproduce the functionality as it was on the old site, before improving it based on user feedback.

stulacy commented 9 months ago

A KPP export is implemented that syntatically looks valid compared with the previous website's version, although I have been unable to test it in a model yet. When combining identical reactions it assumes they are ordered the same way, i.e. that two reactions of NO2 and O3 that produce OH and CHO will both be written NO2 + O3 = OH + CHO, rather than one as NO2 + O3 = CHO + OH and the other O3 + NO2 = OH + CHO. While this looks to be valid for the reactions in the database, there is no guarantee of it in the way the database is stored. Therefore this check for duplicate reactions should reorder the reactants and products.

Users have also reported that additional modifications needed to be made the KPP to make it fully parseable. I'll need to identify what these modifications are and implement them in the web-app.

stulacy commented 9 months ago

The check for identical reactions now handles reactions ordered differently.

stulacy commented 9 months ago

Hi @RolfSander and @yantosca, this PR adds a KPP export to the new MCM website. I've started by recreating the functionality present in the old site and am now implementing the additional tweaks mentioned in the issue on the KPP repo, which I've summarised into the list below. As I'm not a end-user of the MCM, nor have I any familiarity with KPP, there's a few aspects I'd welcome some clarity on.

a: Examples below, are there any other reactions like this?

{2.}     O + O3 = :     8.0D-12*EXP(-2060/TEMP)     ;    # should be O + O3 = 2O2
{19.}    OH + HO2 = :   4.8D-11*EXP(250/TEMP)   ;   # should be OH + HO2 = H2O + O2

b: Bob asked "should these be added to the #DEFVAR section (esp. O2 and H2O), because KPP won't parse it otherwise?"

#INLINE F90_GLOBAL 
 REAL(dp)::M, N2, O2, RO2, H2O 
#ENDINLINE {above lines go into MODULE KPP_ROOT_Global}

c: Done

d: Done

e: KMT06 is the only complex rate that has H20. Should the same be done for Reactions that involve it, e.g. MGLYOO = H2O2 + MGLYOX : 6.0D-18*H2O?

f: How should hv be included?

g: I can remove this, although I can see that KPP 3 only came out a year ago so I imagine there are still users of KPP 2 who would welcome having backwards compatibility here. Are there are any other differences from KPP 2 to 3 that would affect these mechanism files?

RolfSander commented 9 months ago

Hello @stulacy, it's great to see that you are working on the KPP output of the MCM :-)

I have embedded my comments in the text below.

Could you provide an example of a KPP file exported from the new MCM website? I'd like to test it. Especially for questions b) and c), I need to check if my comments are still valid for the new code.

a) Reactions with no products

a: Examples below, are there any other reactions like this?

{2.} O + O3 = : 8.0D-12EXP(-2060/TEMP) ; # should be O + O3 = 2O2 {19.} OH + HO2 = : 4.8D-11EXP(250/TEMP) ; # should be OH + HO2 = H2O + O2

When the products of a reaction are not known or not important, the dummy species PROD should be used as a product. This is necessary because the KPP syntax does not allow an empty list of products. See also:

https://kpp.readthedocs.io/en/stable/using_kpp/04_input_for_kpp.html#equations

Equations 2 and 19 will then look like this:

{2.}   O + O3 = PROD :    8.0D-12*EXP(-2060/TEMP) ; # should be O + O3 = 2O2
{19.}  OH + HO2 = PROD :  4.8D-11*EXP(250/TEMP)   ; # should be OH + HO2 = H2O + O2

Currently, I don't see any other reactions without products.

b) Inlining species

b: Bob asked "should these be added to the #DEFVAR section (esp. O2 and H2O), because KPP won't parse it otherwise?"

INLINE F90_GLOBAL

REAL(dp)::M, N2, O2, RO2, H2O

ENDINLINE {above lines go into MODULE KPP_ROOT_Global}

Yes, this should probably be added.

c) Removing "USE constants" and "CALL mcm_constants" c: I'm happy to remove these if they are unused

Yes, these can probably be removed.

d) negative exponents should be in parentheses d: Done

Great, thanks.

e) In the definition of KMT06, the term "H2O" must be replaced by "C(ind_H2O)".

e: KMT06 is the only complex rate that has H20. Should the same be done for Reactions that involve it, e.g. MGLYOO = H2O2 + MGLYOX : 6.0D-18*H2O?

Yes, this applies to all rate constants which depend on the concentration of water (i.e., on humidity).

f) Photolysis reactions don't include "hv" as a reagent.

f: How should hv be included?

You can add "hv" as a pseudo-reactant, for example:

{36.} O3 + hv = O1D : J(1) ;

g) When upgrading to KPP-3.0.0, the line #INCLUDE atoms must be changed to #INCLUDE atoms.kpp.

g: I can remove this, although I can see that KPP 3 only came out a year ago so I imagine there are still users of KPP 2 who would welcome having backwards compatibility here.

Good point, backwards compatibility is important. I need to discuss this with @yantosca. Maybe we can add a link from "atoms" to "atoms.kpp" in our next release. Then the INCLUDE command still works even without switching to the new syntax.

Are there are any other differences from KPP 2 to 3 that would affect these mechanism files?

We provide upgrading information on our web page here:

https://kpp.readthedocs.io/en/stable/getting_started/00_revision_history.html#kpp-3-0-0

AFAICS, the other items listed there do not affect the MCM-generated mechanism.

RolfSander commented 9 months ago

Hello,

I have 3 more ideas on my wish list. Nothing essential, but nice-to-have features...

h) MCM provides equation numbers as pure comments, e.g., {19.}, which is ignored by KPP. However, if you'd change the syntax to <19>, the information about the equation number will be available in the Fortran code, e.g.:

<19> OH + HO2 = PROD : 4.8D-11*EXP(250/TEMP) ;

For details, see:

https://kpp.readthedocs.io/en/stable/using_kpp/05_output_from_kpp.html#root-monitor

https://kpp.readthedocs.io/en/stable/using_kpp/04_input_for_kpp.html#eqntags

i) To indicate double precision, Fortran numbers can be expressed with D instead of E, e.g.: 4.8D-11. However, in modern Fortran it seems to be more common to express precision with SELECTED_REAL_KIND(...). Therefore, I'd prefer the generic E in the exponential representation of constants, not the D:

<19> OH + HO2 = PROD : 4.8E-11*EXP(250/TEMP) ;

j) I'm not a 100% sure about this but I think that you can lose precision when you mix integer and real values in Fortran, as in (250/TEMP). It may be better to define the temperature dependence as a real value by adding a dot at the end: (250./TEMP).

stulacy commented 9 months ago

Thanks for getting back to me so promptly @RolfSander I've attached a mechanism output from Isoprene C5H8 (had to change file extension to .txt for GitHub) with the following changes:

RolfSander commented 9 months ago

Thanks for the fast implementation!

a) Reactions with no products - Added the explicit products you suggested

I've seen that you've added def add_missing_products(rxns) to kpp.rb on github. In the attached file, however, there are still no products in eqns <2> and <19> :-(

Also, I'm a bit worried about side effects. When you apply the substitution

gsub('O \+ O3 = ',...

to equation

<2> O + O3 =...

wouldn't that affect equation

<7> NO + O3 =...

as well?

As an alternative, it may work to replace the regexp

= *:

with

= PROD :

b) Inlining species - Let me know what you decide

c) Removing "USE constants" and "CALL mcm_constants" - I've removed these from the export as it was straight forward

Items b) and c) are a bit tricky. We need to define several constants. For example, let's look at kmt01. It is currently already defined in the F90_RCONST section of your isoprene file. In addition, it is necessary to declare it in the F90_GLOBAL section:

#INLINE F90_GLOBAL
  REAL(dp) :: kmt01
#ENDINLINE {above lines go into MODULE KPP_ROOT_Global}

This is not implemented yet. As an alternative to F90_RCONST and F90_GLOBAL, we could move all declarations and definitions into a separate file called constants.f90. In that case, it would be necessary to reactivate USE constants and CALL mcm_constants.

I'd also like to hear @yantosca's opinion here...

d) negative exponents should be in parentheses

Thanks, this looks good now.

e) In the definition of KMT06, the term "H2O" must be replaced by "C(ind_H2O)" - Done for all references of H2O in rates

Thanks, this looks good now.

f) Photolysis reactions don't include "hv" as a reagent. - Done

Thanks, this looks good now.

g) When upgrading to KPP-3.0.0, the line #INCLUDE atoms must be changed #INCLUDE atoms.kpp. - Let me know what you decide about this

For now, I suggest that you leave the #INCLUDE atoms command as it is, and I will discuss this with @yantosca later.

h) Adding equation numbers rather than comments - Done

Thanks, this looks good now.

i) Replacing D with E for exponents - Done, although there might be a few edge cases that have snuck past

Thanks, this looks good now. I checked your isoprene file, and I didn't see any leftover "D"s.

j) Explicitly casting int constants as float, e.g. 250/TEMP -> 250./TEMP - I'm not that familiar with Fortran so I've asked a colleague who is and they don't believe that this is the case. Are you able to quickly test this?

If the mix only occurs in the form of something like 250/TEMP, then it is probably okay. However, with slightly more complex expressions, it's easy to fall into the trap. Try for example these fortran lines:

  ! calculate 50% of 123.456
  PRINT *, (50/100)   * 123.456
  PRINT *, (50./100.) * 123.456

Using gfortran, I get the results 0.0 and 61.728, respectively.

RolfSander commented 9 months ago

Thanks for the fast implementation!

a) Reactions with no products - Added the explicit products you suggested

I've seen that you've added def add_missing_products(rxns) to kpp.rb on github. In the attached file, however, there are still no products in eqns <2> and <19> :-(

Also, I'm a bit worried about side effects. When you apply the substitution

gsub('O \+ O3 = ',...

to equation

<2> O + O3 =...

wouldn't that affect equation

<7> NO + O3 =...

as well?

As an alternative, it may work to replace the regexp

= *:

with

= PROD :

b) Inlining species - Let me know what you decide

c) Removing "USE constants" and "CALL mcm_constants" - I've removed these from the export as it was straight forward

Items b) and c) are a bit tricky. We need to define several constants. For example, let's look at kmt01. It is currently already defined in the F90_RCONST section of your isoprene file. In addition, it is necessary to declare it in the F90_GLOBAL section:

#INLINE F90_GLOBAL
  REAL(dp) :: kmt01
#ENDINLINE {above lines go into MODULE KPP_ROOT_Global}

This is not implemented yet. As an alternative to F90_RCONST and F90_GLOBAL, we could move all declarations and definitions into a separate file called constants.f90. In that case, it would be necessary to reactivate USE constants and CALL mcm_constants.

I'd also like to hear @yantosca's opinion here...

d) negative exponents should be in parentheses

Thanks, this looks good now.

e) In the definition of KMT06, the term "H2O" must be replaced by "C(ind_H2O)" - Done for all references of H2O in rates

Thanks, this looks good now.

f) Photolysis reactions don't include "hv" as a reagent. - Done

Thanks, this looks good now.

g) When upgrading to KPP-3.0.0, the line #INCLUDE atoms must be changed #INCLUDE atoms.kpp. - Let me know what you decide about this

For now, I suggest that you leave the #INCLUDE atoms command as it is, and I will discuss this with @yantosca later.

h) Adding equation numbers rather than comments - Done

Thanks, this looks good now.

i) Replacing D with E for exponents - Done, although there might be a few edge cases that have snuck past

Thanks, this looks good now. I checked your isoprene file, and I didn't see any leftover "D"s.

j) Explicitly casting int constants as float, e.g. 250/TEMP -> 250./TEMP - I'm not that familiar with Fortran so I've asked a colleague who is and they don't believe that this is the case. Are you able to quickly test this?

If the mix only occurs in the form of something like 250/TEMP, then it is probably okay. However, with slightly more complex expressions, it's easy to fall into the trap. Try for example these fortran lines:

  ! calculate 50% of 123.456
  PRINT *, (50/100)   * 123.456
  PRINT *, (50./100.) * 123.456

Using gfortran, I get the results 0.0 and 61.728, respectively.

stulacy commented 9 months ago

Great! So the list of unresolved changes is now:

I don't have much to input for these 3 yet as I'm still getting to grips with KPP:

RolfSander commented 9 months ago

a) Reactions with no products - the linter decided to be helpful and remove the regex slashes, causing it to fail hence the empty products in the first file

Indeed, sometimes programs think that they are really, really clever...

I've uploaded a corrected mechanism file with this change and only looking for exact matches so it won't affect eq 7 as you pointed out.

Thanks, the equations look fine now. However, when you mention the species O2 and H20 in equations <2> and <19> explicitly, it will also be necessary to declare them in the #DEFVAR section:

O2 = IGNORE ;
H2O = IGNORE ;

From your perspective, is it more useful to have these explicit products, or the dummy PROD species?

If the products are known, it is usually best to include them explicitly. However, for species like O2 and H2O it doesn't really matter much. Note that there are other reactions where O2 is missing on the right hand side, e.g., equation <4> NO2 + O = NO. Here, it is okay for KPP because the number of products is still >= 1, even without O2.

The simplest solution would be to put PROD into equations <2> and

<19>. Then there would be no need to extend the `#DEFVAR` section. For me, both solutions are fine, either `PROD` or explicit products. > j) Explicitly casting int constants as float - I think your example > shows integer division, as running 5 * 123.456 vs 5. * 123.456 > gives the same answer. However, I'm happy to make this change as it > is more explicit. Thanks, I think we're on the safe side when these numbers are expressed as REAL, not INTEGER. > I don't have much to input for these 3 yet as I'm still getting to > grips with KPP: > b) Inlining species > c) Removing "USE constants" and "CALL mcm_constants" Items b) and c) are connected and they are indeed the major open tasks. Please give me a bit more time to find a good solution for this. Also related is the definition of photolysis rate constants. The MCM needs complex rate constants (https://mcm.york.ac.uk/MCM/rates/complex) as well as photolysis rate constants (https://mcm.york.ac.uk/MCM/rates/photolysis). You have already included the complex rate constants into the KPP file (`complex_rates_out` in `kpp.rb`). For KPP, it would be necessary to have the photolysis rate constants in a similar way. BTW: I was surprised to see that the facsimile output doesn't have any definitions of the photolysis rate constants either. How does facsimile get these values? > g) When upgrading to KPP-3.0.0, the line #INCLUDE atoms must be > changed #INCLUDE atoms.kpp Don't worry about g), I'll find a solution together with @yantosca.
stulacy commented 9 months ago

The simplest solution would be to put PROD into equations <2> and

<19>. Then there would be no need to extend the #DEFVAR section. For me, both solutions are fine, either PROD or explicit products.

Great, I've gone with PROD for simplicity's sake.

Thanks, I think we're on the safe side when these numbers are expressed as REAL, not INTEGER.

I've made this change. The regex is a bit hairy but from inspection of the output mechanism I think I've caught all the edge cases, let me know if there's something I've missed.

You have already included the complex rate constants into the KPP file (complex_rates_out in kpp.rb). For KPP, it would be necessary to have the photolysis rate constants in a similar way.

Ah yes I'll look into adding this.

BTW: I was surprised to see that the facsimile output doesn't have any definitions of the photolysis rate constants either. How does facsimile get these values?

The primary software that we target with the facsimile files is AtChem2, which comes with the MCM photolysis rates hardcoded.

Thanks for all your comments, here's the current isoprene KPP

yantosca commented 9 months ago

Hi @stulacy and @RolfSander. I am traveling so I may be slow to look over the PR.

One note, using E exponents will force REAL4 precision. Better to use e.g. 250.0_dp, where dp is the KPP double precision kind parameter.

RolfSander commented 9 months ago

The simplest solution would be to put PROD into equations <2> and

<19>. Then there would be no need to extend the #DEFVAR section. For me, both solutions are fine, either PROD or explicit products.

Great, I've gone with PROD for simplicity's sake.

Thanks!

Unfortunately, in contrast to what I've written before, we probably still need the declaration of H2O in the #DEFVAR section. That's because we have ind_H2O in some rate constants. KPP only defines ind_H2O if there is a chemical species called H2O. I still have to think about the best way to implement this.

Thanks, I think we're on the safe side when these numbers are expressed as REAL, not INTEGER.

I've made this change. The regex is a bit hairy but from inspection of the output mechanism I think I've caught all the edge cases, let me know if there's something I've missed.

Yes, it's not that easy to find a regexp for this. Reverting unwanted changes with additional regexps is probably the best solution here.

I've checked your latest isoprene file and everything looks good. Thanks!

You have already included the complex rate constants into the KPP file (complex_rates_out in kpp.rb). For KPP, it would be necessary to have the photolysis rate constants in a similar way.

Ah yes I'll look into adding this.

It should be sufficient if J is declared in the F90_GLOBAL section:

out += "  REAL(dp), DIMENSION(61) :: J\n"

It is then the responsibility of the end user to fill the J array with meaningful photolysis rate constants, either those from the MCM or calculated in another way.

I've used the hardcoded number 61 for the array size here. You can probably replace that by some variable.

Although there are still a couple of open qustions, I think that the KPP code is ready now for beta-testing by the community. Once you have activated the KPP feature on the MCM web page, I will ask my colleagues for feedback. When Bob is back in the office, I will discuss with him the remaining points. I guess in a month or two we'll have some additional comments.

I think the big challenge is that the code works for everyone. Bob and I are using different models (GEOS-Chem and MECCA, respectively). Others are using KPP within several other models (https://en.wikipedia.org/wiki/Kinetic_PreProcessor). I always have to be careful that I don't suggest anything that would work for my model but not for others...

stulacy commented 9 months ago

I've added the H2O declaration. Is there anything else outstanding apart from the photolysis rates before it can be released do you think?

RolfSander commented 9 months ago

I've added the H2O declaration.

Thanks!

Is there anything else outstanding apart from the photolysis rates before it can be released do you think?

Yes, one more thing. While we are still looking for the ultimate solution, I think we can place the declarations of the complex rate coefficients into F90_GLOBAL for now. This allows using the MCM-generated KPP file without any further modifications. The complete F90_GLOBAL should look like this:

#INLINE F90_GLOBAL
  REAL(dp) :: &
    KRO2NO, KRO2HO2, KAPHO2, KAPNO, KRO2NO3, KNO3AL, &
    KDEC, KROPRIM, KROSEC, KCH3O2, K298CH3O2, K14ISOM1, &
    KD0, KDI, KRD, FCD, NCD, FD, KBPAN, KC0, KCI, KRC, &
    FCC, NC, FC, KFPAN, K10, K1I, KR1, FC1, NC1, F1, &
    KMT01, K20, K2I, KR2, FC2, NC2, F2, KMT02, K30, &
    K3I, KR3, FC3, NC3, F3, KMT03, K40, K4I, KR4, FC4, &
    NC4, F4, KMT04, KMT05, KMT06, K70, K7I, KR7, FC7, &
    NC7, F7, KMT07, K80, K8I, KR8, FC8, NC8, F8, KMT08, &
    K90, K9I, KR9, FC9, NC9, F9, KMT09, K100, K10I, &
    KR10, FC10, NC10, F10, KMT10, K1, K3, K4, K2, &
    KMT11, K120, K12I, KR12, FC12, NC12, F12, KMT12, &
    K130, K13I, KR13, FC13, NC13, F13, KMT13, K140, &
    K14I, KR14, FC14, NC14, F14, KMT14, K150, K15I, &
    KR15, FC15, NC15, F15, KMT15, K160, K16I, KR16, &
    FC16, NC16, F16, KMT16, K170, K17I, KR17, FC17, &
    NC17, F17, KMT17, KMT18, KPPN0, KPPNI, KRPPN, &
    FCPPN, NCPPN, FPPN, KBPPN
  REAL(dp), DIMENSION(61) :: J
  REAL(dp)::M, N2, O2, RO2, H2O
#ENDINLINE {above lines go into MODULE KPP_ROOT_Global}

This static list of variables is hopefully okay for now. In the long run, you may prefer to generate the list dynamically, e.g., when new complex rate coefficients are added to the MCM.

stulacy commented 9 months ago

I've added the complex rate and photolysis declarations, and I've also added the photolysis rates so users will just need to input a zenith angle. If you can have a look over/test the current KPP export and let me know if it looks ok, then I'll make this live on Monday to get some user feedback. Thanks!

RolfSander commented 9 months ago

Thanks for implementing the complex and photolysis rate constants! Unfortunately, there are still two problems :-(

1) Although the currrent MCM contains only 31 photolysis rate constants (J-values), a fortran array of size 31 is not sufficient. That's because there are gaps in the numbering of the J-values. The largest index is 56, so we probably need:

   REAL(dp), DIMENSION(56) :: J

2) The other point is something that I should have noticed earlier (but I didn't, sorry). Even though H2O is now properly declared as a species in the #DEFVAR section, KPP ignores it because it does not occur in any equation. The quick solution would be to reactivate H2O as a product in equation <19> again:

   HO2 + OH = H2O + O2

I will have to think about a better way how to deal with H2O and c(ind_H2O) in the humidity-dependent rate constants...

stulacy commented 9 months ago

Hi Rolf, I've made those changes. Latest isoprene mechanism here

Rather than having unused indices in the photolysis array, I've mapped the J rates onto their index but left a comment with a reference to its original J value in the MCM. I.e. for J=56, this now maps onto the 31st index: J(31) = 4.365E-05*(cos(zenith)**1.089)*exp(-0.323*(1./cos(zenith))) {MCM J=56.}

And when it's referenced in a reaction it's correctly pointing to the remapped J. <155> NO3CH2CHO + hv = HCOCH2O + NO2 : J(31)*4.3 ;

Does this look good to you?

I've also reverted the PRODS change so that H2O is now being used in a reaction. There is an edge case where this fails however. If a user doesn't export the inorganic reactions then eqn 19 won't be included and thus H2O won't be used. It would be good to have a more general solution, but this should do the job for now.

Let me know if you're happy for me to release this on Monday.

RolfSander commented 9 months ago

Hello Stuart,

Great, thanks! The current version looks fine, I think it's ready to be released on the MCM web page.

Rather than having unused indices in the photolysis array, I've mapped the J rates onto their index but left a comment with a reference to its original J value in the MCM. I.e. for J=56, this now maps onto the 31st index

Thanks, the mapping is a good idea, it will avoid wasting memory in unused array elements!

The comments with the references to the original MCM indices are also very good and important. Later (not now) I will try to find a good solution how those users who define their own J values can also apply the mapping.

A question about future versions of the MCM: Do you think the developers will fill the gaps in the J array? Or will they add new photolyses at the end? This could be important when a user decides to upgrade their model to a new MCM version.

I've also reverted the PRODS change so that H2O is now being used in a reaction.

Thanks!

There is an edge case where this fails however. If a user doesn't export the inorganic reactions then eqn 19 won't be included and thus H2O won't be used. It would be good to have a more general solution, but this should do the job for now.

Good point! Yes, that's also something where we have to find a more generic solution (not now though).

stulacy commented 9 months ago

A question about future versions of the MCM: Do you think the developers will fill the gaps in the J array? Or will they add new photolyses at the end? This could be important when a user decides to upgrade their model to a new MCM version.

I imagine that new photolyses will be added to the end, rather than risk breaking backwards compatibility by overwriting old rates (I'm not sure of the reason why there are gaps).

stulacy commented 9 months ago

I'm about to make the KPP export functionality live, we'll continue discussion of the outstanding issues in #177