Geochemistry database reader

cpgr commented 4 years ago

Thermodynamic database reader capability.

Two parts: 1) Some python utilities to convert GWB or EQ3/6 databases to JSON for easy reading. This includes some utilities that can be used to query the database in python that may be useful.

2) The database reader utility that finds all of the species specified by the user and returns all of the associated information for use in whatever classes.

fyi @WilkAndy

Note: the python unit tests aren't working properly as I haven't figured out how to fix the module importing part yet.

Refs #14897

WilkAndy commented 4 years ago

You beaut!

cpgr commented 4 years ago

I'll have to add to the documentation that I haven't read yet, but I'll just outline the working here.

Given a database (GWB or EQ3/6 format), the python converter works something like:

database_converter.py -i inputdb_in_gwb_format --format gwb -o moosedb.json

or

database_converter.py -i inputdb_in_eq36_format --format eq36 -o moosedb.json

There are some python functions in dbutils.py that can be used to query the database once it is in JSON format, like find all of the equilibrium species containing a given basis species, etc, things that are difficult to do by hand. I'll add something about these in the docs.

moosebuild commented 4 years ago

Job Documentation on 6045036 wanted to post the following:

View the site here

This comment will be updated on new commits.

cpgr commented 4 years ago

I can't figure out the python testing importing. I have the something like the following structure

python/
|-- database_converter.py
|-- __init__.py (empty)
|-- readers/
|    |-- gwb_reader.py
|    |-- __init__.py (empty)
|-- tests/
     |-- test_gwbreader.py

In python/tests/test_gwbreader.py I need to import python/readers/gwb_reader.py but I can't get it to work properly using

from readers import gwb_reader
from .readers import gwb_reader
import readers

etc.

I thought the __init__.py's should have made the top import statement work but it doesn't. Any suggestions appreciated.

WilkAndy commented 4 years ago

I'm making changes to your code - will issue a PR at some stage. One question - why have a ThermoDB when we populate it and then immediately just create a database dictionary from it? Why not create the database dictionary directly?

WilkAndy commented 4 years ago

Your import problem might be solved by my PR

cpgr commented 4 years ago

It makes it easier to subsequently change the output if required - we only need to change the dict in one file instead of all the readers

WilkAndy commented 4 years ago

OK, i understand what you mean

WilkAndy commented 4 years ago

After finishing last night (i issued a PR to your branch - hope i did that correctly) i realised that the current json format may not be optimal for the redox couples. Bethke specifies that the reaction for a redox species may include redox species that have already appeared in the file. So he's imagining we read the file in order. Most of the redox species are in equilibrium, and you therefore completely eliminate them from consideration by using their reaction. If we know the file order, that'll be easy, but the current python dict won't retain that order.

WilkAndy commented 4 years ago

I'm confused by Bethke's format. For instance

* i think this is a comment because it begins with an asterisk

but

* temperatures

appears to be a keyword indicating "temperatures are coming next". What is your interpretation, @cpgr ?

cpgr commented 4 years ago

I'm confused by Bethke's format. For instance
* i think this is a comment because it begins with an asterisk
but
* temperatures
appears to be a keyword indicating "temperatures are coming next". What is your interpretation, @cpgr ?

Yeah, it appears to be a comment sometimes but a keyword others.

cpgr commented 4 years ago

The redox couples in the gwb database are in alphabetical order, so there can't be an order dependency? Anyway, I think requiring an order in the database is a recipe for disaster and a probable source of hard to track down erroneous behaviour.

WilkAndy commented 4 years ago

OK about the redox couples and alphabetical order. i bet it's just one of those vaguely defined things like the asterisk/comment stuff. Let's assume the redox species are defined in terms of the basis species alone.

WilkAndy commented 4 years ago

Why don't we use the getter/setter methods in ThermoDB ?

cpgr commented 4 years ago

Why don't we use the getter/setter methods in ThermoDB ?

We do. Where we have

db.prop

uses the getter, and where we have

db.prop(value)

uses the setter.

WilkAndy commented 4 years ago

What about

    db = ThermoDB()
    db.format = 'gwb'

WilkAndy commented 4 years ago

I'm a bit unsure of error checking. You can see i've put in some error checking in gwb_reader , mainly to check for units, but also a bit of formatting checking too. We could do much more error checking here, but i'm thinking that people will mainly use the standard database that is error free.

We could also error-check the json stuff in the C++ code, because someone might have mucked around with it. Should we do that?

WilkAndy commented 4 years ago

I'm going to add the sorption stuff to GeochemicalDatabaseReader. OK?

cpgr commented 4 years ago

What about

    db = ThermoDB()
    db.format = 'gwb'

Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the format property to gwb.

Later on, db.format is used to fill the JSON bit under Header:oringial format

cpgr commented 4 years ago

The error checking is hard, as the databases are just ascii files that anyone can fiddle with. It's hard to make it robust to all possible changes.

I imagine that no-one except you or I will use this python utility and just run with the default file provided, or if they do it is a once off and they will have to be careful.

WilkAndy commented 4 years ago

In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?

WilkAndy commented 4 years ago

What about
    db = ThermoDB()
    db.format = 'gwb'
Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the format property to gwb.

Later on, db.format is used to fill the JSON bit under Header:oringial format

I meant, i thought the lines would be something like

db = ThermoDB()
db.set_format('gwb')

and later

database['Header']['original format'] = db.get_format()

cpgr commented 4 years ago

What about
    db = ThermoDB()
    db.format = 'gwb'
Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the format property to gwb. Later on, db.format is used to fill the JSON bit under Header:oringial format
I meant, i thought the lines would be something like
db = ThermoDB()
db.set_format('gwb')
and later
database['Header']['original format'] = db.get_format()

The @property decorate means you don't need to do it that way, eg https://docs.python.org/2/library/functions.html#property

I think it is heaps cleaner this way

WilkAndy commented 4 years ago

Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?

cpgr commented 4 years ago

In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?

On the contrary, the C++ utility only extracts the species provided upfront (in the input file eventually).

WilkAndy commented 4 years ago

https://docs.python.org/2/library/functions.html#property

I get it, thanks Chris. I didn't know about that until now.

WilkAndy commented 4 years ago

In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?

On the contrary, the C++ utility only extracts the species provided upfront (in the input file eventually).

OK, i haven't read it. There might be difficulties with redox, ignored minerals, and so on.

cpgr commented 4 years ago

Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?

We could do check that logk.size() =temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.

WilkAndy commented 4 years ago

Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?

We could do check that logk.size() =temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.

Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.

cpgr commented 4 years ago

Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?

We could do check that logk.size() =temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.

Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.

I don't think we should worry too much about it. Just put a great big caveat emptor in the documentation that fiddling with the database manually could lead to problems. All of the reactive transport codes would have this issue. TOUGHREACT uses a plain text format that I'm sure you could break by adding an extra number in a line.

cpgr commented 4 years ago

I don't get the ignored minerals bit? You should have to specify the possible minerals that could arise in the input file (like pflotran or tough react) and ignore all others, shouldn't you?

WilkAndy commented 4 years ago

Errors: OK. I think one way

Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?

We could do check that logk.size() =temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.

Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.

I don't think we should worry too much about it. Just put a great big caveat emptor in the documentation that fiddling with the database manually could lead to problems. All of the reactive transport codes would have this issue. TOUGHREACT uses a plain text format that I'm sure you could break by adding an extra number in a line.

OK, i think a way around this is to create a nice "dump" or "print" routine that allows the user to inspect exactly what the database looks like to MOOSE. Probably like your printReactions method, but more stuff.

Do you want me to do that?

WilkAndy commented 4 years ago

Minerals: yes, we could make the user specify exactly what minerals they desire. Is this what you were imagining? Similarly, we demand the user specify exactly what fixed-fugacity gases they desire.

However, we include: all relevant secondary species and all relevant surface species.

This requires error checking. For instance, a user could specify that mineral Fe(OH)3(ppd) is to be included, without specifying that Fe++ should be in the basis species.

Is this what you'd like the Reader to look like?

cpgr commented 4 years ago

OK, i think a way around this is to create a nice "dump" or "print" routine that allows the user to inspect exactly what the database looks like to MOOSE. Probably like your printReactions method, but more stuff.

Do you want me to do that?

I think that it will be easy to validate the database with our given schema upfront. I'll make a class that does this, so don't bother error checking anything.

cpgr commented 4 years ago

I'll add the print routines as well if you like

cpgr commented 4 years ago

Re: the minerals. There are things like rate constant, activation energy etc that aren't in databases that you need to specify for the mineral reactions, so they must be included by the user. I don't think there is any way around that, is there?

And the secondary species, yeah, there is no consistency check yet. We do need to make sure that all of the required basis species are included.

WilkAndy commented 4 years ago

Re: the minerals. There are things like rate constant, activation energy etc that aren't in databases that you need to specify for the mineral reactions, so they must be included by the user. I don't think there is any way around that, is there?

And the secondary species, yeah, there is no consistency check yet. We do need to make sure that all of the required basis species are included.

Minerals: there can be minerals in equilibrium with the aqueous solution, and they are rather like secondary species, except that they can completely dissolve, so the code has to be careful there, or they can supersaturate but the user doesn't want them to precipitate. Then there are other minerals that are governed by rate laws. At the Reader stage, we want the reader to include all of these. After Reading, the user can then specify which are in equilibrium, which can supersaturated, which are kinetic (and give the appropriate law).

WilkAndy commented 4 years ago

Hi @cpgr, i just realised you already have methods for getting the names of all the secondary species, etc.

cpgr commented 4 years ago

Hi @cpgr, i just realised you already have methods for getting the names of all the secondary species, etc.

Oh, I forgot they were there!

WilkAndy commented 4 years ago

Thanks for rebasing against next, @cpgr. It's my 4th libmesh build of the day!

WilkAndy commented 4 years ago

TODO: python should check that all "sorbing minerals" are also present in the "minerals" section of the database.

moosebuild commented 4 years ago

Job Python Tests (Conda) on df2ce41 : invalidated by @WilkAndy

strange failures in things that appear to be unrelated to this PR (eg MooseDocs/test.materialize/serial)

WilkAndy commented 4 years ago

What's up with these Conda errors, @cpgr?

cpgr commented 4 years ago

It's not us is it?

WilkAndy commented 4 years ago

@permcody and colleagues... this PR is getting bigger and bigger, but @cpgr and i will stop adding much to it for a while. We don't understand why the python tests fail.

permcody commented 4 years ago

Python testing failures are not your fault. @milljm and @aeslaughter are working on it.

moosebuild commented 4 years ago

Job Python 3.6 Tests (Conda) on 6134a42 : invalidated by @WilkAndy

trying again....

WilkAndy commented 4 years ago

This is ready for inspection. @lindsayad , do you have time?

lindsayad commented 4 years ago

Could we potentially squash some of the "fix" commits? By the way if you guys aren't aware already git commit takes a --fixup flag. I use it all the time. You can look here for some more detail

WilkAndy commented 4 years ago

Thanks for the info about fixup. @cpgr, can you squash?

idaholab / moose

Geochemistry database reader #14911