Closed cpgr closed 4 years ago
You beaut!
I'll have to add to the documentation that I haven't read yet, but I'll just outline the working here.
Given a database (GWB or EQ3/6 format), the python converter works something like:
database_converter.py -i inputdb_in_gwb_format --format gwb -o moosedb.json
or
database_converter.py -i inputdb_in_eq36_format --format eq36 -o moosedb.json
There are some python functions in dbutils.py that can be used to query the database once it is in JSON format, like find all of the equilibrium species containing a given basis species, etc, things that are difficult to do by hand. I'll add something about these in the docs.
Job Documentation on 6045036 wanted to post the following:
View the site here
This comment will be updated on new commits.
I can't figure out the python testing importing. I have the something like the following structure
python/
|-- database_converter.py
|-- __init__.py (empty)
|-- readers/
| |-- gwb_reader.py
| |-- __init__.py (empty)
|-- tests/
|-- test_gwbreader.py
In python/tests/test_gwbreader.py
I need to import python/readers/gwb_reader.py
but I can't get it to work properly using
from readers import gwb_reader
from .readers import gwb_reader
import readers
etc.
I thought the __init__.py
's should have made the top import statement work but it doesn't.
Any suggestions appreciated.
I'm making changes to your code - will issue a PR at some stage. One question - why have a ThermoDB when we populate it and then immediately just create a database dictionary from it? Why not create the database dictionary directly?
Your import problem might be solved by my PR
It makes it easier to subsequently change the output if required - we only need to change the dict in one file instead of all the readers
OK, i understand what you mean
After finishing last night (i issued a PR to your branch - hope i did that correctly) i realised that the current json format may not be optimal for the redox couples. Bethke specifies that the reaction for a redox species may include redox species that have already appeared in the file. So he's imagining we read the file in order. Most of the redox species are in equilibrium, and you therefore completely eliminate them from consideration by using their reaction. If we know the file order, that'll be easy, but the current python dict won't retain that order.
I'm confused by Bethke's format. For instance
* i think this is a comment because it begins with an asterisk
but
* temperatures
appears to be a keyword indicating "temperatures are coming next". What is your interpretation, @cpgr ?
I'm confused by Bethke's format. For instance
* i think this is a comment because it begins with an asterisk
but
* temperatures
appears to be a keyword indicating "temperatures are coming next". What is your interpretation, @cpgr ?
Yeah, it appears to be a comment sometimes but a keyword others.
The redox couples in the gwb database are in alphabetical order, so there can't be an order dependency? Anyway, I think requiring an order in the database is a recipe for disaster and a probable source of hard to track down erroneous behaviour.
OK about the redox couples and alphabetical order. i bet it's just one of those vaguely defined things like the asterisk/comment stuff. Let's assume the redox species are defined in terms of the basis species alone.
Why don't we use the getter/setter methods in ThermoDB ?
Why don't we use the getter/setter methods in ThermoDB ?
We do. Where we have
db.prop
uses the getter, and where we have
db.prop(value)
uses the setter.
What about
db = ThermoDB()
db.format = 'gwb'
I'm a bit unsure of error checking. You can see i've put in some error checking in gwb_reader , mainly to check for units, but also a bit of formatting checking too. We could do much more error checking here, but i'm thinking that people will mainly use the standard database that is error free.
We could also error-check the json stuff in the C++ code, because someone might have mucked around with it. Should we do that?
I'm going to add the sorption stuff to GeochemicalDatabaseReader. OK?
What about
db = ThermoDB() db.format = 'gwb'
Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the format
property to gwb
.
Later on, db.format
is used to fill the JSON bit under Header:oringial format
The error checking is hard, as the databases are just ascii files that anyone can fiddle with. It's hard to make it robust to all possible changes.
I imagine that no-one except you or I will use this python utility and just run with the default file provided, or if they do it is a once off and they will have to be careful.
In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?
What about
db = ThermoDB() db.format = 'gwb'
Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the
format
property togwb
.Later on,
db.format
is used to fill the JSON bit underHeader:oringial format
I meant, i thought the lines would be something like
db = ThermoDB()
db.set_format('gwb')
and later
database['Header']['original format'] = db.get_format()
What about
db = ThermoDB() db.format = 'gwb'
Do you mean what does this do? The first line creates a ThermoDB object that gets initialised with everything empty. The second line sets the
format
property togwb
. Later on,db.format
is used to fill the JSON bit underHeader:oringial format
I meant, i thought the lines would be something like
db = ThermoDB() db.set_format('gwb')
and later
database['Header']['original format'] = db.get_format()
The @property
decorate means you don't need to do it that way, eg https://docs.python.org/2/library/functions.html#property
I think it is heaps cleaner this way
Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?
In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?
On the contrary, the C++ utility only extracts the species provided upfront (in the input file eventually).
I get it, thanks Chris. I didn't know about that until now.
In my initial conceptualisation, i thought the reader could do a lot of "elimination" stuff: see https://mooseframework.inl.gov/docs/PRs/14910/site/modules/geochemistry/geochemistry_database_reader.html . But you have written the reader as just a pure reader, so i'm guessing you thought we should write another class to do the elimination, yea?
On the contrary, the C++ utility only extracts the species provided upfront (in the input file eventually).
OK, i haven't read it. There might be difficulties with redox, ignored minerals, and so on.
Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?
We could do check that logk.size() =
temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.
Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?
We could do check that
logk.size() =
temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.
Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.
Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?
We could do check that
logk.size() =
temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.
I don't think we should worry too much about it. Just put a great big caveat emptor in the documentation that fiddling with the database manually could lead to problems. All of the reactive transport codes would have this issue. TOUGHREACT uses a plain text format that I'm sure you could break by adding an extra number in a line.
I don't get the ignored minerals bit? You should have to specify the possible minerals that could arise in the input file (like pflotran or tough react) and ignore all others, shouldn't you?
Errors: OK. I think one way
Error checking: i agree it's super hard, and i agree that it's unlikely anyone but us will run the converter. However, i wouldn't be surprised if people mucked around with the json file. Is there a way of enforcing, prior to parsing using the C++, things like "you must have 8 entries in your equilibrium constant if there are 8 temperatures", etc? Perhaps json has this capability?
We could do check that
logk.size() =
temp.size()` in the GeochemicalDatabaseReader as we read the equilibrium constant data for the species easily enough.Yea, we can do that. But there are so many other things that could go wrong too. It'd be cool if there were some database format that made errors simpler to find for users who are mucking around with plain text.
I don't think we should worry too much about it. Just put a great big caveat emptor in the documentation that fiddling with the database manually could lead to problems. All of the reactive transport codes would have this issue. TOUGHREACT uses a plain text format that I'm sure you could break by adding an extra number in a line.
OK, i think a way around this is to create a nice "dump" or "print" routine that allows the user to inspect exactly what the database looks like to MOOSE. Probably like your printReactions method, but more stuff.
Do you want me to do that?
Minerals: yes, we could make the user specify exactly what minerals they desire. Is this what you were imagining? Similarly, we demand the user specify exactly what fixed-fugacity gases they desire.
However, we include: all relevant secondary species and all relevant surface species.
This requires error checking. For instance, a user could specify that mineral Fe(OH)3(ppd) is to be included, without specifying that Fe++ should be in the basis species.
Is this what you'd like the Reader to look like?
OK, i think a way around this is to create a nice "dump" or "print" routine that allows the user to inspect exactly what the database looks like to MOOSE. Probably like your printReactions method, but more stuff.
Do you want me to do that?
I think that it will be easy to validate the database with our given schema upfront. I'll make a class that does this, so don't bother error checking anything.
I'll add the print routines as well if you like
Re: the minerals. There are things like rate constant, activation energy etc that aren't in databases that you need to specify for the mineral reactions, so they must be included by the user. I don't think there is any way around that, is there?
And the secondary species, yeah, there is no consistency check yet. We do need to make sure that all of the required basis species are included.
Re: the minerals. There are things like rate constant, activation energy etc that aren't in databases that you need to specify for the mineral reactions, so they must be included by the user. I don't think there is any way around that, is there?
And the secondary species, yeah, there is no consistency check yet. We do need to make sure that all of the required basis species are included.
Minerals: there can be minerals in equilibrium with the aqueous solution, and they are rather like secondary species, except that they can completely dissolve, so the code has to be careful there, or they can supersaturate but the user doesn't want them to precipitate. Then there are other minerals that are governed by rate laws. At the Reader stage, we want the reader to include all of these. After Reading, the user can then specify which are in equilibrium, which can supersaturated, which are kinetic (and give the appropriate law).
Hi @cpgr, i just realised you already have methods for getting the names of all the secondary species, etc.
Hi @cpgr, i just realised you already have methods for getting the names of all the secondary species, etc.
Oh, I forgot they were there!
Thanks for rebasing against next, @cpgr. It's my 4th libmesh build of the day!
TODO: python should check that all "sorbing minerals" are also present in the "minerals" section of the database.
Job Python Tests (Conda) on df2ce41 : invalidated by @WilkAndy
strange failures in things that appear to be unrelated to this PR (eg MooseDocs/test.materialize/serial)
What's up with these Conda errors, @cpgr?
It's not us is it?
@permcody and colleagues... this PR is getting bigger and bigger, but @cpgr and i will stop adding much to it for a while. We don't understand why the python tests fail.
Python testing failures are not your fault. @milljm and @aeslaughter are working on it.
Job Python 3.6 Tests (Conda) on 6134a42 : invalidated by @WilkAndy
trying again....
This is ready for inspection. @lindsayad , do you have time?
Could we potentially squash some of the "fix" commits? By the way if you guys aren't aware already git commit
takes a --fixup
flag. I use it all the time. You can look here for some more detail
Thanks for the info about fixup
. @cpgr, can you squash?
Thermodynamic database reader capability.
Two parts: 1) Some python utilities to convert GWB or EQ3/6 databases to JSON for easy reading. This includes some utilities that can be used to query the database in python that may be useful.
2) The database reader utility that finds all of the species specified by the user and returns all of the associated information for use in whatever classes.
fyi @WilkAndy
Note: the python unit tests aren't working properly as I haven't figured out how to fix the module importing part yet.
Refs #14897