Refactoring of Mixture class

quinnreynolds commented 7 years ago

Original report by Quinn Reynolds (Bitbucket: kittychunk, GitHub: kittychunk).

Per suggestions by @alchemyst, the use of the Mixture class as a grab-bag of various loosely associated pieces is poor OO and should be revisited. In particular:

The comments made in #25 re: the (ab)use of JSON files, and changing to a class-focused rather than file-focused code are germane to Mixture also.
Exactly what a Mixture is (and isn't) should be more clearly defined in terms of how it is used by both the code and the end user. The use-case document I'm preparing will hopefully help us to understand this better. Relatedly...
...Mixture probably works better as a relatively simple data structure. It can retain some of the preparation code that's currently in Mixture.__init__, but most of the methods (composition solver, property calculators etc) could be moved out to top-level package functions which take a Mixture of Species as an argument.
It's not entirely clear to me yet whether the Mixture class should also contain composition information (species numbers & number densities), or whether this should just be something that is returned by the composition solver function and then passed as an argument to the property calculators. I'm leaning toward the latter but let's look at the use-case first.

As with #25 this change will require a not-insignificant refactoring of the code, and I suggest we prioritise it after we have all had a chance to go through the typical use case document that I'm busy preparing.

quinnreynolds commented 4 years ago

As of 5735cde9bb27313dc41290959d4226856ec9c425 I've made some basic changes to get away from the file-centric design, but I'm having trouble seeing a good way forward with this in terms of a broader refactor reflecting best practice and usability. There are a number of obstacles/opportunities.

In the current implementation, the (user-relevant) attributes of a Mixture object fall in one of two categories: a "spec" (list of species, initial comp x0, temperature T, pressure P), and a "state" (composition ni). The intention is for these things to consistently represent an LTE condition at all times - that is, the ni are a function of the spec, which is what you get after calling the solve_gfe method. However, it is dangerously easy and possibly even obvious to change any of these attributes manually to any value you like at any time. As soon as this happens the object is no longer in the "correct state" (namely at LTE), and it would be easy for the user to forget that they have to call solve_gfe afterward to return it to such.
The property calculation methods depend (in most cases) only on the current object state. They will return a value regardless of whether the object is at LTE or not. This may be desirable behaviour in that experienced users might want to simply calculate properties based on plasma compositions they obtain elsewhere, but for the most part users will be expecting these methods to return LTE values for the object they have created. If they tinker with any of the spec or state attributes (see point 1) this would not be the case.
In one particular case, namely calculate_heat_capacity, the method by its nature has to force a recalculation of the LTE composition by calling solve_gfe internally. This breaks the paradigm of "general property calculations based on arbitrary state and spec" which is used for the other property calculation functions (see point 2). As it currently stands, it also leaves the Mixture object in a non-LTE configuration upon exit which is problematic.

There are some possible resolutions, but none of them are a silver bullet.

An obvious solution would seem to be to force all property calculation methods to run solve_gfe whenever they are called, regardless of whether it's actually required for their particular calculation. This would guarantee that the answer returned would always represent LTE for the current spec and state, but it has two problems: (a) it reduces functionality (methods will only calculate LTE property values, not general ones), and (b) it can be computationally inefficient - typical use of the module has users calculating many properties at once for a particular Mixture, and simple re-use of the ni between those calculations rather than re-running the free energy minimiser each time would be much faster in such cases.
We could do solution 1 above, but implement a flag or check which indicates whether the object is currently in an LTE state or not, and only redo the solve_gfe calculation if it isn't. This would need to be handled either by building getter and setter methods for every attribute in Mixture, or by performing some low-overhead calculation that indicates whether an object is currently at LTE or not. Problems: the former is hard to implement tidily when attributes are other objects like arrays and lists, and the latter is mathematically non-trivial and subject to arbitrary numerical limits (how close to LTE is "close enough" etc) which could cause unpredictable behaviour.
We could remove the ni attribute from the class completely, and just have it be consumed and returned as a function argument by the class methods instead. This would retain the broad functionality of the property calculation methods, and remove the temptation to put the object "out of state" by writing values to the ni attribute. Problems: (a) the need to supply ni as an argument for every method doesn't really solve the problem since users can still put garbage into it, especially if it is not completely clear to them that it is supposed to be an LTE composition obtained from solve_gfe, (b) it makes the function calls cluttered since an additional parameter variable must be created and passed to them, (c) it could result in overly verbose code that looks like mymixture.calculate_property(mymixture.solve_gfe()), (d) calculate_heat_capacity's call template would look different to all the others.
We could reduce Mixture to a simple dataclass or even drop it completely and use a standard Python data structure (list or dict), and move all methods out to module level. If ni is retained as a member of Mixture we'd need a secondary consistency-enforcing solution as per 1 or 2 above, and if not it would suffer from some of the same problems as 3 with the added ugliness of having to pass a Mixture object argument into the property function calls, eg: minplascalc.calculate_property(mymixture, minplascalc.solve_gfe(mymixture)).
Keep ni in the class, but allow it to be overridden with defaults on the method arguments. The default behaviour could be to use the current values stored in the ni attribute, while allowing experienced users to provide different values explicitly. This would remove any need to touch the ni class attribute directly. Problems: doesn't really address the "out of state" problem as users could still change ni intentionally or accidentally, and method function descriptions will start being confusing to users ("Why is it asking for a list of ni values when there is already one stored in the object? Why does the ni argument default to None?" etc).
Do any of the above, plus drop the calculate_heat_capacity method entirely; it's very much the odd one out here and the source of much of the confusion. This would mean that users would be left to do their own Cp calculations by numerical derivative of the enthalpy, but that's (perhaps?) a minor inconvenience in return for having consistent calling conventions for all property methods in the module. This feels cleaner.

In the spirit of KISS, I'm leaning toward a combination of solutions 4 and 6 above. A "mixture" is really nothing more than a list of Species objects, and things that act on a mixture should perhaps be top-level functions that return results rather than modifying attributes of a Mixture object in-place in potentially inconsistent ways.

I'm getting decision paralysis trying to figure out the best approach up front. For now I may pop out a couple of branches from issue27 to try different approaches and see how they feel in practice.

quinnreynolds commented 4 years ago

An implementation of solution 1 above is on branch issue27_test1 (8606feb7daeb4bc7dbc6235aea52c7912bc48a3c). It's actually fairly clean in terms of the code and user interface, but of course it can chew up resources unnecessarily in certain cases as indicated.

quinnreynolds commented 4 years ago

An implementation of solution 4 above is on branch issue27_test2 (973455857a7f18bda6756cd59061287ca75c4e24). It's much less usable and more error-prone than I thought it would be; turns out there are a number of internal variables used in Mixture that the various property functions depend on, which makes it hard to create anything like a consistent calling convention for all of them.

I thought this would be the tidiest solution but the worked example has proved me wrong. Branch issue27_test1 looks more promising so will continue development there for now.

quinnreynolds commented 4 years ago

It turns out there is a robust and (relatively) low-overhead way of testing if a Mixture is currently in a converged LTE state, by doing a single iteration of the minimiser solver and checking the relative tolerance. This makes solution 2 above feasible.

Implemented in eeda3de7686d154c4b0fe58826aa14a71b90ebb7.

alchemyst commented 4 years ago

Is there a reason why we're not just marking the mixture as unconverged with a flag, then setting that flag when the user changes properties?

alchemyst commented 4 years ago

Wait, I see that is option 2. The main issue is therefore the mutability of the properties? Perhaps we can fix that by using immutable versions of those properties?

alchemyst commented 4 years ago

There's another option - perhaps the things that affect the optimality could be bundled into another object like "State" whose job it is to maintain the thermodynamic state of mixtures.

quinnreynolds commented 4 years ago

Pride cometh before a fall as always; I've since found an edge case (one of the notebooks with more complex mixtures) which fails my current convergence-check test. So back to the drawing board on that one.

quinnreynolds commented 4 years ago

Thanks @alchemyst - I agree that some sort of flagging when the user changes object attributes would be much more desirable, however, the problem is that some attributes (Mixture.species and Mixture.x0) are themselves objects; my Python-fu isn't strong enough to figure out how to make those change a state flag when things inside them are changed.

Example 1. This I know how to handle with property getters and setters to change a flag: mymixture.x0 = my_array

Example 2. This I have no idea how to flag as a change: mymixture.x0[i] = my_value_for_i

quinnreynolds commented 4 years ago

@alchemyst - again my fu fails me. How does one make an attribute immutable (especially if that attribute is itself an instance of another object like a list or array, whose elements could be accessed directly by a suitably persistent user)?

I'm also not sure this completely solves the problem. You do actually want the user to be able to change attributes (T, P, x0 particularly) in the course of their investigations. We could get around that by making the attributes completely immutable except for getter and setter methods (which can then flag changes away from LTE etc), but I'm not sure how Pythonic that would be.

alchemyst commented 4 years ago

The scalar attributes are easy to handle, since floats and ints and so on are immutable in Python. The problem as you correctly mention is if the attributes are mutable. The solution is to use immutable versions of those things. So instead of a list use a tuple, or if the attribute is an array set array.flags.writeable = False.

If a user then wants to experiment with different versions they can do something like

m = Mixture(...)
m.x0 = [0.1, 0.9]  # this converts to a tuple or non-writable array and sets the dirty flag
m.x0[0] = 0.2  # exception - immutable object
m.x0 = [0.2, 0.8] # this is OK
# more changes
m.method_which_requires_recalc() # causes exception _or_ transparently recalcs

alchemyst commented 4 years ago

If it's important to you to allow users to change for instance x0 in place, we could create an object which notifies its parent of calls to __setitem__, but that seems a bit of a hack.

alchemyst commented 4 years ago

We could also stop write access to all of these attributes completely and just have a change_state() method which handles all state changes and has default None arguments for all the ones you don't want to change.

quinnreynolds commented 4 years ago

Thank you Carl; this is why you are the Chief Python Officer ;). Your first option above is pretty much exactly what I was after, I'll try an implementation first thing next week. Tuples are probably the easiest way, will start with that approach.

Of the four attributes which define what values will be obtained from the LTE composition and physical property calculations - species list, x0 list, T, and P - the temptation/need for users to change things inside a particular object instance is probably in the following order: T & P > x0 > species. In particular:

It is reasonable to assume that it is obvious to the target user that a mixture would be defined in the main part by the plasma species present; a different list of species logically implies that it is a different mixture. This implies that one would normally create a new mixture object rather than change the list of species in an existing one, except out of sheer bloodymindedness. Some mild restrictions on doing that sort of thing by accident will probably be enough.
It is uncommon (but definitely not impossible) that the user might want to change the "base spec" for the plasma mixture defined by the x0 values. Allowing them to specify a completely new array/list for x0 while restricting their access to the elements per your solution above, and handling LTE state with a flag variable, should be more than adequate to prevent misuse.
It is going to be very common for the user to want to change P and (especially) T for a mixture. As scalars they can however be handled easily and transparently, and combined with flag setting as you indicate.

quinnreynolds commented 4 years ago

Implemented in 4c505dabb36d9bb7892f152a1daaa1a0acc12663. Nice and tidy, raises exceptions in all the typical misuse cases.

A dogged user is still able to mess things up by changing attributes inside the species objects themselves (eg mymixture.species[i].ionisationenergy = some_value) or assigning double-underscored variables directly by their mangled names, but these would be pretty malignant events and are highly unlikely to happen by accident.

alchemyst commented 4 years ago

Beautiful. Now we just need some tests which ensure these properties stay immutable (so check that attempts to change them raise errors) and we're good to go!

quinnreynolds commented 4 years ago

Draft changes complete and pulled to branch cleanup. See e3ecdf1bc45a789fcbffdb9a6fd8525519b3317c.

quinnreynolds / minplascalc

Refactoring of Mixture class #27