Open jngrad opened 2 months ago
Online discussion with @pm-blanco: move the unit registry creation to the class constructor, remove line 2528, and introduce units._build_cache()
in set_reduced_units()
to rebuild the cache when reduced units are redefined.
Global variables are notorious for introducing race conditions, which can cause bugs that are hard to reproduce and investigate. They also have shared ownership with multiple objects, thus client code cannot make assumptions about their state.
To illustrate pitfalls from global variables, consider the following example:
This is a simplified version of the pyMBE class that was available up to and including d10d17236217f71c2c13d8c7816e065c7d9e21e2. The class members
units
andSEED
are bound to the class, not to class instances. You can access them from outside withprint(pymbe_library.SEED)
.Here is a MWE:
Output:
The
SEED
members are different, but theunit
members are shared between the two objects (same memory address). These members differ in behavior because integers are immutable, while Pint objects are mutable (quick refresher). The rules for an object's mutability and scope can be a bit difficult to grasp, and while Python developers should know about them, we cannot expect them to know how the pyMBE class constructor uses these rules internally to manage the lifetime and ownership of its class members. I would advocate for the principle of least surprise by making these class members fully owned by the class instance, i.e. the constructed object, rather than by the class itself.This can be achieved without any API change with the following rewrite:
Output of the MWE shows that each object has its own Pint unit registry:
The same issue applies to the data frame. What happens if we create two class instances to load different parameter sets? The answer is not straightforward, because pandas sometimes returns copies, sometimes returns views, and sometimes allows data to be modified in-place.
Coming back to the unit system, we have another source of counter-intuitive behavior in
pymbe_library.set_reduced_units()
:https://github.com/pyMBE-dev/pyMBE/blob/37491e8c9625d90e4752bcc9705abcf7c3a9a990/pyMBE.py#L2511
Here is the relevant part of the method body:
https://github.com/pyMBE-dev/pyMBE/blob/37491e8c9625d90e4752bcc9705abcf7c3a9a990/pyMBE.py#L2528-L2544
We are creating a new unit registry. Which means we cannot pass a custom
temperature
as argument, because we end up multiplying two quantities whose unit belong to different registries:Output:
This exception is not necessarily raised by passing custom values to the other function arguments, but it might be raised later in the simulation script... It is simply not possible to safely pass custom values, because line 2528 guarantees that our custom values won't belong to the same registry as the one used by pyMBE.
Deleting line 2528 doesn't help, because when we call
units.define(f"reduced_length = {unit_length}")
, we end up re-defining the unit. I just found out the hard way how dangerous that is:Output:
The new
reduced_length
value is not used because the cache is out-of-date. See https://github.com/hgrecco/pint/issues/1773#issuecomment-1546765201 and the source code inhgrecco/pint@0.23:pint/facets/plain/registry.py#L517-L521
for more details. By default it should raise a warning, but in my environment (Python 3.10.12, Pint 0.23), the warnings don't get displayed in the terminal.It would probably be safer to let the user define their own Pint unit registry, and pass it as a
units
argument of the pyMBE class constructor. In this way, the correct set up of the unit registry becomes the responsibility of the user rather than ours, and we reduce the risk of making a mistake, like the one I made above when re-defining units, or the one currently in pyMBE where the temperature cannot be set, or the one that used to be in the pyMBE constructor where the energy unit was silently ignored (https://github.com/pyMBE-dev/pyMBE/pull/55#discussion_r1601951839). If the user doesn't provide aunits
, we define a default one.