pyMBE-dev / pyMBE

pyMBE provides tools to facilitate building up molecules with complex architectures in the Molecular Dynamics software ESPResSo. For an up-to-date API documention please check our website:
https://pymbe-dev.github.io/pyMBE/pyMBE.html
GNU General Public License v3.0
5 stars 7 forks source link

Remove global state #60

Open jngrad opened 2 months ago

jngrad commented 2 months ago

Global variables are notorious for introducing race conditions, which can cause bugs that are hard to reproduce and investigate. They also have shared ownership with multiple objects, thus client code cannot make assumptions about their state.

To illustrate pitfalls from global variables, consider the following example:

class pymbe_library():
    import pint
    units = pint.UnitRegistry()
    SEED = None
    def __init__(self, SEED):
        self.SEED = SEED
        unit_length = 0.355 * self.units.nm
        self.units.define(f"reduced_length = {unit_length}")

This is a simplified version of the pyMBE class that was available up to and including d10d17236217f71c2c13d8c7816e065c7d9e21e2. The class members units and SEED are bound to the class, not to class instances. You can access them from outside with print(pymbe_library.SEED).

Here is a MWE:

pmb1 = pymbe_library(SEED=42)
pmb2 = pymbe_library(SEED=43)
print(f"{pmb1.SEED=}")
print(f"{pmb2.SEED=}")
print(f"{pmb1.units=}")
print(f"{pmb2.units=}")
print(id(pmb1.units) == id(pmb2.units))

Output:

pmb1.SEED=42
pmb2.SEED=43
pmb1.units=<pint.registry.UnitRegistry object at 0x7235c663fa0>
pmb2.units=<pint.registry.UnitRegistry object at 0x7235c663fa0>
True

The SEED members are different, but the unit members are shared between the two objects (same memory address). These members differ in behavior because integers are immutable, while Pint objects are mutable (quick refresher). The rules for an object's mutability and scope can be a bit difficult to grasp, and while Python developers should know about them, we cannot expect them to know how the pyMBE class constructor uses these rules internally to manage the lifetime and ownership of its class members. I would advocate for the principle of least surprise by making these class members fully owned by the class instance, i.e. the constructed object, rather than by the class itself.

This can be achieved without any API change with the following rewrite:

import pint
class pymbe_library():
    def __init__(self, SEED):
        self.SEED = SEED
        self.units = pint.UnitRegistry()
        unit_length = 0.355 * self.units.nm
        self.units.define(f"reduced_length = {unit_length}")

Output of the MWE shows that each object has its own Pint unit registry:

pmb1.SEED=42
pmb2.SEED=43
pmb1.units=<pint.registry.UnitRegistry object at 0x106e81cd03a0>
pmb2.units=<pint.registry.UnitRegistry object at 0x106e81ca7790>
False

The same issue applies to the data frame. What happens if we create two class instances to load different parameter sets? The answer is not straightforward, because pandas sometimes returns copies, sometimes returns views, and sometimes allows data to be modified in-place.

Coming back to the unit system, we have another source of counter-intuitive behavior in pymbe_library.set_reduced_units():

https://github.com/pyMBE-dev/pyMBE/blob/37491e8c9625d90e4752bcc9705abcf7c3a9a990/pyMBE.py#L2511

Here is the relevant part of the method body:

https://github.com/pyMBE-dev/pyMBE/blob/37491e8c9625d90e4752bcc9705abcf7c3a9a990/pyMBE.py#L2528-L2544

We are creating a new unit registry. Which means we cannot pass a custom temperature as argument, because we end up multiplying two quantities whose unit belong to different registries:

import pint
units = pint.UnitRegistry()
Kb = 1.38064852e-23 * units.J / units.K
units = pint.UnitRegistry()
temperature = 298.15 * units.K
kT = temperature * Kb

Output:

Traceback (most recent call last):
ValueError: Cannot operate with Quantity and Quantity of different registries.

This exception is not necessarily raised by passing custom values to the other function arguments, but it might be raised later in the simulation script... It is simply not possible to safely pass custom values, because line 2528 guarantees that our custom values won't belong to the same registry as the one used by pyMBE.

Deleting line 2528 doesn't help, because when we call units.define(f"reduced_length = {unit_length}"), we end up re-defining the unit. I just found out the hard way how dangerous that is:

import pint
units = pint.UnitRegistry()

unit_length = 0.355 * units.nm
units.define(f"reduced_length = {unit_length}")
unit_length=units.Quantity(1,'reduced_length')
print(unit_length.to('nm'), "=", unit_length)

unit_length = 0.201 * units.nm
units.define(f"reduced_length = {unit_length}")
# units._build_cache() # <--- never forget this line when re-defining a custom unit!!!
unit_length=units.Quantity(1,'reduced_length')
print(unit_length.to('nm'), "=", unit_length)

Output:

0.355 nanometer = 1 reduced_length
0.355 nanometer = 1 reduced_length

The new reduced_length value is not used because the cache is out-of-date. See https://github.com/hgrecco/pint/issues/1773#issuecomment-1546765201 and the source code in hgrecco/pint@0.23:pint/facets/plain/registry.py#L517-L521 for more details. By default it should raise a warning, but in my environment (Python 3.10.12, Pint 0.23), the warnings don't get displayed in the terminal.

It would probably be safer to let the user define their own Pint unit registry, and pass it as a units argument of the pyMBE class constructor. In this way, the correct set up of the unit registry becomes the responsibility of the user rather than ours, and we reduce the risk of making a mistake, like the one I made above when re-defining units, or the one currently in pyMBE where the temperature cannot be set, or the one that used to be in the pyMBE constructor where the energy unit was silently ignored (https://github.com/pyMBE-dev/pyMBE/pull/55#discussion_r1601951839). If the user doesn't provide a units, we define a default one.

jngrad commented 2 months ago

Online discussion with @pm-blanco: move the unit registry creation to the class constructor, remove line 2528, and introduce units._build_cache() in set_reduced_units() to rebuild the cache when reduced units are redefined.