Specify Datastructures instead of Methods

s9105947 commented 2 years ago

The README of PICMI states that "The goal of the standard is to propose a set (or dictionary) of names and definitions [...]". However, the documentation at no point specifies sets or dictionaries, it always specifies __init__() methods. Which variables are set by __init__() is entirely unspecified (and has to be looked up in the code).

For the most cases this does not matter, e.g. the attribute name of a Species is later accessible as species.name. For other cases this is not as clear: What properties does Simulation.add_species() modify? In which way? I am vary of just reading the code, b/c there are no guarantees made by PICMI that this code will be stable.

In addition to that the behavior of the __init__() method is not clear: Most only copy their parameters to the attributes, but some perform more operations, e.g.:

grid = picmi.Cartesian3DGrid(number_of_cells=[192, 2048, 12],
                             lower_bound=[0, 0, 0],
                             upper_bound=[4e-5, 4e-5, 2e-6],
                             lower_boundary_conditions=3 * ["open"],
                             upper_boundary_conditions=3 * ["open"])
# now grid.nx == 192
grid.nx = 1237
# How many cells does the x axis now have?

Or put another way: PICMI currently specifies a user interfaces, but the object that is passed to the implementing PIC simulation is unspecified.

There are a couple of approaches to this, I want to propose these steps:

Drop all methods from PICMI
Add specifications for attributes of classes
Auto-generate "typical" methods (__init__(), __eq__(), __hash__(), ...)
Add methods for convenience, but they only modify the previously defined attributes, e.g. add_species()
optional: Provide (protected) helper functions for implementations, e.g. derive mass&charge from particle type, replace None by default values where applicable etc.

I have attached the species definition as an example how I would imagine the specification to look like & and the test case that sketches its behavior.

This example uses a typing system. For that please refer to #58 . Semantic checks are entirely omitted. Please see #59 for that.

Implementation:

```python import typing # Idea: use decorator here class Species: """ describes a particle species For PICMI, a "particle species" is a set of particles, initialized from a common parameter set. Fundamental properties of the particle species are defined directly in a species object; more complex properties are delegated to other objects (which are stored as attributes). """ name: str = None """ name of this species Must be unique in the simulation. Can be used in the output to reference this specific species. """ particle_type: typing.Optional[str] = None """ Particle Name A string describing a single particle type as specified by the openPMD Species Type Extension. An implementation may derive additional properties from this attribute. If the particle_type is given, charge and mass may be omitted. Examples: He, electron """ mass: typing.Optional[float] = None """ mass of a single physical particle in kg MUST be None if particle_type is given. """ charge: typing.Optional[float] = None """ charge of a single physical particle in C MUST be None if particle_type is given. """ charge_state: typing.Optional[int] = None """ charge state of atoms The number of missing electrons of an (otherwise) neutral atom. MUST be None for particles that are not atoms. """ # Note: use forward type declaration (string) here initial_distribution: "Distribution" = None """ distribution at t=0 Describes the density in space of this species. To not initialize any particles of this species explicitly provide density which is globally zero. """ density_scale: float = 1.0 """ factor to be applied to initial_distribution Before applying the density according to initial_distribution, it is multiplied with the density_scale. This is intended to enable re-use a distribution objects. MUST be a float >0 """ particle_shape: typing.Optional[str] = None """ shape of the particles of this species MUST be one of "NGP", "linear", "quadratic", "cubic" Can be left at None, in which case the particle shape of the owning simulation object will be used. """ ```

Test case:

```python from .species import Species import unittest from .distribution import UniformDistribution import picmi import typing class TestSpecies(unittest.TestCase): def setUp(self): print("This test case is not exhaustive!!") self.uniform_density = UniformDistribution(density=1) def test_constructor_basic(self): """constructor sets attributes""" species = Species(name="myname", particle_type="electron", initial_distribution=self.uniform_density) self.assertEqual("myname", species.name) self.assertEqual("electron", species.particle_type) self.assertEqual(self.uniform_density, species.initial_distribution) def test_defaults(self): """attributes have defaults""" # note that this is not valid from a semantic point of view, and # automated checks of the constructor could make this test fail empty_implicit = Species() empty_explicit = Species(name=None, particle_type=None, mass=None, charge=None, charge_state=None, initial_distribution=None, density_scale=1.0, particle_shape=None) self.assertEqual(empty_explicit, empty_implicit) def test_constructor_codespecific(self): """code-specific params are okay, everything else is rejected""" picmi.codename = "dummypic" custom_allowed = Species(dummypic_favorite_icecream="Vanilla", dummypiclist=["my", "list", 1]) self.assertEqual("Vanilla", custom_allowed.dummypic_favorite_icecream) self.assertEqual(["my", "list", 1], custom_allowed.dummypiclist) with self.assertRaisesRegex(NameError, ".*irregular_name.*"): # note: detected even if value is None Species(irregular_name=None) def test_typechecking(self): """auto typeconversion/rejection""" # (1) types are automagically converted if possible species = Species(name=1) species.charge_state = "17" self.assertEqual(17, species.charge_state) self.assertEqual("1", species.name) # (2) unconvertible types are rejected with self.assertRaises(TypeError): # note this could *in theory* be converted, but is rejected b/c 0.3 # would be lost Species(charge_state=1.3) with self.assertRaises(TypeError): species.particle_shape = ["NGP"] def test_strict_attribute_checking(self): """unknown attributes are rejected""" # maybe this comment has to be replaced by some annotation class DummyPICSpecies(Species): dummypic_var: typing.Any picmi.codename = "dummypic" species = DummyPICSpecies() # known names ok: species.dummypic_var = 123 species.name = "abc" # unkown names not allowed: with self.assertRaisesRegex(NameError, ".*dummypic_anythingelse.*"): species.dummypic_anythingelse = 123 with self.assertRaisesRegex(NameError, ".*unkown_var.*"): species.unkown_var = None def test_hash_eq_generated(self): """methods __eq__() and __hash__() are auto-generated""" # non-exhaustive!! s1 = Species(name="abc") s2 = Species(name="abc") self.assertTrue(s1 is not s2) self.assertEqual(s1, s2) self.assertEqual(hash(s1), hash(s2)) s1.name = "def" self.assertNotEqual(s1, s2) self.assertNotEqual(hash(s1), hash(s2)) s2.name = "def" self.assertEqual(s1, s2) self.assertEqual(hash(s1), hash(s2)) # automated inheritance is prohibited (b/c children could add unkown # attributes) -> children should use a decorator (or sth similar) to # generate their own hash/eq methods class DummyPICSpecies(Species): pass d1 = DummyPICSpecies() d2 = DummyPICSpecies() with self.assertRaises(NotImplementedError): hash(d1) with self.assertRaises(NotImplementedError): d1 == d2 ```

Notably, this would be very close to how openPMD is specified, as purely a list of attributes. (Which would then allow separating the reference implementation and the specification more clearly, as mentioned in #3)

dpgrote commented 2 years ago

From the description, it sounds like most of the issue is not the PICMI code itself, but uncertainty about what data is available in the class instances and how to access it. Instead of rewriting the PICMI classes, an alternate solution is to provide documentation for the implementers describing what attributes are available and what they are (including type information). This would be a set of guidelines to follow. For instance, regarding the add_species method, the guidelines would state the the list of species set up by the user can be found in the Simulation class in the attribute species, which is a Python list. It sounds like documentation like this would go a long way toward solving the issues you bring up.

I agree that there can be an issue with have some data stored in two different ways, e.g. the number of grid cells. I have long learned that there is only so much that can be done to limit the damage when users abuse code since it is impossible to predict what users will do. In this case though, perhaps a small clean up can be done so that only one of the two alternatives is saved internally, for example only storing number_of_cells but not the individual nx etc. The interface was written allowing both options to provide flexibility for users.

s9105947 commented 2 years ago

It sounds like documentation like this would go a long way toward solving the issues you bring up.

Yes. Currently there are two interface to PICMI:

"PICMI-in": the stuff that the user writes when using PICMI ("input parameters")
"PICMI-out": the data structure created from that consumed by an implementation

Both are close, but subtly different. For implementations this would acceptable, but:

One of the high-regarded workflows for Smilei is referencing the input parameters when processing the output. In this case the user will use the same data structure as the implementing simulation, i.e. PICMI-out. Now the user writes PICMI-in, works with PICMI-out, and maybe doesn't even notice the difference until one of the subtle differences bites their back. Reducing PICMI to a set of arguments + auxiliary, convenient functions to modify these ensures that such differences are impossible to occur.

Note: This would also make serialization very simple, which would enable the process of reading PICMI during processing of output also when not working from a single python context.

Another short argument: The constructors are currently a lot of boilerplate code. (Admittedly, currently at an somewhat acceptable level.)

I believe that such a restructuring could be fully compatible to the current usage. If you're not opposed to the general idea I can hack together a fork for demonstration during the next week.

I have long learned that there is only so much that can be done to limit the damage when users abuse code since it is impossible to predict what users will do.

I'm young, naive, and feel like that we can push this "abusive zone" pretty far out of the way. (Ask me again in 10 years if I still think that.) Though I trust your experience here :)
For this specific case I'm still trying to come up with a clean solution, which does not require boilerplating properties. If I do, I'll send a PR.

picmi-standard / picmi

Specify Datastructures instead of Methods #60