Open samwaseda opened 1 day ago
@samwaseda, just a quick clarification. In my opinion, we only need either a dataclass or an ontological type, not both for the same object/variable. As an example, consider the following code snippets, which use the above ontological definition to construct a dataclass:
@dataclass
class AtomicStructure:
    positions: types.atomistic.structure.positions(unit="nanometer")
    cell: types.atomistic.structure.cell(unit="nanometer")  # another ontological type to define
    some_other_parameters: types.atomistic.structure.some_other_parameters()  # e.g. for species etc.
The idea is to follow the typical dataclass definition, which consists of variables with predefined types. Of course, such a variable can be a dataclass. To encourage reuse of the dataclasses and the ontological definitions by others, I prefer to have them in another file tree and not in the file where the function is defined.
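As a rough sketch of this layout, the snippet below emulates a separate tree of ontological type factories and reuses them in a dataclass. Everything here is an assumption for illustration: the `onto_types` namespace, the `_quantity` helper, the `example:` identifiers, and the use of `typing.Annotated` to carry the metadata are hypothetical stand-ins, not an existing API.

```python
from dataclasses import dataclass, fields
from types import SimpleNamespace
from typing import Annotated


def _quantity(onto_id, default_unit):
    """Hypothetical helper: build a factory that returns an annotated type."""
    def factory(unit=default_unit):
        return Annotated[list, ("units", unit), ("onto_id", onto_id)]
    return factory


# Stand-in for a separate module tree holding the ontological definitions
onto_types = SimpleNamespace(
    atomistic=SimpleNamespace(
        structure=SimpleNamespace(
            positions=_quantity("example:Positions", "angstrom"),
            cell=_quantity("example:Cell", "angstrom"),
        )
    )
)


@dataclass
class AtomicStructure:
    positions: onto_types.atomistic.structure.positions(unit="nanometer")
    cell: onto_types.atomistic.structure.cell(unit="nanometer")


# The metadata stays attached to each field and can be read back
positions_meta = dict(fields(AtomicStructure)[0].type.__metadata__)
```

In a real package the `SimpleNamespace` tree would be replaced by actual modules, so that other users can import the same definitions.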
> To encourage reuse of the dataclasses and the ontological definitions by others, I prefer to have them in another file tree and not in the file where the function is defined.
Not sure what you mean here, but maybe you are talking about something like this?
class DFTEnergyMinimization:
    electronic_energy_convergence: types.atomistic.energy()
    ionic_energy_convergence: types.atomistic.energy()
I have the feeling that this step is somewhat redundant, because there is nothing that defines the scientific context inside `types.atomistic.energy()` except for `onto_id`, because what it returns is merely `u(float, "eV")`. If `types.atomistic.energy()` itself should be the scientific context, then it has to be a class in order to be able to define a data type, i.e. `types.atomistic.energy()` has to return the class `Energy`, or `u(Energy, "eV")`. But then that's a totally different instance than just an energy value. So in the end I would probably rather write:
class DFTEnergyMinimization:
    electronic_energy_convergence: u(float, "eV", onto_id)
    ionic_energy_convergence: u(float, "eV", onto_id)
In this case `onto_id` would be repeated, but I don't really know why it's worse than repeating `types.atomistic.energy()`.
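To make this variant concrete, here is a minimal sketch of how `u` could attach a unit and an ontology identifier to a plain type via `typing.Annotated`. This is an assumption for illustration, not the actual implementation: the `Meta` container, the signature of `u`, and the `ENERGY_ID` value are hypothetical.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Meta:
    """Hypothetical metadata container: unit plus ontology identifier."""
    units: str
    onto_id: str


def u(type_, units, onto_id):
    """Hypothetical helper: wrap a type with unit and ontology metadata."""
    return Annotated[type_, Meta(units, onto_id)]


# Placeholder identifier, not a real ontology term
ENERGY_ID = "example:Energy"


class DFTEnergyMinimization:
    electronic_energy_convergence: u(float, "eV", ENERGY_ID)
    ionic_energy_convergence: u(float, "eV", ENERGY_ID)


def metadata(cls, name):
    """Return the Meta object attached to the annotation of `name`."""
    hint = get_type_hints(cls, include_extras=True)[name]
    return hint.__metadata__[0]
```

With this layout both fields carry exactly the same metadata, so nothing in the annotation itself distinguishes the electronic from the ionic criterion.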
This being said, when we disagree it's usually because there's something you are seeing that I don't see, so I'm gonna explore your option as well.
An ontological type should be unique for a physical quantity. So, taking your example data class, my idea is to write it as follows:
class DFTEnergyMinimization:
    electronic_energy_convergence: types.atomistic.convergence.dft.energy_error()
    ionic_energy_convergence: types.atomistic.convergence.energy_error()
We could then define another dataclass for energy minimization using non-DFT methods such as LAMMPS:
class EnergyMinimization:
    ionic_energy_convergence: types.atomistic.convergence.energy_error()
Since we do not have electronic structure minimization for such methods, the corresponding quantities are omitted. In fact, with this more general formulation, we could write the corresponding DFT data class as
class DFTEnergyMinimization(EnergyMinimization):
    electronic_energy_convergence: types.atomistic.convergence.dft.energy_error()
The important feature is that the two variables `ionic_energy_convergence` and `electronic_energy_convergence` belong to ontologically different types, i.e. trying to do something like `ionic_energy_convergence = electronic_energy_convergence` would give an error (or warning). In the same way that standard Python types or physical units ensure that the user does not erroneously connect inputs and outputs of inconsistent types or units, this approach extends the same safeguard to ontological types.
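A minimal sketch of such a check is given below, assuming the ontological information is stored as `typing.Annotated` metadata. The `Onto` container, the `example:` identifiers, and the `check_connection` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Onto:
    """Hypothetical metadata: a unit plus an ontology identifier."""
    units: str
    onto_id: str


# Placeholder identifiers, not real ontology terms
IonicEnergyError = Annotated[float, Onto("eV", "example:IonicEnergyError")]
ElectronicEnergyError = Annotated[float, Onto("eV", "example:ElectronicEnergyError")]


@dataclass
class EnergyMinimization:
    ionic_energy_convergence: IonicEnergyError


@dataclass
class DFTEnergyMinimization(EnergyMinimization):
    electronic_energy_convergence: ElectronicEnergyError


def onto_of(cls, name):
    """Return the Onto metadata of field `name`, including inherited fields."""
    hint = get_type_hints(cls, include_extras=True)[name]
    return hint.__metadata__[0]


def check_connection(src_cls, src_name, dst_cls, dst_name):
    """Raise TypeError if the two fields have different ontological types."""
    src = onto_of(src_cls, src_name)
    dst = onto_of(dst_cls, dst_name)
    if src.onto_id != dst.onto_id:
        raise TypeError(f"ontological mismatch: {src.onto_id} -> {dst.onto_id}")
```

Here both quantities are floats in eV, so neither the Python type nor the unit would catch the mix-up; only the ontology identifier differs.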
I had a long discussion with @JNmpi this week, and we still haven't converged on this topic, but I think it's good to write it down to present the progress so far and also for me to organize my thoughts.
@JNmpi's idea is to do it via different files, like with `file_path = onto/types/atomistic/structure.py` to define:
Potentially this could also be a class. Anyway, this can be used subsequently in functions via
This is actually pretty neat because it's super straightforward to implement [^1]. However, I have the feeling that the user should not have to write a data class and an ontological type separately. So we should be able to define the ontological type from the data classes:
Now obviously in my case the problem is that `Atomistic.Structure` would work as a type, but `Atomistic.Structure.positions` would not, so I cannot write `apply_displacement` like above. One possibility is to somehow connect it to `__annotations__`, so that I could use `Atomistic.Structure.__annotations__["positions"]`, which the unit converter from `uniton` would be able to understand, but I don't know how to distinguish between classes and class attributes. Besides, at this point I don't think I have a way to change the units.
One possibility might be to pass it as a string, i.e.
The string can then be parsed by the decorator. Frankly I'm really not sure if I like this idea.
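A minimal sketch of the string-based variant, assuming the decorator resolves a dotted path through `__annotations__`, could look as follows. The `Units` container, the `resolve` and `ontological` names, and the body of `apply_displacement` are hypothetical; changing the unit per function is not addressed here.

```python
from dataclasses import dataclass
from typing import Annotated


@dataclass(frozen=True)
class Units:
    """Hypothetical metadata container holding only a unit."""
    name: str


class Atomistic:
    class Structure:
        positions: Annotated[list, Units("nanometer")]
        cell: Annotated[list, Units("nanometer")]


def resolve(path, namespace):
    """Resolve 'Atomistic.Structure.positions' to the annotation of `positions`."""
    *owners, attr = path.split(".")
    obj = namespace[owners[0]]
    for name in owners[1:]:
        obj = getattr(obj, name)
    return obj.__annotations__[attr]


def ontological(func):
    """Replace string annotations by the types they point to."""
    func.__annotations__ = {
        name: resolve(hint, func.__globals__) if isinstance(hint, str) else hint
        for name, hint in func.__annotations__.items()
    }
    return func


@ontological
def apply_displacement(
    positions: "Atomistic.Structure.positions",
    displacement: float,
) -> "Atomistic.Structure.positions":
    return [x + displacement for x in positions]
```

The last path component is treated as the attribute and everything before it as the owning class, which sidesteps the class-versus-attribute distinction at the cost of losing static checking of the string.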
[^1]: I don't know how to define just `types.atomistic.structure` to use it as a type