pyiron / uniton

BSD 3-Clause "New" or "Revised" License

How to include ontology #6

Open samwaseda opened 1 day ago

samwaseda commented 1 day ago

I had a long discussion with @JNmpi this week, and we still haven't converged on this topic, but I think it's good to write it down to present the progress so far and also for me to organize my thoughts.

@JNmpi's idea is to do it via separate files, e.g. a file at onto/types/atomistic/structure.py that defines:

import numpy as np
from uniton.typing import u

def positions(var_type=np.ndarray[float], unit="angstrom", onto_id=None, shape=None):
    return u(var_type, unit, onto_id, shape)

Potentially this could also be a class. Anyway this can be used subsequently in functions via

from onto import types

def apply_displacement(
    positions: types.atomistic.structure.positions(unit="nanometer"),
) -> types.atomistic.structure.positions(unit="nanometer"):
    modified_positions = some_operation(positions)  # placeholder for the actual operation
    return modified_positions

This is actually pretty neat because it's super straightforward to implement [^1]. However, I have the feeling that the user should not have to write a data class and an ontological type separately. So we should be able to define the ontological type from the data classes:

from dataclasses import dataclass

import numpy as np
from uniton.typing import u

@dataclass
class Atomistic:
    @dataclass
    class Structure:
        positions: u(np.ndarray[float], "angstrom")

Now obviously in my case the problem is that Atomistic.Structure would work as a type, but Atomistic.Structure.positions would not, so I cannot write apply_displacement like above. One possibility is to hook into __annotations__ somehow, so that I could use Atomistic.Structure.__annotations__["positions"], which the unit converter from uniton would be able to understand, but I don't know how to distinguish between classes and class attributes. Besides, at this point I don't think I have a way to change the units.
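To make the `__annotations__` route concrete, here is a minimal sketch. It does not use uniton: `u_stub`, `Meta` and `with_unit` are hypothetical stand-ins that pack the metadata into `typing.Annotated`, and `list` replaces `np.ndarray[float]` only to keep the example free of dependencies. Under those assumptions, it shows that the field is not reachable as a class attribute, that its annotation is reachable through `__annotations__`, and one possible way to derive a variant with different units.

```python
from dataclasses import dataclass
from typing import Annotated, NamedTuple, Optional, get_args


class Meta(NamedTuple):
    """Hypothetical metadata container (not part of uniton)."""
    unit: Optional[str] = None
    onto_id: Optional[str] = None
    shape: Optional[tuple] = None


def u_stub(var_type, unit=None, onto_id=None, shape=None):
    """Hypothetical stand-in for uniton's `u`: wrap the type in Annotated."""
    return Annotated[var_type, Meta(unit, onto_id, shape)]


@dataclass
class Atomistic:
    @dataclass
    class Structure:
        positions: u_stub(list, "angstrom")


# A dataclass field without a default is not set as a class attribute.
try:
    Atomistic.Structure.positions
    has_attribute = True
except AttributeError:
    has_attribute = False

# The annotation itself is still reachable through __annotations__.
annotation = Atomistic.Structure.__annotations__["positions"]
var_type, meta = get_args(annotation)


def with_unit(annotation, unit):
    """Return a copy of an annotation with the unit replaced (sketch)."""
    base_type, base_meta = get_args(annotation)
    return Annotated[base_type, base_meta._replace(unit=unit)]


positions_in_nm = with_unit(annotation, "nanometer")
```

A helper like `with_unit` would keep the ontological identifier and only swap the unit, which may be one way around the limitation mentioned above.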

One possibility might be to pass it as a string, i.e.

def apply_displacement(positions: "Atomistic.Structure.positions") -> "Atomistic.Structure.positions":
    modified_positions = some_operation(positions)  # placeholder for the actual operation
    return modified_positions

The string can then be parsed by the decorator. Frankly I'm really not sure if I like this idea.
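A rough sketch of how such a decorator might parse the string, assuming nothing about uniton's actual decorator: `resolve` and `resolve_annotations` are hypothetical names, and `Annotated[list, "angstrom"]` stands in for `u(np.ndarray[float], "angstrom")`. The path is walked segment by segment; a segment that exists as an attribute (a nested class) is followed with `getattr`, and otherwise it is looked up in `__annotations__`, which is one way to tell classes from annotated fields.

```python
from typing import Annotated


class Atomistic:
    class Structure:
        # Annotated stands in for uniton's `u(...)` here (assumption).
        positions: Annotated[list, "angstrom"]


def resolve(path, namespace):
    """Resolve a dotted path, falling back to __annotations__ for fields."""
    head, *rest = path.split(".")
    obj = namespace[head]
    for name in rest:
        if hasattr(obj, name):
            # Nested class (or any real attribute).
            obj = getattr(obj, name)
        else:
            # Annotated field without a default value.
            obj = getattr(obj, "__annotations__", {})[name]
    return obj


def resolve_annotations(namespace):
    """Decorator factory replacing string annotations by the objects they name."""
    def decorator(func):
        for key, value in list(func.__annotations__.items()):
            if isinstance(value, str):
                func.__annotations__[key] = resolve(value, namespace)
        return func
    return decorator


@resolve_annotations({"Atomistic": Atomistic})
def apply_displacement(positions: "Atomistic.Structure.positions") -> "Atomistic.Structure.positions":
    return positions  # placeholder for the actual operation
```

The namespace is passed explicitly here to keep the sketch self-contained; a real decorator could presumably look the names up in the function's module instead.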

[^1]: I don't know how to define just types.atomistic.structure to use it as a type

JNmpi commented 23 hours ago

@samwaseda, just a quick clarification. In my opinion, we only need either a dataclass or an ontological type, not both for the same object/variable. As an example, consider the following code snippets, which use the above ontological definition to construct a dataclass:

@dataclass
class AtomicStructure:
    positions: types.atomistic.structure.positions(unit="nanometer")
    cell: types.atomistic.structure.cell(unit="nanometer")  # another onto type to define
    some_other_parameters: types.atomistic.structure.some_other_parameters()  # e.g. for species etc.

The idea is to follow the typical dataclass definition, which consists of variables with predefined types. Of course, such a variable can be a dataclass. To encourage reuse of the dataclasses and the ontological definitions by others, I prefer to have them in another file tree and not in the file where the function is defined.
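As an illustration of this reuse pattern, the following sketch defines two factory functions and builds a dataclass from them. Everything here is an assumption made for the example: `positions_type` and `cell_type` are hypothetical names, the `"onto:..."` strings are made-up identifiers, and `typing.Annotated` stands in for uniton's `u`. In the proposal the factories would live in a separate file tree rather than next to the dataclass.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args


# Hypothetical reusable definitions; in the proposal these would live in a
# separate file tree such as onto/types/atomistic/structure.py.
def positions_type(unit="angstrom"):
    return Annotated[list, unit, "onto:positions"]


def cell_type(unit="angstrom"):
    return Annotated[list, unit, "onto:cell"]


@dataclass
class AtomicStructure:
    # The unit is overridden per use; the ontological identifier is shared.
    positions: positions_type(unit="nanometer")
    cell: cell_type(unit="nanometer")


# Collect (unit, onto_id) for every field of the dataclass.
summary = {f.name: get_args(f.type)[1:] for f in fields(AtomicStructure)}
```

With this layout, the unit can differ between dataclasses while the identifier attached to each quantity stays the same.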

samwaseda commented 22 hours ago

> To encourage reuse of the dataclasses and the ontological definitions by others, I prefer to have them in another file tree and not in the file where the function is defined.

Not sure what you mean here, but maybe you are talking about something like this?

class DFTEnergyMinimization:
    electronic_energy_convergence: types.atomistic.energy()
    ionic_energy_convergence: types.atomistic.energy()

I have the feeling that this step is somewhat redundant, because nothing inside types.atomistic.energy() defines the scientific context except for onto_id: what it returns is merely u(float, "eV"). If types.atomistic.energy() itself should be the scientific context, then it has to be a class in order to be able to define a data type, i.e. types.atomistic.energy() has to return the class Energy, or u(Energy, "eV"). But then that's a totally different object from just an energy value. So in the end I would probably rather write:

class DFTEnergyMinimization:
    electronic_energy_convergence: u(float, "eV", onto_id)
    ionic_energy_convergence: u(float, "eV", onto_id)

In this case onto_id would be repeated, but I don't really know why it's worse than repeating types.atomistic.energy().

This being said, when we disagree it's usually because there's something you are seeing that I don't see, so I'm gonna explore your option as well.

JNmpi commented 49 minutes ago

An ontological type should be unique for a physical quantity. So, taking your example data class, my idea is to write it as follows:

  class DFTEnergyMinimization:
      electronic_energy_convergence: types.atomistic.convergence.dft.energy_error()
      ionic_energy_convergence: types.atomistic.convergence.energy_error()

We could then define another dataclass for energy minimization using non-DFT methods such as Lammps:

  class EnergyMinimization:
      ionic_energy_convergence: types.atomistic.convergence.energy_error()

Since we do not have electronic structure minimization for such methods, the corresponding quantities are omitted. In fact, with this more general formulation, we could write the corresponding DFT data class as

  class DFTEnergyMinimization(EnergyMinimization):
      electronic_energy_convergence: types.atomistic.convergence.dft.energy_error()

The important feature is that the two variables ionic_energy_convergence and electronic_energy_convergence belong to ontologically different types, i.e. trying to do something like ionic_energy_convergence = electronic_energy_convergence would give an error (or warning). In the same way that standard Python types or physical units ensure that the user does not erroneously connect inputs and outputs of inconsistent types or units, this approach extends the same protection to ontological types.
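A minimal sketch of such a check, under stated assumptions: `typing.Annotated` stands in for uniton's `u`, the `"onto:..."` identifiers are invented for the example, and `onto_id` and `check_connection` are hypothetical helpers rather than existing uniton functions. Both quantities share the Python type `float` and the unit `"eV"`, so only the ontological identifier distinguishes them.

```python
from dataclasses import dataclass, fields
from typing import Annotated, get_args


def energy_error():
    # Hypothetical identifier for the ionic convergence criterion.
    return Annotated[float, "eV", "onto:ionic_energy_error"]


def dft_energy_error():
    # Hypothetical identifier for the electronic convergence criterion.
    return Annotated[float, "eV", "onto:electronic_energy_error"]


@dataclass
class EnergyMinimization:
    ionic_energy_convergence: energy_error()


@dataclass
class DFTEnergyMinimization(EnergyMinimization):
    electronic_energy_convergence: dft_energy_error()


def onto_id(cls, name):
    """Return the ontology identifier stored on a dataclass field."""
    field = next(f for f in fields(cls) if f.name == name)
    return get_args(field.type)[2]


def check_connection(source, target):
    """Raise if two (class, field name) pairs differ in ontological type."""
    source_id, target_id = onto_id(*source), onto_id(*target)
    if source_id != target_id:
        raise TypeError(f"cannot connect {source_id} to {target_id}")


# Same ontological type (inherited field): accepted.
check_connection(
    (EnergyMinimization, "ionic_energy_convergence"),
    (DFTEnergyMinimization, "ionic_energy_convergence"),
)
```

Connecting `electronic_energy_convergence` to `ionic_energy_convergence` through `check_connection` raises a `TypeError` in this sketch, even though both are floats in eV; a warning could be issued instead if that is preferred.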