hgrecco / pint

Operate and manipulate physical quantities in Python
http://pint.readthedocs.org/

Better annotations support #1166

Open hgrecco opened 4 years ago

hgrecco commented 4 years ago

With PEP 560 we could now try to provide a better annotations experience for Pint. Briefly, my proposal would be to do something like this:

class Model:

    value: Quantity['m/s']

or

class Model:

    value: Quantity['[length]/[time]']

and then provide a nice API to check for this.

What do you think?

hgrecco commented 4 years ago

and these examples would then become:

>>> @ureg.awrap
... def mypp(length: Quantity['meter']) -> Quantity['second']:
...     return pendulum_period(length)

and

>>> @ureg.acheck
... def pendulum_period(length: Quantity['[length]']):
...     return 2*math.pi*math.sqrt(length/G)

where awrap and acheck are the annotated equivalents of wrap and check.

jmuhlich commented 4 years ago

It would also be fantastic to have Mypy support for checking these annotations statically! Happy to contribute where I can.

dopplershift commented 4 years ago

I haven't started using annotations in MetPy (yet), so I don't have any practical experience to rely on to see any obvious gotchas. In general, though, those look reasonable.

hgrecco commented 3 years ago

I was playing with this concept. Some things to discuss:

  1. Can annotations be done with units (e.g. m/s) and dimensions (e.g. [length]/[time])? Yes, as there are valid use cases for both (e.g. wrapping vs checking).
  2. What is the output type of Quantity['m/s']?
    • A str? No.
    • A Quantity?
    • A new class (e.g. TypedQuantity)?
    • A UnitContainer or similar?
  3. Can we annotate with ureg.meter?
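For question 2, a toy sketch of the PEP 560 hook with an option-1b-style answer (the classes here are invented stand-ins, not pint's real ones): Something[args] can return a dedicated annotation object that stores the constraint and can later grow comparison methods.

```python
class Quantity:
    # Toy stand-in, not pint's Quantity: PEP 560 lets Quantity['m/s']
    # return any object we like via __class_getitem__.
    def __class_getitem__(cls, item):
        return QuantityAnnotation(cls, item)

class QuantityAnnotation:
    # Option 1b: a dedicated class storing the constraint, where methods
    # like matches() or is_compatible_with() could later be added.
    def __init__(self, origin, constraint):
        self.origin = origin
        self.constraint = constraint

    def __repr__(self):
        return f"{self.origin.__name__}[{self.constraint!r}]"

ann = Quantity['m/s']
```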
jules-ch commented 3 years ago

It would be nice to be able to specify the expected type that magnitude should return: np.ndarray, float, or any other supported type. It is sometimes confusing for the user to guess it when a value is simply typed as Quantity.

Just as with collection types like List[float] or Tuple[str, int], you would then know what's inside.
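A minimal sketch of that idea with a hypothetical SimpleQuantity (not pint's class): making the container generic exposes the magnitude type to a type checker, just like List[float].

```python
from typing import Generic, TypeVar

T = TypeVar("T")

class SimpleQuantity(Generic[T]):
    """Toy generic quantity: the type parameter is the magnitude type."""

    def __init__(self, magnitude: T, units: str) -> None:
        self._magnitude = magnitude
        self.units = units

    @property
    def magnitude(self) -> T:
        # a type checker infers float for SimpleQuantity[float].magnitude
        return self._magnitude

q: SimpleQuantity[float] = SimpleQuantity(3.0, "m/s")
```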

claudiofinizio commented 3 years ago

Hello, I am writing a webapp for designing rural water supplies and I make extensive use of both pint and mypy. I would therefore be glad to contribute to exposing Quantity to mypy.

My objective is to write code like follows:

class WaterPipeline:
    @property
    def get_pipeline_pathlength(self) -> Quantity['length']:
        """"""

In my case annotations should be done using dimensions: in the example above it is important to check that a Quantity['length'] is returned, but that length may be expressed in meters or kilometers.

I also agree with @jules-ch comment above about the expected type of magnitude.

dopplershift commented 3 years ago

@hgrecco NumPy has also been adding annotation support for ndarray inputs, so IMO it would be important to make sure whatever is done here is compatible/sensible with that.

hgrecco commented 3 years ago

How about something like

But I would be more worried about how to handle this.

jules-ch commented 3 years ago

Type annotation of the magnitude should be the first thing we target, since the Quantity type is a container just like List or Tuple. Second should be unit or dimension. Just like you said @hgrecco, so something like

Quantity[type, unit]
Quantity[type, dimension]

The type annotation is for mypy usage, and the unit or dimension tells the user which unit or dimension to expect at first. We can then go further by checking the unit or dimension at runtime.

hgrecco commented 3 years ago

For some internal projects, I have tried three different approaches to the output of Something[args] (where Something is a class):

  1. an instance of another class. This is what Python 3.9 does for containers, e.g. list[str] returns GenericAlias(list, str). Two options branch here: (1a) use GenericAlias or (1b) create a new class with extra methods.
  2. an instance of TypedSomething, which is a subclass of Something, with args stored as instance variables.
  3. a new class (a different one for every arg)

I would discourage (3) in pint, but I am not so sure about the other two. Option 1a is the simplest way to go but not so ergonomic. Option 1b is better, because new methods could be added to test for equivalence between annotations or whether a given quantity satisfies an annotation.

Option 2 would allow for things like the following:

ScalarVelocityQ = Quantity[float, '[speed]']
q1 = ScalarVelocityQ(3, 'm/s')
q2 = ScalarVelocityQ(3, 's') # Exception is raised

In any case, I think we need to add good annotation introspection capability because we want to be able to evolve this without breaking everything. We need to avoid having to provide something like this https://stackoverflow.com/a/52664522/482819

jules-ch commented 3 years ago

We could take a look at https://docs.python.org/3/library/typing.html#typing.Annotated, which describes what we want to achieve, I think.

claudiofinizio commented 3 years ago

Type annotation of the magnitude should be the first thing we target, since the Quantity type is a container just like List or Tuple. Second should be unit or dimension. Just like you said @hgrecco, so something like

Quantity[type, unit]
Quantity[type, dimension]

The type annotation is for mypy usage, and the unit or dimension tells the user which unit or dimension to expect at first. We can then go further by checking the unit or dimension at runtime.

Referring to @jules-ch's comment: in my opinion Quantity is not just a container. My perception: if I read somebody's code, I would first like to see whether the return value of a function represents, say, a length, an energy, a pressure, or whatever. Only afterwards would I be interested in whether that energy is, say, an integer, a float, or some numpy type. At least, this is what you look for when you first glance at somebody's code.

In short, I think Quantity[dimension] should be the first piece of information somebody looks for. Accordingly, "option 2" proposed by @hgrecco, ScalarVelocityQ = Quantity[float, '[speed]'], seems to me the best approach.

tgpfeiffer commented 3 years ago

Just as a note, not sure how relevant it is to this issue: I tried to add type annotations to the python-measurement library a while ago, hoping that I could write something like l: Length = Length(2, "m") / 5 or v: Speed = Length(2, "m") / Time(1.5, "s") if there is an appropriate @overload annotation for Length.__div__. However, as I briefly summarized in https://github.com/coddingtonbear/python-measurement/issues/43#issuecomment-619821850 (enum item (3)) and also discussed in https://github.com/python/mypy/issues/4985#issuecomment-616979469, annotations for operators like __mul__ and __div__ are a bit trickier than for ordinary methods, because the resulting type of a * b is not only determined by the left operand's __mul__ method, but could also come from the right operand's __rmul__ method. As I wrote above, I'm not sure how relevant this is for annotating the Pint module, but you may hit this at some point, so I just wanted to leave a note here.
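A toy illustration of that dispatch rule (the classes here are invented and have nothing to do with pint): Python tries the left operand's __mul__ first and falls back to the right operand's __rmul__, so a static checker must account for both when typing a * b.

```python
class Meters:
    """Toy unit-carrying wrapper to show binary operator dispatch."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __mul__(self, other):
        # handles Meters * number
        if isinstance(other, (int, float)):
            return Meters(self.value * other)
        return NotImplemented

    def __rmul__(self, other):
        # handles number * Meters, reached after int.__mul__
        # returns NotImplemented for the unknown right operand
        if isinstance(other, (int, float)):
            return Meters(self.value * other)
        return NotImplemented

left = Meters(3.0) * 2    # resolved by Meters.__mul__
right = 2 * Meters(3.0)   # resolved by Meters.__rmul__
```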

jules-ch commented 3 years ago

There are multiple use cases that we should address:

IMO the best option is:

Make Quantity generic & use a utility class to return Annotated types (PEP 593) with metadata that can be used for runtime checks.


T = TypeVar("T")
class Quantity(Generic[T], QuantityGeneric, PrettyIPython, SharedRegistryObject):
  ...

    @property
    def magnitude(self) -> T:
        """Quantity's magnitude. Long form for `m`"""
        return self._magnitude
  ...
    def __iter__(self) -> Iterator[T]:
  ...
    def to(self, other=None, *contexts, **ctx_kwargs) -> "Quantity[T]":

I tried something like this:


from typing import _tp_cache, _type_check
from typing import _AnnotatedAlias

class QuantityAlias(_AnnotatedAlias, _root=True):
    def __call__(self, *args, **kwargs):
        quantity = super().__call__(*args, **kwargs)

        if self.__metadata__:
            dim = quantity._REGISTRY.get_dimensionality(self.__metadata__[0])
            if not quantity.check(dim):
                raise TypeError("Dimensionality not matched")

        return quantity

class TypedQuantity:
    @_tp_cache
    def __class_getitem__(cls, params):
        from pint.quantity import Quantity
        msg = "TypedQuantity[t, ...]: t must be a type."
        origin = _type_check(Quantity[params[0]], msg)
        metadata = tuple(params[1:])
        return QuantityAlias(origin, metadata)

Here we make a simple runtime check for the dimension, just like @hgrecco's example.

So TypedQuantity[float, "[length]"] will be translated to Annotated[Quantity[float], "[length]"].

We could go further like it is done here https://docs.python.org/3/library/typing.html#typing.Annotated.

We could translate to something like Annotated[Quantity[float], DimensionCheck("length")].

Those metadata can be added to the instance if needed.

I'll try to draft a PR.

hgrecco commented 3 years ago

@jules-ch I really like your proposal. I am eager to see the draft PR. Great discussion everybody!

jamesbraza commented 3 years ago

I would like to make a plug within my company's software team to use pint for units. Having typing is a huge plus.

I see https://github.com/hgrecco/pint/pull/1259 was merged, is that the only PR needed for typing, or is there more work to be done? When do you think a release will be cut that incorporates that PR?

jules-ch commented 3 years ago

We'll make the 0.18 release soon, probably end of the month.

Pint typing support will be experimental at first; I still need to document it. I'll push for a new version of the documentation, I just haven't had the time lately.

nunupeke commented 2 years ago

Hi. I'm currently experimenting with the new typing features in v0.18 (#1259). How would I annotate functions or classes that handle float / np.ndarray equivalently to Quantity[float] / Quantity[np.ndarray]? For example, how would I annotate the following generic function correctly:

from typing import TypeVar
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

def get_index(array: A, i: int) -> ???:
    return array[i]

I am aware that the same is relatively straightforward for example for lists,

from typing import TypeVar, List

T  = TypeVar('T')

def get_index(l: List[T], i: int) -> T:
    return l[i]

but I'm having a hard time translating it to the pint.Quantity context.

tgpfeiffer commented 2 years ago

I think you'd need to use numpy.typing.NDArray[X] rather than numpy.ndarray and then you can return X, see https://stackoverflow.com/a/68817265/3663881 (although array[i] could be something else than X if array is a higher-dimensional array; I guess we need to wait for shape support in numpy.typing before you can actually write that safely).
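A runnable sketch of that NDArray-based pattern (assuming numpy.typing is available; the higher-dimensional caveat above still applies):

```python
from typing import TypeVar

import numpy as np
import numpy.typing as npt

# Bound the TypeVar to numpy scalar types so NDArray[X] is well-formed.
X = TypeVar("X", bound=np.generic)

def get_index(array: npt.NDArray[X], i: int) -> X:
    # For a 1-D array this returns a scalar of the element type;
    # indexing a higher-dimensional array would return a sub-array instead.
    return array[i]

value = get_index(np.asarray([3.0, 4.0]), 0)
```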

nunupeke commented 2 years ago

Ok, you are right. My example function is not ideal. What I was really trying to find is an annotation that says: "if you use numpy arrays here, expect scalars there" and equivalently "if you use array quantities here, expect scalar quantities there" or vice versa. Another example:

from typing import TypeVar, Generic
import numpy as np
from pint import Quantity

A = TypeVar('A', np.ndarray, Quantity[np.ndarray])

class Converter(Generic[A]):
    def __init__(self, scale: "float in case A is np.ndarray / Quantity[float] in case A is Quantity[np.ndarray]"):
        self.scale = scale

    def convert(self, array: A) -> A:
        return array / self.scale
tgpfeiffer commented 2 years ago

I see. I think in that case you are looking for typing.overload, there you can have multiple annotations for the same function that specify further what goes in and out.

For the function you are implementing I think you will need a type annotation like

def get_index(array: Union[np.ndarray, Quantity[np.ndarray]], i: int) -> Union[float, Quantity[float]]:
    return array[i]

but as you write that's not specific enough, a mypy run on

data = np.asarray([3., 4.])
data_q = Q_(data, 'meter')

reveal_type(get_index(data, 0))
reveal_type(get_index(data_q, 0))

prints

test.py:20: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"
test.py:21: note: Revealed type is "Union[builtins.float, pint.quantity.Quantity[builtins.float]]"

If you add @overload declarations like

@overload
def get_index(array: np.ndarray, i: int) -> float: ...

@overload
def get_index(array: Quantity[np.ndarray], i: int) -> Quantity[float]: ...

then mypy prints

test.py:20: note: Revealed type is "builtins.float"
test.py:21: note: Revealed type is "pint.quantity.Quantity[builtins.float]"
MichaelTiemannOSC commented 2 years ago

I'm now suddenly interested in this. We have data providers handing us a mish-mash of TWh and PJ energy generation data and we'd like to keep our units straight. We are also using Pydantic. My first attempt to add a Quantity field resulted in this error message (using Pint 0.18):

TypeError: Fields of type "<class 'pint.quantity.Quantity'>" are not supported.

Worked around by adding

    class Config:
        arbitrary_types_allowed = True

to the models I'm enhancing with Quantity.

shimwell commented 2 years ago

Super interested in the use of Pint type hinting with Pydantic types.

Wondering if you were able to add something like PositiveFloat or other Pydantic types to your example @MichaelTiemannOSC


from pydantic import BaseModel, PositiveFloat
from pint import Quantity

class PowerPlant(BaseModel):
    power_generation: Quantity['watt']
    class Config:
        arbitrary_types_allowed = True

noor_solar = PowerPlant(power_generation=Quantity(160, 'megawatt'))

noor_solar.power_generation
MichaelTiemannOSC commented 2 years ago

Should be able to share some findings soon. I have an issue filed with pandas to sort out an ExtensionArray problem (https://github.com/pandas-dev/pandas/issues/45240) and am working with some smart people (copied) on how to make this play well with both database connectors and REST APIs.

@erikerlandson @caldeirav @joriscram

jules-ch commented 2 years ago

@hgrecco astropy introduced something similar that we can implement, using Annotated typing that I outlined in previous comments.

https://github.com/astropy/astropy/commit/0deb5c545b5b1fe47361ed5a02a86fe9ef16d3ec

deeplook commented 2 years ago

Really curious on any progress on this as I'm getting into this very topic and have some ugly workarounds like:

import pint
from pydantic import BaseModel, validator

ureg = pint.UnitRegistry()

class MyModel(BaseModel):
    distance: str

    @validator("distance")
    def is_length(cls, v):
        q = ureg.Quantity(v)
        assert q.check("[length]"), "dimensionality must be [length]"
        return q
>>> MyModel(distance="2 ly").distance
2 light_year
mcleantom commented 2 years ago

Really curious on any progress on this as I'm getting into this very topic and have some ugly workarounds like:

from pydantic import BaseModel, validator
from pint import Quantity

ureg = pint.UnitRegistry()

class MyModel(BaseModel):
    distance: str

    @validator("distance")
    def is_length(cls, v):
        q = ureg.Quantity(v)
        assert q.check("[length]"), "dimensionality must be [length]"
        return q
>>> MyModel(distance="2 ly").distance
2 light_year

I made a quick, slightly nicer, workaround based off your workaround

from pydantic import BaseModel
import pint

class PintType:
    Q = pint.Quantity

    def __init__(self, q_check: str):
        self.q_check = q_check

    def __get_validators__(self):
        yield self.validate

    def validate(self, v):
        q = self.Q(v)
        assert q.check(self.q_check), f"Dimensionality must be {self.q_check}"
        return q

Length = PintType("[length]")

class MyModel(BaseModel):
    distance: Length

    class Config:
        json_encoders = {
            pint.Quantity: str
        }
deeplook commented 2 years ago

I made a quick, slightly nicer, workaround based off your workaround

Indeed, thanks!

sanbales commented 2 years ago

Thank you all for posting this, it has been incredibly helpful.

One thing I had to mention is that I was having issues with the example above because the fields were objects and not classes, so I tweaked things a bit to support jsonschema output and assignment validation.

Here is a public gist with a more complete example.

Open to any suggestions on how to improve this:

from pint import Quantity, Unit, UnitRegistry
from pydantic import BaseModel

registry = UnitRegistry()

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
    )
])

def quantity(dimensionality: str) -> type:
    """A method for making a pydantic compliant Pint quantity field type."""

    try:
        registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        assert quantity.check(cls.dimensionality), f"Dimensionality must be {cls.dimensionality}"
        return quantity

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/Quantity"}
        )

    return type(
        "Quantity",
        (Quantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dimensionality=dimensionality,
            validate=validate,
        ),
    )

class MyModel(BaseModel):

    distance: quantity("[length]")
    speed: quantity("[length]/[time]")

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }
model = MyModel(distance="1.5 ly", speed="15 km/hr")
model
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# check the jsonschema, could make the definition for Quantity better...
print(MyModel.schema_json(indent=2))
>>> {
  "title": "MyModel",
  "type": "object",
  "properties": {
    "distance": {
      "$ref": "#/definitions/Quantity"
    },
    "speed": {
      "$ref": "#/definitions/Quantity"
    }
  },
  "required": [
    "distance",
    "speed"
  ],
  "definitions": [
    {
      "Quantity": {
        "type": "string"
      }
    }
  ]
}

# convert to a python dictionary
model.dict()
>>> {'distance': 1.5 <Unit('light_year')>, 'speed': 15.0 <Unit('kilometer / hour')>}

# serialize to json
print(model.json(indent=2))
>>> {
  "distance": "1.5 light_year",
  "speed": "15.0 kilometer / hour"
}

import json

# load from json
MyModel.parse_obj(json.loads(model.json()))
>>> MyModel(distance=<Quantity(1.5, 'light_year')>, speed=<Quantity(15.0, 'kilometer / hour')>)

# test that it raises error when assigning wrong quantity kind
model.distance = "2 m/s"

---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
Cell In [14], line 1
----> 1 model.distance = "2 m/s"

File C:\mf\envs\jafte\lib\site-packages\pydantic\main.py:385, in pydantic.main.BaseModel.__setattr__()

ValidationError: 1 validation error for MyModel
distance
  Dimensionality must be [length] (type=assertion_error)
MichaelTiemannOSC commented 2 years ago

@sanbales that was incredibly helpful code! I'm now trying to build a production_quantity function that validates that a given Quantity is among the types of quantities that we deal with in "production". I have written this:

schema_extra = dict(definitions=[
    dict(
        Quantity=dict(type="string"),
        ProductionQuantity=dict(type="List[str]"),
    )
])

class ProductionQuantity(BaseModel):

    dims_list: List[str]

    @validator('dims_list')
    def units_must_be_registered(cls, v):
        for d in v:
            try:
                registry.get_dimensionality(d)
            except KeyError:
                raise ValueError(f"{d} is not a valid dimensionality in pint!")
        return v

    class Config:
        validate_assignment = True
        schema_extra = schema_extra
        json_encoders = {
            Quantity: str,
        }

def production_quantity(dims_list: List[str]) -> type:
    """A method for making a pydantic compliant Pint production quantity."""

    try:
        for dimensionality in dims_list:
            registry.get_dimensionality(dimensionality)
    except KeyError:
        raise ValueError(f"{dimensionality} is not a valid dimensionality in pint!")

    @classmethod
    def __get_validators__(cls):
        yield cls.validate

    @classmethod
    def validate(cls, value):
        quantity = Quantity(value)
        for dimensionality in cls.dims_list:
            if quantity.check(dimensionality):
                return quantity
        raise DimensionalityError(value.units, f"in [{cls.dims_list}]")

    @classmethod
    def __modify_schema__(cls, field_schema):
        field_schema.update(
            {"$ref": "#/definitions/ProductionQuantity"}
        )

    return type(
        "ProductionQuantity",
        (ProductionQuantity,),
        dict(
            __get_validators__=__get_validators__,
            __modify_schema__=__modify_schema__,
            dims_list=dims_list,
            validate=validate,
        ),
    )

But pydantic gives me this error, which I haven't been able to fully grok:

TypeError: The type of ProductionQuantity.dims_list differs from the new default value; if you wish to change the type of this field, please use a type annotation

MichaelTiemannOSC commented 2 years ago

I got past that error by changing dims_list=dims_list to f"List[str] = {dims_list}" (based on a reading of https://github.com/pydantic/pydantic/issues/757#issuecomment-522265595).

I'm still working out some other bits, so please don't take the above as correct reference code. It's more a reference to my current state of problems than a solution.

sanbales commented 2 years ago

Thanks @MichaelTiemannOSC , I was not familiar with the pydantic check for redefined field types, this is good to know! I didn't intend my code to be the correct reference code either, but it'd be nice to have a well defined way of integrating pint with pydantic. Appreciate you looking into it and sharing your code. I've gone back and updated that gist a few more times since I posted this, trying to make it a bit cleaner, but it feels like it could be simplified further.

MichaelTiemannOSC commented 2 years ago

Cool. Here's a link to the code repository where I'm bringing together Pint, pydantic, uncertainties, and pandas: https://github.com/MichaelTiemannOSC/ITR/tree/template-v2

edelmanjm commented 3 months ago

Sorry to necro this thread, but what's the status on type hints? The previously linked repo appears to be gone.

MichaelTiemannOSC commented 3 months ago

It has since been merged into the main repository: https://github.com/os-climate/ITR. Note that this repository doesn't itself contain pandas, pint, or pint-pandas. I have created some local versions of those, but things have drifted as uncertainties proved more challenging to bring into pint-pandas than expected.

uellue commented 2 months ago

Thank you for the examples! Here's an example on how to make it work with annotations in Pydantic 2: https://github.com/LiberTEM/LiberTEM-schema/blob/c096d5337f21c78232134ad9d9af19b8405b1992/src/libertem_schema/__init__.py#L1

(edit: inline code here)

from typing import Any, Sequence

from typing_extensions import Annotated
from pydantic_core import core_schema
from pydantic import (
    BaseModel,
    GetCoreSchemaHandler,
    WrapValidator,
    ValidationInfo,
    ValidatorFunctionWrapHandler,
)

import pint

__version__ = '0.1.0.dev0'

ureg = pint.UnitRegistry()

class DimensionError(ValueError):
    pass

_pint_base_repr = core_schema.tuple_positional_schema(items_schema=[
    core_schema.float_schema(),
    core_schema.str_schema()
])

def to_tuple(q: pint.Quantity):
    base = q.to_base_units()
    return (float(base.magnitude), str(base.units))

class PintAnnotation:
    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        return core_schema.json_or_python_schema(
            json_schema=_pint_base_repr,
            python_schema=core_schema.is_instance_schema(pint.Quantity),
            serialization=core_schema.plain_serializer_function_ser_schema(
                to_tuple
            ),
        )

_length_dim = ureg.meter.dimensionality
_angle_dim = ureg.radian.dimensionality
_pixel_dim = ureg.pixel.dimensionality

def _make_handler(dimensionality: str):
    def is_matching(
        q: Any, handler: ValidatorFunctionWrapHandler, info: ValidationInfo
    ) -> pint.Quantity:
        # Ensure target type
        if isinstance(q, pint.Quantity):
            pass
        elif isinstance(q, Sequence):
            magnitude, unit = q
            # Turn into Quantity: measure * unit
            q = magnitude * ureg(unit)
        else:
            raise ValueError(f"Don't know how to interpret type {type(q)}.")
        # Check dimension
        if not q.check(dimensionality):
            raise DimensionError(f"Expected dimensionality {dimensionality}, got quantity {q}.")
        # Return target type
        return q

    return is_matching

Length = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_length_dim))
]
Angle = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_angle_dim))
]
Pixel = Annotated[
    pint.Quantity, PintAnnotation, WrapValidator(_make_handler(_pixel_dim))
]

class Simple4DSTEMParams(BaseModel):
    '''
    Basic calibration parameters of a strongly simplified model
    of a 4D STEM experiment.

    See https://github.com/LiberTEM/Microscope-Calibration
    and https://arxiv.org/abs/2403.08538
    for the technical details.
    '''
    overfocus: Length
    scan_pixel_pitch: Length
    camera_length: Length
    detector_pixel_pitch: Length
    semiconv: Angle
    cy: Pixel
    cx: Pixel
    scan_rotation: Angle
    flip_y: bool

Usage from https://github.com/LiberTEM/LiberTEM-schema/blob/c096d5337f21c78232134ad9d9af19b8405b1992/tests/test_schemas.py#L1

def test_smoke():
    params = Simple4DSTEMParams(
        overfocus=0.0015 * ureg.meter,
        scan_pixel_pitch=0.000001 * ureg.meter,
        camera_length=0.15 * ureg.meter,
        detector_pixel_pitch=0.000050 * ureg.meter,
        semiconv=0.020 * ureg.radian,
        scan_rotation=330. * ureg.degree,
        flip_y=False,
        cy=(32 - 2) * ureg.pixel,
        cx=(32 - 2) * ureg.pixel,
    )
    as_json = params.model_dump_json()
    pprint.pprint(("as json", as_json))
    from_j = from_json(as_json)
    pprint.pprint(("from json", from_j))
    res = Simple4DSTEMParams.model_validate(from_j)
    pprint.pprint(("validated", res))
    assert isinstance(res.overfocus, Quantity)
    assert isinstance(res.flip_y, bool)
    assert res == params

To be figured out:

Is this useful? If yes, what would be a good way to make it easily available to others?

CC @sk1p

blakeNaccarato commented 2 months ago

@uellue

Is this useful?

Yes!

If yes, what would be a good way to make it easily available to others?

I'm reminded of the project organization and code layout of https://github.com/p2p-ld/numpydantic, which exposes NumPy array shape validation as a Pydantic annotation similarly to how your code customizes Pint types. Could be useful to adopt some of its structure if you were to wrap up the Pint validator as its own pip installable thing!

Ping me if you need any pointers regarding PyPI packaging/release workflows (edit: Ah, I see over at LiberTEM you've already got release workflows down).

Edit

  • [ ] Allow JSON validation of dimensionality, possibly by enforcing SI base units as string constants?

I initially linked numpydantic above as an example for project layout for simple distribution of your annotated Pint types, but the abstract parent class numpydantic.interface.Interface may actually be directly useful in implementing some of your Pint semantics, as it handles general numeric types (e.g. Python floats/ints but also NumPy types).

Numpydantic is stable at 1.0, but the "coming soon" goals of general metadata and extensibility may make it easier to implement some of these Pint needs. I would say a standalone package that exposes the simple validator is the closer reach, then using the metadata/extensible bits of Numpydantic in the future for robustness without having to re-implement a bunch of machinery.

uellue commented 2 months ago

Good to hear that you like it! :-)

I'm reminded of the project organization and code layout of https://github.com/p2p-ld/numpydantic, which exposes NumPy array shape validation as a Pydantic annotation similarly to how your code customizes Pint types. Could be useful to adopt some of its structure if you were to wrap up the Pint validator as its own pip installable thing!

Oh, that one looks interesting! Indeed, this one could be a good template and address the magnitude portion.

Numpydantic is stable at 1.0, but the "coming soon" goals of general metadata and extensibility may make it easier to implement some of these Pint needs. I would say a standalone package that exposes the simple validator is the closer reach, then using the metadata/extensible bits of Numpydantic in the future for robustness without having to re-implement a bunch of machinery.

Hm ok, probably good to experiment a bit and explore options before releasing 0.1. It feels like numpydantic as-is can provide the magnitude, and the units are orthogonal to it, similar to how pint handles it, right? That would mean the schema composition "magnitude + units" makes sense as a separate package, not as part of numpydantic or pint.

uellue commented 1 month ago

Here's a version that integrates with numpydantic. Thank you for the pointer!

https://github.com/LiberTEM/LiberTEM-schema/pull/7

uellue commented 1 month ago

LiberTEM/LiberTEM-schema#7

Would the machinery make sense as part of pint, by the way? One could put it into pint.schema, for example. It would mostly require documentation, possibly more tests, and probably more default type definitions like Speed, Weight, etc.