p2p-ld / numpydantic

Type annotations for specifying, validating, and serializing arrays with arbitrary backends in Pydantic (and beyond)
https://numpydantic.readthedocs.io/
MIT License

Roundtrip JSON serialization/deserialization #17

Closed: arpit15 closed this issue 1 month ago

arpit15 commented 2 months ago

Thanks for creating this essential piece that's missing from pydantic. I want to serialize a bunch of classes containing ndarrays and deserialize them such that, after deserialization, the fields are ndarrays again. According to the examples, that is not the supported behavior.
I am wondering if there is a way to get this behavior using numpydantic.

import numpy as np
from numpydantic import NDArray
from pydantic import BaseModel

class MyModel(BaseModel):
    array: NDArray

myobj = MyModel(array=[1.0, 2.0])
json_s = myobj.model_dump_json()
loaded_obj = MyModel.model_validate_json(json_s)
# currently fails: loaded_obj.array comes back as a list, not an ndarray
assert isinstance(loaded_obj.array, np.ndarray)

sneakers-the-rat commented 2 months ago

Glad you are having fun :)

Yes. I am surprised that the validation logic is different when parsing JSON, and I consider the fact that this doesn't work a bug. I'm finishing something up for the night and will return tomorrow.

edit: it looks like there will have to be a change upstream in pydantic; I will raise an issue with them tomorrow with more details.

arpit15 commented 2 months ago

Thanks for the quick response. Looking forward to reading the issue on pydantic. Hopefully there is another workaround or a quick fix that can be merged into pydantic.

arpit15 commented 2 months ago

An easy hack that works for me is to wrap NDArray in an Annotated type with an AfterValidator:

from typing import Annotated

import numpy as np
from numpydantic import NDArray as _NDArray
from pydantic import AfterValidator, BaseModel

# coerce the validated value back to an ndarray after pydantic's own validation
NDArray = Annotated[_NDArray, AfterValidator(lambda x: np.array(x))]

class MyModel(BaseModel):
    array: NDArray

myobj = MyModel(array=[1.0, 2.0])
json_s = myobj.model_dump_json()
loaded_obj = MyModel.model_validate_json(json_s)
assert isinstance(loaded_obj.array, np.ndarray)  # passes with the Annotated wrapper

sneakers-the-rat commented 2 months ago

Ha, yes :) that should work, though we lose the ability to use the model with other array backends (shape and dtype validation should still work).
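
For example, a minimal sketch of the same trick with a constrained array, assuming the Shape/dtype spelling from the numpydantic docs (the alias and model names here are just for illustration): the AfterValidator only does the coercion, while the shape and dtype checks still come from NDArray itself.

from typing import Annotated

import numpy as np
from numpydantic import NDArray, Shape
from pydantic import AfterValidator, BaseModel

# hypothetical alias: 1-D array of two floats, coerced back to an ndarray after validation
ConstrainedArray = Annotated[NDArray[Shape["2 x"], float], AfterValidator(np.asarray)]

class ConstrainedModel(BaseModel):
    array: ConstrainedArray

restored = ConstrainedModel.model_validate_json('{"array": [1.0, 2.0]}')
assert isinstance(restored.array, np.ndarray)     # coerced back to an ndarray
# ConstrainedModel(array=[1.0, 2.0, 3.0])         # would fail the "2 x" shape constraint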

The basic problem is that when parsing JSON, pydantic-core just uses the JSON schema and not the Python validators. The JSON schema is correct for an n-dimensional array in JSON (a list of lists, parametrised according to the shape and dtype constraints), so it validates, but we need a way to hook into the coercion parts of the array interfaces at the end of the JSON parsing. I'm going to look further into whether there's a way to chain a validator for just the JSON validation, and if not we might have to do some more monkeypatching.
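
To illustrate the mechanism, here is a minimal sketch (not numpydantic's actual schema; the type and model names are made up) of how json_or_python_schema splits the two paths: JSON input is validated against the json branch only, so a coercion step that lives in the python branch never runs on model_validate_json.

import numpy as np
from pydantic import BaseModel, GetCoreSchemaHandler
from pydantic_core import core_schema

class CoercedArray:
    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler: GetCoreSchemaHandler):
        return core_schema.json_or_python_schema(
            # json branch: a plain list-of-numbers schema, no coercion to ndarray
            json_schema=core_schema.list_schema(core_schema.float_schema()),
            # python branch: coerce whatever comes in to an ndarray
            python_schema=core_schema.no_info_plain_validator_function(np.asarray),
        )

class Demo(BaseModel):
    array: CoercedArray

print(type(Demo(array=[1.0, 2.0]).array))                             # numpy.ndarray
print(type(Demo.model_validate_json('{"array": [1.0, 2.0]}').array))  # list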

arpit15 commented 2 months ago

Makes sense! I will be happy to test out your changes. Let me know if I can help in any way.

sneakers-the-rat commented 2 months ago

Just letting you know I've figured this out and will issue a patch tonight or tomorrow <3. Simpler than I thought: we just need to change the way we're generating the JSON schema on the NDArray class (which we will soon rewrite anyway to make a proper generic, but that's another issue).

edit: for more info, I had misunderstood how json_or_python_schema worked. Since __get_pydantic_core_schema__ receives the _source_type but __get_pydantic_json_schema__ doesn't, we generated the JSON schema in __get_pydantic_core_schema__, because that's where we have the shape and dtype values. But json_or_python_schema is what makes pydantic use the JSON schema when revalidating from JSON. If we instead generate the JSON schema in __get_pydantic_json_schema__, pydantic uses the Python validator, which correctly roundtrips.
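
Roughly, the direction described above looks like this (again a hedged sketch with made-up names, not the actual patch): keep a single python-side core schema so JSON validation reuses the python validator, and only emit the list-of-numbers JSON schema from __get_pydantic_json_schema__.

import numpy as np
from pydantic import BaseModel, GetCoreSchemaHandler, GetJsonSchemaHandler
from pydantic_core import core_schema

class RoundtripArray:
    @classmethod
    def __get_pydantic_core_schema__(cls, source_type, handler: GetCoreSchemaHandler):
        # one validator for both python and JSON input: coerce to ndarray,
        # and serialize back to nested lists when dumping to JSON
        return core_schema.no_info_plain_validator_function(
            np.asarray,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda a: a.tolist(), when_used="json"
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(cls, schema, handler: GetJsonSchemaHandler):
        # describe the JSON form only here; shape/dtype constraints would be added too
        return {"type": "array", "items": {"type": "number"}}

class DemoModel(BaseModel):
    array: RoundtripArray

m = DemoModel(array=[1.0, 2.0])
restored = DemoModel.model_validate_json(m.model_dump_json())
assert isinstance(restored.array, np.ndarray)   # the roundtrip now yields an ndarray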

arpit15 commented 2 months ago

@sneakers-the-rat thanks for the explanation. After your explanation, I think I understand __get_pydantic_core_schema__ and __get_pydantic_json_schema__. It would be great to have this handled by your lib. I am wondering if there is a planned release cycle for numpydantic.

sneakers-the-rat commented 2 months ago

No planned release cycle; I just fix bugs as they come up and make enhancements as requested at this point, but I am using semver and will issue appropriate deprecation warnings in the case of breaking changes. The next major planned version, 2.0.0, will replace the basic NDArray type with a proper Generic using TypeVarTuple while keeping current behavior, and will start moving away from nptyping, with full removal in 3.0.0, so there will be plenty of warning.

I'll make this patch shortly.

Making a note with a checklist item to

sneakers-the-rat commented 2 months ago

for ur consideration: https://github.com/p2p-ld/numpydantic/pull/20 (docs: https://numpydantic.readthedocs.io/en/dump_json/serialization.html)