Open · NowanIlfideme opened 2 years ago
Thanks for the question.
I think this is just calling `validate_python`, or indeed initializing the pydantic model as I guess you do now.
No changes should be required to pydantic-core to allow this.
I want to add support for line numbers in errors, but that requires a new rust JSON parser, so won't be added until v2.1 or later.
Closing, but feel free to ask if you have more questions.
To be clear, I would love this to be possible, but I don't want to add the capability to parse more formats to pydantic-core itself, so the only way this could work would be to achieve runtime linking of pydantic-core and the third-party libraries that perform the parsing.
This is the only way I can think of that it might work:
(Note 1: there's probably a better way to do this, I'm not an expert at this stuff) (Note 2: this might not work) (Note 3: I'm not even convinced this is a good idea and I don't promise to add this functionality)
With that out of the way, here's a very rough idea:
- Expose `JsonInput` (new name required) as a python object: not actually converting to the dict etc. from `JsonInput`, just a way to return a pointer to the `JsonInput` back to python land
- Add a method to `SchemaValidator` that accepts this object; pydantic-core then extracts the `JsonInput` and validates it with the same logic it uses now for json data
- Third-party libraries depend on a new `pydantic-core-json-input` crate, perform the logic of building `JsonInput`s in rust, then return them to python world to in turn be passed to pydantic-core

With this approach, while we go "via python", we never have to do the hard work to convert the `JsonInput` to a python object.
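A Python-land analogue of the opaque-handle idea might look like this. All names here (`JsonInputHandle`, `SchemaValidator.validate_json_input`) are hypothetical stand-ins for the Rust-side types; the point is only that the caller holds the handle without ever converting its contents into Python dicts and lists:

```python
class JsonInputHandle:
    """Opaque wrapper: Python code holds it but never looks inside.
    In the real proposal this would be a pointer to the Rust JsonInput."""
    def __init__(self, inner):
        self._inner = inner  # never round-tripped through Python objects

class SchemaValidator:
    def validate_json_input(self, handle: JsonInputHandle):
        # Only the validator unwraps the handle, reusing the same
        # validation path it already uses for json data.
        return self._validate(handle._inner)

    def _validate(self, data):
        return data  # placeholder for the real validation logic

# A third-party binding would build the handle from, say, parsed YAML:
handle = JsonInputHandle({"a": 1})
validated = SchemaValidator().validate_json_input(handle)
print(validated)  # {'a': 1}
```

The whole design question is whether passing such a pointer across the Python boundary between two separately compiled extension modules can be done safely; that is the runtime-linking problem debated below.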
I'm coming here from here, and I want to share my ideas on how runtime plugins should work.

My idea is to do it in a way similar to DLLs.

You would have to have your pydantic-core plug-ins in one pre-agreed location (compiled), or registered somehow, so we know where to load them from and can avoid "DLL hell". On startup of pydantic-core you would say: hey, load this and that at that version. No multiple versions would be allowed for one instance of pydantic-core (globally you could have them, but within one app I think that would create confusion).

During deserialization you would just say which deserializer to use. The format alone is not enough, since there can be more than one serializer/deserializer available for a single format (like simdjson for JSON).

I'm also not an expert in this, but I think there should be the following requirements:

1) More than one deserializer allowed, and a single class can be deserialized from multiple formats.
2) More than one serializer per format (chosen by name?).
3) The serializer can be chosen dynamically.
4) (Not mandatory) Only the chosen serializers are loaded, to limit library load time.
5) Hard fail if a serializer is missing or incompatible.

That's why I'm suggesting decoupling. If this is too hard, I would suggest doing it at compile time instead, but that may be too complicated...
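A rough, hypothetical sketch of what such a by-name registry could look like in Python (the names `register` and `deserialize` are invented for illustration, and `json.loads` registered twice stands in for two competing parsers of one format; a real plugin system would load compiled plug-ins, not Python functions):

```python
import json
from typing import Callable

# Requirement 2/3: deserializers are registered by name, not just by
# format, and selected dynamically at call time.
_DESERIALIZERS: dict[str, Callable] = {}

def register(name: str, func: Callable) -> None:
    _DESERIALIZERS[name] = func

def deserialize(raw: str, *, using: str):
    # Requirement 5: hard fail if the named deserializer is missing.
    try:
        parser = _DESERIALIZERS[using]
    except KeyError:
        raise RuntimeError(f"deserializer {using!r} not registered") from None
    return parser(raw)

register("json", json.loads)
register("simdjson", json.loads)  # stand-in: a faster parser under its own name

print(deserialize('{"a": 1}', using="json"))  # {'a': 1}
```

This only illustrates the naming and selection scheme; the hard part the maintainers object to below is doing the equivalent across compiled, separately distributed binaries.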
This sounds very hard to do in a reliable cross-platform way, given the problems we already experience (at the scale of pydantic or watchfiles) with very subtle OS differences and wheels. I'm very unwilling to enter into this mess.
You're effectively proposing another way to link libraries that sidesteps both crates and python imports. Are there any other examples of python libraries that use DLLs/shared libraries to access functionality in other packages without using the python runtime?
(Perhaps worth remembering that I'll probably be maintaining pydantic for the next decade at least, one "clever" idea proposed now which relies on shaky cross-platform behaviour, could cost me 100s of hours of support over the years - hence my caution)
The real question is how much faster this would be than my proposal above.
To proceed with the conversation, we need to benchmark that. Really, someone needs to build both solutions (and any third solutions proposed) and see how they perform.
@PrettyWood @tiangolo do you have any feedback on any of this?
Well, loading crates may be fine as well... but as I said, I'm not an expert...
Crates would need to be a compile-time dependency, so distributed wheels couldn't be used.
Ah... yeah, I forgot about that... that would force everyone to compile when they install the library...
However, if there is little to no performance impact with @samuelcolvin's solution, I would also be fine with that.
But there are people "needing" simdjson... and in extreme cases performance may degrade.
If you care about "extreme performance", don't use python, build the whole thing in Rust, Go or C.
Sorry for missing this discussion 2 weeks ago...
I need to check out and play with the current (v0.3.1) version of pydantic-core before I can really give an informed opinion, but from a cursory glance it seems that `validate_python()` should be enough to implement this in Python-land.
Regarding a Rust-side implementation, I think it all sounds too messy for a Python-facing library. "Config parsing" use cases don't require cutting-edge performance anyways - you generally parse a single YAML file at the beginning of a script (vs many JSON API requests/sec). And YAMLs aren't usually passed between performance-critical applications, since parsing YAML is slower anyways. There are similar considerations with TOML. I guess the most JSON-like thing would be XML derivatives, but I don't have much experience there, and haven't encountered anyone using Pydantic for XML yet 😉
I agree, `validate_python` is enough for everything except performance-critical applications.
The only other thing you might need is line numbers, that's one of the main drivers (for me) of #10.
We need to think about how to make this possible without adding complexity or damaging performance.
Hi, author of pydantic-yaml here. I have no idea about anything Rust-related, unfortunately, but hopefully this feature request will make sense in Python land.
I'm going off this slide in this presentation by @samuelcolvin, specifically:
Here's a relevant discussion about "3rd party" deserialization from v1: https://github.com/samuelcolvin/pydantic/discussions/3025
It would be great if `pydantic-core` were built in a way where non-JSON formats could be added "on top" rather than necessarily being built into the core. I understand performance is a big question in this rewrite, so ideally these would be high-level interfaces that can be hacked in Python (or implemented in Rust/etc. for better performance).

From the examples available already, it's possible that such a feature could be quite simple on the `pydantic-core` side: the 3rd party would create their own function a la `validate_json`, possibly just calling `validate_python`. However, care would be needed on how format-specific details are sent between `pydantic` and the implementation. In V1 this is done with the `Config` class and special `json_encoder/decoder` attributes, which have been a pain to re-implement properly for YAML (without way too much hackery).

Ideally for V2, this would be something more easily addable and configurable. The alternative would be to just implement TOML, YAML etc. directly in the binary (and I wouldn't have to keep supporting my project, ha!)
Thanks again for Pydantic!
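For illustration, the "3rd party creates their own function a la `validate_json`, possibly just calling `validate_python`" pattern requested here might look like this sketch. All names are hypothetical, `json.loads` stands in for a YAML or TOML parser, and a no-op stands in for the real validation step:

```python
import json

def validate_python(data):
    # Placeholder for pydantic's Python-object validation step.
    return data

def make_format_validator(parser):
    """Build a validate_json-style entry point from any loads-style
    parser with a (str) -> object signature, e.g. json.loads or
    yaml.safe_load (hypothetical factory, not a pydantic API)."""
    def validate_format(raw: str):
        return validate_python(parser(raw))
    return validate_format

validate_json = make_format_validator(json.loads)
print(validate_json('{"x": 1}'))  # {'x': 1}
```

The open question raised above is where format-specific configuration (the V1 `json_encoder/decoder` role) would plug into such a factory.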