[Feature] Generate and compile Rust `struct` for Pydantic model at runtime and ahead of time (JIT and standard compilation)

pydantic / pydantic-core

Core validation logic for pydantic written in rust

MIT License


[Feature] Generate and compile Rust `struct` for Pydantic model at runtime and ahead of time (JIT and standard compilation) #402

Open caniko opened 1 year ago

caniko commented 1 year ago

Goal: (1) Make (pydantic-core ->) pydantic blazingly fast. (2) Make pydantic a dataclass interface for PyO3.

My previous issue made me consider generating Rust structs for Pydantic models. Rust doesn't support defining a struct at runtime through macros, so our only route is to generate the Rust code and compile it ourselves.
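A minimal sketch of the code-generation half of this idea, assuming a plain annotated class as a stand-in for a pydantic model. The `RUST_TYPES` mapping and the `rust_struct_source` helper are hypothetical names, not part of pydantic or pydantic-core, and the sketch only emits source text; it does not compile anything.

```python
from typing import get_type_hints

# Hypothetical mapping from Python annotations to Rust types. A real
# generator would also need Optional, containers, nested models, etc.
RUST_TYPES = {int: "i64", float: "f64", str: "String", bool: "bool"}


def rust_struct_source(model: type) -> str:
    """Return Rust source for a struct mirroring the model's annotated fields."""
    lines = [f"pub struct {model.__name__} {{"]
    for name, annotation in get_type_hints(model).items():
        if annotation not in RUST_TYPES:
            raise TypeError(f"no Rust mapping for field {name!r}: {annotation!r}")
        lines.append(f"    pub {name}: {RUST_TYPES[annotation]},")
    lines.append("}")
    return "\n".join(lines)


class User:  # plain annotated class standing in for a pydantic model
    id: int
    name: str
    score: float


print(rust_struct_source(User))
```

The generated text would then be handed to whatever compilation step is chosen; that step is deliberately left out here.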

My solution: Generate Rust code from a pydantic.BaseModel-derived model (or "model" for short), and compile it at runtime during development. In production, there are several avenues. One is to compile automatically using a dedicated CLI, with the binaries output into the package. Another interesting avenue for compilation is to define these settings in pyproject.toml -> tool.pydantic:

[tool.pydantic]
targets = ["x86", "arm", "etc"]

The compilation would be once per model, and validation would be handled by Rust entirely.

I'd like to take this on later this year if someone else hasn't by that time (I'd appreciate the credit if you use my idea). This should be discussed ahead of an implementation, so I am starting the discussion here; I think it is the right place to do so.

Considerations: Definitely use something like rs codegen.

dmontagu commented 1 year ago

This would be pretty cool.

Just curious — are you familiar with any other crates, or especially PyO3 projects, doing a JIT thing like this with rust?

Also worth considering: are there security implications or other issues with including a Rust compiler in a production deployment image? I know people often like to remove such things from their production images (even if only for reasons of size). I'm also not sure if there might be a good way to JIT that wouldn't require full rustc.

samuelcolvin commented 1 year ago

Interesting idea, one I've thought about quite a lot already. @caniko, sorry you don't get all the credit :-)

If we were able to do this, it would make pydantic-core fast enough to be useful to other rust projects.

I believe https://crates.io/crates/jsonschema has some support for this, but I don't know how it works.

Also note the title is slightly misleading: it's a struct for a model, not for BaseModel. I have another idea for a more marginal speedup by making BaseModel (or perhaps BaseBaseModel) a class defined in Rust, although that has other complexities. I did some thinking about this in #342.

@davidhewitt any ideas on this?

@mitsuhiko this is one of the things I was trying (badly) to describe at dinner after FOSDEM, do you have any ideas on how it might be done?

davidhewitt commented 1 year ago

I'm a little unclear on the use case, is this to generate a compiled model to use as a Python class, or to build Rust libraries defined as Python code ("useful to other rust projects")? Maybe there's inspiration to be taken here from something like mypyc?

I'd probably advise against using a rust compiler in production. Rust code usually has dependencies that you'll be fetching over the network via cargo, and also the compilation of all those deps may be uncached and take quite some time.

If we avoid doing this in production, ahead-of-time code generation from a "DSL" sounds a lot like a proc-macro? Maybe with some examples of what the API / use case would be, my opinions can be more useful 😄

caniko commented 1 year ago

I'm a little unclear on the use case, is this to generate a compiled model to use as a Python class

Yes, and no. This is for model validation, the model still has methods and properties defined by the downstream developer; these will remain as Python.

I'd probably advise against using a rust compiler in production. Rust code usually has dependencies that you'll be fetching over the network via cargo, and also the compilation of all those deps may be uncached and take quite some time.

also not sure if there might be a good way to JIT that wouldn't require full rustc

I agree, nothing should be compiled in production for Pydantic, because that would require the Rust compiler + dependencies to be installed in the production environment. I am not sure if it would be a security risk, but probably, yes. In these cases, the JIT-like compilation would be for development only; the developer should use the interfaces we provide and generate binaries for their package ahead of publishing/"going to production".
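One way to picture the development/production split, as a sketch only: artifacts are named after a hash of the generated source, development passes a compile callback, and production passes none and fails loudly if the prebuilt artifact is missing. Every name here (`artifact_path`, `resolve_artifact`, `fake_compile`, the `.bin` suffix) is hypothetical, and the fake compile step merely writes a file instead of invoking a Rust toolchain.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Optional


def artifact_path(build_dir: Path, model_name: str, rust_source: str) -> Path:
    """Name the artifact after a hash of the generated source."""
    digest = hashlib.sha256(rust_source.encode()).hexdigest()[:16]
    return build_dir / f"{model_name}-{digest}.bin"


def resolve_artifact(
    build_dir: Path,
    model_name: str,
    rust_source: str,
    compile_fn: Optional[Callable[[str, Path], None]] = None,
) -> Path:
    """Return a prebuilt artifact; compile only if a compile_fn is supplied."""
    path = artifact_path(build_dir, model_name, rust_source)
    if path.exists():
        return path  # already built once for this exact source
    if compile_fn is None:
        # Production mode: no compiler available, so a missing build is an error.
        raise RuntimeError(f"no prebuilt artifact for {model_name}: {path.name}")
    compile_fn(rust_source, path)
    return path


def fake_compile(source: str, out: Path) -> None:
    # Placeholder for invoking a real Rust toolchain.
    out.write_bytes(b"compiled:" + source.encode())


with tempfile.TemporaryDirectory() as tmp:
    build_dir = Path(tmp)
    built = resolve_artifact(build_dir, "User", "pub struct User {}", fake_compile)
    # A later call without a compiler finds the same artifact.
    reused = resolve_artifact(build_dir, "User", "pub struct User {}")
    print(built == reused)
```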

also the compilation of all those deps may be uncached and take quite some time.

Not an issue considering the preceding. If it still is a concern: I humbly disagree, please convince me.

Maybe there's inspiration to be taken here from something like mypyc?

Looking at the mypyc codegen module, I am not sure if this investigation would be worth the time considering our problem is much simpler.

sounds a lot like a proc-macro?

Yes, but with bangs. As an outsider to Rust, codegen seemed aesthetically pleasing to me compared to coding everything from scratch. The package is used by several groups already, which I also took into consideration. I think taking codegen as a dependency is beneficial for both communities. TLDR: There is no good reason to re-invent the wheel.

mitsuhiko commented 1 year ago

@samuelcolvin no good ideas so far. I am however curious :)

davidhewitt commented 1 year ago

It sounds to me like you want the Python type object to become a Rust struct exposed to Python (essentially PyO3's #[pyclass])? The methods on this type object will remain implemented in Python?

To have this kind of structure you'd probably need to precompile a base class as the Rust type and then have the Python class you want be a subtype. The base class would just contain the validated contents as fields, so functionally it'd be a lot like a class with __slots__ where the instantiation is compiled in Rust.
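The shape described here can be sketched in pure Python, with a `__slots__` base class standing in for the precompiled Rust type and a subclass carrying the user's methods. `UserFields` and `User` are illustrative names; in the actual proposal the base would be a compiled type, not Python code.

```python
class UserFields:
    """Stand-in for the precompiled base type: fixed storage plus validation."""

    __slots__ = ("id", "name")

    def __init__(self, id, name):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(id, bool) or not isinstance(id, int):
            raise TypeError("id must be an int")
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        self.id = id
        self.name = name


class User(UserFields):
    """The downstream developer's class: methods stay in Python."""

    __slots__ = ()  # keep the subclass free of a per-instance __dict__

    def greeting(self) -> str:
        return f"Hello, {self.name} (#{self.id})"


print(User(1, "Ada").greeting())
```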

Do we have evidence which would suggest that there's a performance advantage from doing this worth the complexity? I suppose the motivation is that the validation algorithm is then executing hardcoded rust code instead of a dynamic schema, and pydantic wouldn't need to generate the schema at startup? So this would mean pydantic-core would need a generic interface similar to how serde works...?
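To make the "hardcoded code instead of a dynamic schema" distinction concrete, here is a toy comparison in Python. Both functions apply the same checks; the first walks a schema at call time, the second is what a generator might emit for one specific model. This only illustrates the structural difference and says nothing about how pydantic-core is implemented or how large any speedup would be.

```python
from typing import Any, Callable, Dict


def validate_int(value: Any) -> int:
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}")
    return value


def validate_str(value: Any) -> str:
    if not isinstance(value, str):
        raise TypeError(f"expected str, got {type(value).__name__}")
    return value


# Dynamic: the schema is data, interpreted on every call.
SCHEMA: Dict[str, Callable[[Any], Any]] = {"id": validate_int, "name": validate_str}


def validate_dynamic(schema: Dict[str, Callable[[Any], Any]], data: dict) -> dict:
    return {field: check(data[field]) for field, check in schema.items()}


# Hardcoded: the field list is baked into the function body.
def validate_user_hardcoded(data: dict) -> dict:
    return {"id": validate_int(data["id"]), "name": validate_str(data["name"])}


payload = {"id": 1, "name": "Ada"}
print(validate_dynamic(SCHEMA, payload) == validate_user_hardcoded(payload))
```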

samuelcolvin commented 1 year ago

I think step one is to fork pydantic-core, implement a model as a rust struct and see what the performance change is.

I suspect thinking about it now, that the performance won't change that much, without rewriting all the validators. But I'd be happy to be proved wrong.

caniko commented 1 year ago

I suspect thinking about it now, that the performance won't change that much, without rewriting all the validators. But I'd be happy to be proved wrong.

I have to go through the validation implementation in pydantic-core to form my own opinion. Even if we optimize the validation for this architectural change, would there still be no substantial speedup?

Additional consideration: Consider FastAPI responses, ORM results, or any Pydantic model. These objects have no direct interface into Rust; I'd like to consider the additional features downstream from this. Could we consider Pydantic as a duplexed interface for models in Python and struct in Rust?

Designs for applications downstream from this idea (brainstormed):

Results from computed properties could also be stored in the struct (if caching is turned on), using something like Box before they are computed. They would thereby also be validated by the struct.
Frozen models could be made immutable using something like readonly. Computed fields would need a secondary struct in that case.
We could use serde to serialize Pydantic models because they are structs. Maybe include computed properties as an option?

The question is... Is there demand for this? I need something like this actually; however, my unique situation also allows me to go through pola.rs.

Overall it makes pydantic-core more adaptable towards Rust, which I see as a huge plus. And hopefully:

If we were able to do this, it would make pydantic-core fast enough to be useful to other rust projects.

caniko commented 1 year ago

Consider FastAPI responses, ORM results, or any Pydantic model. These objects have no direct interface into Rust; I'd like to consider the additional features downstream from this. Could we consider Pydantic as a duplexed interface for models in Python and struct in Rust?

@samuelcolvin, do you see any demand for a duplexed interface of this kind? Think of pyo3-polars, but for Pydantic.

This method would require downstream packages to be developed under maturin. Instead of replacing pydantic-core, I think that the struct feature must be an alternative to the current pydantic-core implementation.

A duplexed interface sounds very useful to me, but it is like the laser: it didn't have much use at the time of its conception. There would be some speed gains; however, the duplex interface feature is probably a better selling point.

PS: Good luck with the new company :rocket:

samuelcolvin commented 1 year ago

I'm afraid I don't understand what "duplexed" means in this scenario.



caniko commented 1 year ago

Duplex as in both directions. Pydantic objects in Python store their fields in a Rust struct. We can access its values both in Rust and Python; changes in one would change the values in the other.
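A small sketch of the "both directions" idea using only the standard library: two `ctypes.Structure` views over one buffer, so a write through either view is visible through the other. The second view is just a stand-in for the Rust side; the field names and layout are made up for illustration, and a real implementation would share memory through PyO3 rather than `ctypes`.

```python
import ctypes


class UserFields(ctypes.Structure):
    # C-compatible layout, comparable in spirit to a #[repr(C)] Rust struct.
    _fields_ = [("id", ctypes.c_int64), ("score", ctypes.c_double)]


# One block of memory holds the field values.
buffer = bytearray(ctypes.sizeof(UserFields))

python_view = UserFields.from_buffer(buffer)  # what the Python model would expose
native_view = UserFields.from_buffer(buffer)  # stand-in for the Rust side

python_view.id = 7       # write from "Python"
native_view.score = 2.5  # write from "Rust"

print(native_view.id, python_view.score)
```

Neither view holds a copy of the data, which is the property the duplex description relies on.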

Think of pyo3-polars, but for Pydantic.

Is a great example. In practice, we could effortlessly pass a response object from FastAPI to a Rust function as an argument. No serialization or API required. Again, this would require an adapter crate on the Rust side in addition to the struct feature.