pola-rs / polars

Serializable expressions with Pydantic #12172

Open thomasaarholt opened 1 year ago

thomasaarholt commented 1 year ago

Description

In short, I'd really like this to work to enable some nice features over at patito.

import polars as pl
from pydantic import BaseModel
from pydantic.config import ConfigDict

class Foo(BaseModel):
    constraints: list[pl.Expr] = [(pl.col("even_int") % 2 == 0).alias("even_constraint")]

    model_config = ConfigDict(arbitrary_types_allowed=True)

Foo().model_dump_json()
# PydanticSerializationError: Unable to serialize unknown type: <class 'polars.expr.expr.Expr'>

I see that there is an expr.meta.write_json method, but that just writes to a file, not to a Python object. If Polars implemented the two dunder methods __get_pydantic_core_schema__ and __get_pydantic_json_schema__ (described in the Pydantic docs), we could give it a custom JSON serializer and deserializer, and Pydantic would pick them up automatically.

Is this something we could consider implementing? The alternative, as far as I can see, is for us to implement a custom PatitoExpr using Annotated, with write_json writing to a temporary file. But that would require our users to always use PatitoExpr when they just want to use Expr.

alexander-beedie commented 1 year ago

If the expr.meta.write_json target is None, it writes to a string rather than a file; is that more helpful?

expr = (pl.col('x') > 0) * 0.1
expr.meta.write_json(None)
# {
#   'BinaryExpr': {
#     'left': {'BinaryExpr': {'left': {'Column':'x'}, 'op':'Gt', 'right': {'Literal': {'Int32':0}}}},
#     'op': 'Multiply',
#     'right': {'Literal': {'Float64':0.1}}
#   }
# }