zeroSteiner / rule-engine

A lightweight, optionally typed expression language with a custom grammar for matching arbitrary Python objects.
https://zerosteiner.github.io/rule-engine/
BSD 3-Clause "New" or "Revised" License
455 stars 54 forks

[FEAT] Native pydantic support #91

Open dabdine opened 3 months ago

dabdine commented 3 months ago

Hello! I'm working with some rules that I'm loading from Markdown front matter. To parse and validate the front matter I'm using Pydantic (v2). During parsing, I wanted to natively transform a string field containing a rule-engine rule into a Rule object. Here's the approach I took:

from typing import Any, Generic, Literal, Optional, TypeVar, cast

from pydantic import BaseModel, Field, GetCoreSchemaHandler, GetJsonSchemaHandler
from pydantic.json_schema import JsonSchemaValue
from pydantic_core import core_schema
from rule_engine import Context, DataType, Rule
from rule_engine.errors import SymbolResolutionError
from typing_extensions import get_args

SchemaType = TypeVar("SchemaType", bound=BaseModel)

class PydanticRule(Rule, Generic[SchemaType]):  # type: ignore
    """
    A class to store a Python `rule-engine` rule as a Pydantic model.
    """

    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        _source_type: Any,
        _handler: GetCoreSchemaHandler,
    ) -> core_schema.CoreSchema:
        model_fields = cast(BaseModel, get_args(_source_type)[0]).model_fields

        def _python_to_rule_type(value: Any) -> DataType:
            # TODO: Handle additional datatypes, complex types (unions, etc.)
            try:
                # check if value is a literal
                if hasattr(value, "__origin__") and value.__origin__ is Literal:
                    return DataType.STRING
                return DataType.from_type(value)
            except TypeError:
                return DataType.UNDEFINED

        def resolve_pydantic_to_rule(field: str) -> DataType:
            if field not in model_fields:
                raise SymbolResolutionError(field)
            return _python_to_rule_type(model_fields[field].annotation)

        def validate_from_str(value: str) -> Rule:
            return Rule(
                value,
                context=Context(type_resolver=resolve_pydantic_to_rule),
            )

        from_str_schema = core_schema.chain_schema(
            [
                core_schema.str_schema(),
                core_schema.no_info_plain_validator_function(validate_from_str),
            ]
        )

        return core_schema.json_or_python_schema(
            json_schema=from_str_schema,
            python_schema=core_schema.union_schema(
                [
                    # check if it's an instance first before doing any further work
                    core_schema.is_instance_schema(Rule),
                    from_str_schema,
                ]
            ),
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda instance: str(cast(Rule, instance).text)
            ),
        )

    @classmethod
    def __get_pydantic_json_schema__(
        cls, _core_schema: core_schema.CoreSchema, handler: GetJsonSchemaHandler
    ) -> JsonSchemaValue:
        # Use the same JSON schema that would be used for `str`
        return handler(core_schema.str_schema())

# example models
class OperatingSystem(BaseModel):
    vendor: str = Field(..., description="The vendor of the operating system", title="Operating system vendor")
    product: str = Field(..., description="The name of the operating system", title="Operating system name")
    family: Literal["linux", "windows", "macos"] = Field(
        ..., description="The family of the operating system", title="Operating system family"
    )
    version: Optional[str] = Field(
        None, description="The version of the operating system", title="Operating system version"
    )
    arch: Optional[str] = Field(
        None,
        description="The architecture of the operating system, (e.g. x86_64, x86, arm64)",
        title="Operating system architecture",
    )

class SomeModel(BaseModel):
    os: PydanticRule[OperatingSystem]

# define the rule that is read into the model
model = SomeModel.model_validate({"os": "vendor == 'Apple' and product == 'Mac OS X' and family == 'macos'"})

# test the rule against an input operating system
print(model.os.matches(OperatingSystem(vendor="Apple", product="Mac OS X", family="macos").model_dump()))

The PydanticRule takes a generic type parameter that defines the schema supplied to Context when the Rule is instantiated. As a result, syntax and symbol errors are detected when the rule is compiled (read into the Pydantic model) instead of at runtime.

I think it's a good idea to leave pydantic out of this lib (unless folks really need it). However, it may make sense to create a separate lib that contains the types so rule-engine can be used this way.

Also, we'd probably want to spend more time on the pydantic/python -> rule-engine type conversion. I haven't fully tested that yet.
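One possible direction for that conversion, sketched with the standard library only: reduce each annotation to a single plain Python type first, and fall back to "unknown" when it is ambiguous. `base_python_type` is a hypothetical helper and has not been tested against real Pydantic models:

```python
import types
from enum import Enum
from typing import Any, Literal, Optional, Union, get_args, get_origin

# typing.Union and, on Python 3.10+, the `X | Y` union type.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def base_python_type(annotation: Any) -> Optional[type]:
    """Reduce an annotation to one plain Python type, or None if ambiguous."""
    origin = get_origin(annotation)
    if origin is Literal:
        # Literal["linux", "macos"] -> str, but only if all members share a type.
        member_types = {type(member) for member in get_args(annotation)}
        return member_types.pop() if len(member_types) == 1 else None
    if origin in _UNION_ORIGINS:
        # Optional[X] is Union[X, None]: drop NoneType, recurse if one arm remains.
        arms = [arm for arm in get_args(annotation) if arm is not type(None)]
        return base_python_type(arms[0]) if len(arms) == 1 else None
    if origin is not None:
        # Other generics (List[str], Dict[str, int], ...) are not handled here.
        return None
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        # Enum classes: use the shared type of the member values, if there is one.
        value_types = {type(member.value) for member in annotation}
        return value_types.pop() if len(value_types) == 1 else None
    return annotation if isinstance(annotation, type) else None
```

The returned type could then be handed to `DataType.from_type`, with `None` mapped to `DataType.UNDEFINED` as `_python_to_rule_type` already does for unsupported annotations.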

zeroSteiner commented 3 months ago

That sounds neat. To clarify though, is there a specific ask here? I've not used Pydantic at all and I've only just recently started using type annotations in my Python code but have yet to start using them in this project in particular.

dabdine commented 3 months ago

The ask is to natively support Pydantic by adding the Pydantic schema methods (__get_pydantic_core_schema__, __get_pydantic_json_schema__) to the Rule class. I was just throwing out the thought that, if that's implemented, it's probably best to do it in a separate module so there isn't a dependency on Pydantic.

zeroSteiner commented 3 months ago

That sounds cool. I like the idea of offering optional integrations with other popular libraries. I'm planning on doing something similar with SQLAlchemy eventually so I can get type info for my ORM models.

Divjyot commented 3 months ago

@zeroSteiner Thanks for creating this lib; I found how easy it is to write rules in plain English. This might be only slightly related to this issue, but I'd like to understand the following as a regular user of Pydantic classes:

I'm trying to write a type_resolver for my Pydantic class, which has fields of types str, int, Enum, and nested Pydantic models, such as:

# LicenceModel is defined first so that Person can reference it
class LicenceModel(BaseModel):
    lic_number: int
    lic_type: LicTypeEnum

class Person(BaseModel):
    name: str
    gender: GenderEnum
    dob: datetime
    licence: LicenceModel

So, how can I create a context for rule-engine for the Person data type?

context = rule_engine.Context(
    resolver=rule_engine.resolve_attribute,
    type_resolver=rule_engine.type_resolver_from_dict({
        'name': rule_engine.DataType.STRING,
        'gender': ??,
        'licence': ??,
        'dob': rule_engine.DataType.DATETIME,
    })
)

1.1 I tried setting the type with { "licence.lic_number": rule_engine.DataType.FLOAT }; however, compiling Rule('person.lic_number == 123') failed with an AttributeResolutionError.

1.2 What type can be set for gender and licence? If that is not possible in the current version, should I convert the Pydantic model into a plain dict and then determine the types of gender and licence from that?
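For reference, the dict conversion mentioned in 1.2 could be sketched roughly as follows, using only the standard library. The enum members, the sample values, and the `to_plain` helper are hypothetical, and the `person` dict stands in for what `person.model_dump()` might return:

```python
from datetime import datetime
from enum import Enum
from typing import Any


def to_plain(value: Any) -> Any:
    """Recursively replace Enum members with their values inside dicts and lists."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


# Hypothetical enums; the post above does not show their members.
class GenderEnum(Enum):
    FEMALE = "female"
    MALE = "male"


class LicTypeEnum(Enum):
    CAR = "car"


# Stand-in for the nested data of a Person instance.
person = {
    "name": "Alice",
    "gender": GenderEnum.FEMALE,
    "dob": datetime(1990, 1, 1),
    "licence": {"lic_number": 123, "lic_type": LicTypeEnum.CAR},
}

plain = to_plain(person)
```

With the data in this shape, `gender` holds a plain string, so `DataType.STRING` would describe it; how to type the nested `licence` dict is a separate question.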

zeroSteiner commented 3 months ago

Unfortunately, there are two things that are unsupported by what you're trying to do.

  1. This ticket: there is no Pydantic integration yet, and I don't know if I'll work on it because I don't use Pydantic. I might look into it to see how much work it'd be.
  2. There is no OBJECT data type for defining typed compound data. The existing compound types, e.g. ARRAY, MAPPING, and SET all require their member types to be the same. There isn't a ticket for this, but I am planning on implementing it in the next release, which will be v4.6. At that point, you'd at least be able to do this, albeit with a bit more effort because of the lack of Pydantic support. It'd likely involve defining a type like license_t = DataType.OBJECT('License', {'lic_number': DataType.FLOAT, 'lic_type': DataType.STRING}) and then person_t = DataType.OBJECT('Person', {'license': license_t, .... That's the plan anyways.