zeroSteiner / rule-engine

A lightweight, optionally typed expression language with a custom grammar for matching arbitrary Python objects.
https://zerosteiner.github.io/rule-engine/
BSD 3-Clause "New" or "Revised" License

Automatic type resolver extraction from dataclasses #55

Open Kamforka opened 1 year ago

Kamforka commented 1 year ago

I was wondering if the project could benefit from an automatic type resolver extraction feature.

I have an example implementation that I created for my own use-case, but I found it quite generic, and I believe it might make sense to add it to the core library.

The type resolver implementation looks like this:

import dataclasses
import datetime as dt
import types
import typing
from decimal import Decimal

import rule_engine

PYTYPE_TO_ENGINETYPE = {
    list: rule_engine.DataType.ARRAY,
    tuple: rule_engine.DataType.ARRAY,
    dt.datetime: rule_engine.DataType.DATETIME,
    dt.date: rule_engine.DataType.DATETIME,
    int: rule_engine.DataType.FLOAT,
    float: rule_engine.DataType.FLOAT,
    Decimal: rule_engine.DataType.FLOAT,
    type(None): rule_engine.DataType.NULL,  # types.NoneType only exists on Python 3.10+
    set: rule_engine.DataType.SET,
    str: rule_engine.DataType.STRING,
    dict: rule_engine.DataType.MAPPING,
    None: rule_engine.DataType.UNDEFINED,
}

def parse_compound_array_enginetype(type_alias):
    main_pytype = type_alias.__origin__
    if hasattr(type_alias, "__args__") and main_pytype is not tuple:
        sub_pytype = type_alias.__args__[0]
    else:
        sub_pytype = None

    main_enginetype = PYTYPE_TO_ENGINETYPE[main_pytype]
    if sub_pytype not in PYTYPE_TO_ENGINETYPE:
        sub_enginetype = parse_compound_enginetype(sub_pytype)
    else:
        sub_enginetype = PYTYPE_TO_ENGINETYPE[sub_pytype]

    return main_enginetype(sub_enginetype)

def parse_compound_mapping_enginetype(type_alias):
    main_pytype = type_alias.__origin__
    if hasattr(type_alias, "__args__"):
        key_pytype = type_alias.__args__[0]
        value_pytype = type_alias.__args__[1]
    else:
        key_pytype = None
        value_pytype = None

    main_enginetype = PYTYPE_TO_ENGINETYPE[main_pytype]

    if key_pytype in PYTYPE_TO_ENGINETYPE:
        key_enginetype = PYTYPE_TO_ENGINETYPE[key_pytype]
    else:
        key_enginetype = parse_compound_enginetype(key_pytype)

    if value_pytype in PYTYPE_TO_ENGINETYPE:
        value_enginetype = PYTYPE_TO_ENGINETYPE[value_pytype]
    else:
        value_enginetype = parse_compound_enginetype(value_pytype)

    return main_enginetype(key_enginetype, value_enginetype)

def parse_compound_enginetype(type_alias):
    if hasattr(type_alias, "__origin__"):
        if type_alias.__origin__ in (list, tuple, set):
            return parse_compound_array_enginetype(type_alias)
        if type_alias.__origin__ is dict:
            return parse_compound_mapping_enginetype(type_alias)
    # note: _LiteralGenericAlias is a private API; the public equivalent check
    # is typing.get_origin(type_alias) is typing.Literal (Python 3.8+)
    if isinstance(type_alias, typing._LiteralGenericAlias):
        python_type = type(type_alias.__args__[0])
        return PYTYPE_TO_ENGINETYPE[python_type]

    return rule_engine.DataType.UNDEFINED

def type_resolver_from_dataclass(cls: type):
    if not dataclasses.is_dataclass(cls):
        raise TypeError(f"{cls!r} is not a dataclass")

    type_resolver = {}
    for fieldname, fieldtype in cls.__annotations__.items():
        if fieldtype in PYTYPE_TO_ENGINETYPE:
            type_resolver[fieldname] = PYTYPE_TO_ENGINETYPE[fieldtype]
        else:
            compound_type = parse_compound_enginetype(fieldtype)
            type_resolver[fieldname] = compound_type
    return type_resolver
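As an aside, the `__origin__`/`__args__` attribute access above relies on implementation details of `typing`; since Python 3.8 the public `typing.get_origin` and `typing.get_args` helpers expose the same information. A minimal sketch of equivalent introspection through the public API (the helper name `inspect_alias` is made up for illustration):

```python
import typing

def inspect_alias(type_alias):
    """Return (origin, args) for a type alias using only public typing helpers."""
    origin = typing.get_origin(type_alias)  # list for list[str], dict for dict[str, int]
    args = typing.get_args(type_alias)      # (str,) for list[str]
    if origin is typing.Literal:
        # Literal["one", "two"] -> the engine type comes from the literal values' types
        return typing.Literal, tuple(type(arg) for arg in args)
    return origin, args
```

For example, `inspect_alias(typing.List[str])` returns `(list, (str,))` and `inspect_alias(typing.Literal["one", "two"])` returns `(typing.Literal, (str, str))`.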

And to test it one can do:

import dataclasses
import datetime as dt
import typing
from decimal import Decimal

ChoiceText = typing.Literal["one", "two", "three"]
Order = typing.Literal[1, 2, 3]

@dataclasses.dataclass
class Model:
    id: int
    title: str
    tags: list[str]
    shares: typing.Dict[str, typing.List]
    index: typing.Dict
    uniques: set[float]
    created: dt.datetime
    undefined: str | int | float
    customers: typing.List[Decimal]
    singles: tuple[int]
    pairs: tuple[str, int]
    choice: ChoiceText
    orders: list[Order]

type_resolver = type_resolver_from_dataclass(Model)

It should produce a type resolver like the below:

{
 'choice': <_DataTypeDef name=STRING python_type=str >,
 'created': <_DataTypeDef name=DATETIME python_type=datetime >,
 'customers': <_ArrayDataTypeDef name=ARRAY python_type=tuple value_type=FLOAT >,
 'id': <_DataTypeDef name=FLOAT python_type=Decimal >,
 'index': <_MappingDataTypeDef name=MAPPING python_type=dict key_type=UNDEFINED value_type=UNDEFINED >,
 'orders': <_ArrayDataTypeDef name=ARRAY python_type=tuple value_type=FLOAT >,
 'pairs': <_ArrayDataTypeDef name=ARRAY python_type=tuple value_type=UNDEFINED >,
 'shares': <_MappingDataTypeDef name=MAPPING python_type=dict key_type=STRING value_type=ARRAY >,
 'singles': <_ArrayDataTypeDef name=ARRAY python_type=tuple value_type=UNDEFINED >,
 'tags': <_ArrayDataTypeDef name=ARRAY python_type=tuple value_type=STRING >,
 'title': <_DataTypeDef name=STRING python_type=str >,
 'undefined': <_DataTypeDef name=UNDEFINED python_type=UNDEFINED >,
 'uniques': <_SetDataTypeDef name=SET python_type=set value_type=FLOAT >
}

Please tell me if you think it makes sense to add it to the lib and I can work it out more.

zeroSteiner commented 1 year ago

Yeah that sounds useful for sure. A couple things come to mind though.

  1. I guess data classes weren't added until Python 3.7. Right now rule-engine supports Python as far back as 3.4, though I do have plans to drop 3.4 and 3.5 relatively soon. I kept them around because the original project that motivated me to write rule-engine doesn't work on newer Python versions for reasons I won't get into, but I think the time has come to drop those older versions because it's been making CI maintenance more difficult. Having said that, I think the solution just needs to not break everything on older versions of Python. I wouldn't expect the functionality to be present in that case, so simply guarding it with some sort of check (probably on the version) would be sufficient.
  2. There seems to be a fair bit of overlap with the functionality provided by DataType.from_type and DataType.from_value. I'd really like to keep this kind of type-mapping logic consolidated for easier maintenance in the future.
  3. Lastly, the type resolver needs to be a drop-in replacement, so if it's not already callable, it should be made callable. Looks like you could take your dictionary and just forward it on to type_resolver_from_dict though.
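On point 3, wrapping the dictionary in a callable is only a few lines; this is a hedged, illustrative stand-in for roughly what `rule_engine.type_resolver_from_dict` provides (the real library raises its own `SymbolResolutionError` rather than a plain `KeyError`, and the function name here is made up):

```python
# Illustrative stand-in for rule_engine.type_resolver_from_dict: wrap a
# name -> DataType mapping in the callable interface the Context expects.
def type_resolver_from_mapping(type_map):
    def resolver(name):
        if name not in type_map:
            # the real library raises SymbolResolutionError here
            raise KeyError(name)
        return type_map[name]
    return resolver
```

The dict produced by `type_resolver_from_dataclass` could then be passed through such a wrapper before being handed to a Context.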

This is cool stuff for sure, thanks for sharing.

Kamforka commented 1 year ago

Thanks for the feedback @zeroSteiner , glad you find it useful!

Let me just reply to your points one by one:

  1. That's right, dataclasses are not backward compatible with older versions. Actually, I've just checked and even 3.7 is almost at end of life (27 Jun 2023), so this might be the chance to add this feature once you get rid of older version support.
  2. To be honest, I haven't checked the already existing internal parsers, so that is a good suggestion; I'm going to take a look at them and reuse what I can.
  3. I agree and don't see any issues with that; either option is easy to implement.

So what do you think, should I follow up with a PR?

Kamforka commented 1 year ago

Created a sample implementation in #56, feel free to check it out. I ran the tests locally and they seem to work fine. I also added a guard so that the modern type hinting syntax (list[str], dict[str, str], etc.) is only tested on Python 3.9 and above.
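The guard described can be as simple as a version check in front of the 3.9-only cases; a sketch assuming a unittest-style suite (the class and test names here are hypothetical, not the ones in #56):

```python
import sys
import unittest

class BuiltinGenericsTest(unittest.TestCase):
    # list[str] / dict[str, str] subscription only works at runtime on Python 3.9+
    @unittest.skipIf(sys.version_info < (3, 9), "builtin generic syntax requires Python 3.9+")
    def test_builtin_generics(self):
        annotations = {"tags": list[str], "shares": dict[str, str]}
        self.assertEqual(len(annotations), 2)
```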

zeroSteiner commented 3 months ago

Closing this since I'm pretty sure it was completed when I merged #56 and I just forgot about this ticket.

akr31 commented 1 month ago

Thanks @zeroSteiner for this amazing library, I am exploring it for a particular usecase.

The type hinting feature is really important for my use case, and I want to detect issues at compile time. My input is a Python dataclass. Is it possible to build a general resolver like type_resolver_from_dict which can work on the dataclass recursively, at compile time? @Kamforka do you have any idea here?

zeroSteiner commented 1 month ago

I think it should work based on my notes in this PR. If you have an example of what you're trying to do that's failing, I can look into it.

akr31 commented 1 month ago
class Type(str, Enum):
    A = "A"

@dataclass
class Name:
    name: str
    dct: dict[str, str]
    typ: Type = Type.A

@dataclass
class User:
    name: Name
    friends: list[Name]
    email: str

context = rule_engine.Context(
    resolver=rule_engine.resolve_attribute,
)

rule = rule_engine.Rule('friends.bb == "Vader"', context=context)
# Want this to generate SymbolResolutionError
# but its passing

I would want to do something like

context = rule_engine.Context(
    resolver=rule_engine.resolve_attribute,
    type_resolver= type_resolver_for_dataclass(User)
)

rule = rule_engine.Rule('friends.bb == "Vader"', context=context)
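The recursive extraction half of this is straightforward even while the engine side is missing; a purely illustrative sketch (the helper name `flatten_dataclass_types` is made up, it assumes annotations are real types rather than strings, i.e. no `from __future__ import annotations`, and it only shows the extraction of dotted paths, not how the engine would consume them):

```python
import dataclasses

def flatten_dataclass_types(cls, prefix=""):
    """Recursively map dotted paths like 'name.dct' to their annotated types."""
    flat = {}
    for field in dataclasses.fields(cls):
        path = prefix + field.name
        if dataclasses.is_dataclass(field.type):
            # nested dataclass: recurse with the dotted prefix
            flat.update(flatten_dataclass_types(field.type, path + "."))
        else:
            flat[path] = field.type
    return flat

@dataclasses.dataclass
class Inner:
    x: int

@dataclasses.dataclass
class Outer:
    inner: Inner
    y: str

flatten_dataclass_types(Outer)  # {'inner.x': int, 'y': str}
```

Whether dotted names in a resolver dict would actually be consulted for attribute access like `friends.bb` is a separate question, which the reply below addresses.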

zeroSteiner commented 1 month ago

@akr31 I think the larger issue is that there isn't a way to define an object with attributes in Rule Engine. I've been meaning to add it recently because I've had my own use case that's very similar to what you're describing. After looking into this a bit more I don't think I ever fully added type resolvers for dataclasses. It looks like the PR I referenced only added support to DataType.from_type to accept type annotations (e.g. typing.Dict[str, str]) and not full-on dataclass instances.

I'd like to get this working but it might take me a while. I've reopened the ticket so I don't forget. I need to finish my current goal of adding a proper BYTES datatype. After that, I'll look at OBJECT, which will be a prerequisite for this work.