zeroSteiner / rule-engine

A lightweight, optionally typed expression language with a custom grammar for matching arbitrary Python objects.
https://zerosteiner.github.io/rule-engine/
BSD 3-Clause "New" or "Revised" License
455 stars 54 forks

Complex object with dicts and arrays #38

Open devBioS opened 2 years ago

devBioS commented 2 years ago

Came across this great library while searching for Python rule engines. This project would fit my needs completely if I could traverse arrays with the dot syntax (like issue #30, but deeper and more nested). That kind of syntax would be easy enough to let other people write rules for specific actions without having to know Python.

I have dicts with arrays of dicts that have varying content, and I need to evaluate whether a specific path is present, ignoring the position in the arrays:

{
    "event": {
        "title": "1 computer made a problem",
        "startDate": "20220502",
        "endDate": "20220502",
        "created": 1651528631972,
        "creatorId": None,
        "internaldata": [
            { "type": "USER", "details": { "firstName": "first1", "lastName": "last2" } },
            { "type": "COMPUTER", "details": { "fqdn": "computer1.domain.net", "lansite": "Munich" } }
        ],
        "items": [
            {
                "type": "EVENT",
                "computerinfo": {
                    "resources": [ {"userassigments": "data1"}, {"companyassigned": "Yes"}, {"otherdata": "data2"} ]
                }
            }
        ]
    }
}

I could do that with your library now: Rule('event.title =~ ".*made a problem$" and event.items[0].computerinfo.resources[1].companyassigned == "Yes"')

Because the data is not always the same and the position of the dicts within the arrays changes, I would need to traverse the arrays within the dicts to check whether specific data is present (dict keys are always named the same), e.g.:

Rule('event.title =~ ".*made a problem$" and event.items[*].computerinfo.resources[*].companyassigned == "Yes"')

Is that possible somehow, or could it be added to the library?

zeroSteiner commented 2 years ago

Yeah you'd want to use a rule like this: event.title and [item for item in event.items if item&['computerinfo'] and [resource for resource in item['computerinfo']['resources'] if resource&['companyassigned'] == 'Yes']]

That leverages nested comprehension along with the safe navigation operator to avoid key lookup errors. Arrays that are not empty eval to True just like they do in Python. Alternatively, you could be more explicit by checking the length of the array by using the length or is_empty attribute.
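For readers more at home in Python than in the rule syntax, the following is a rough plain-Python mirror of what that rule computes. It is only a sketch for illustration: `rule_matches` is a made-up name, `dict.get` stands in for the `&[...]` safe navigation operator, and the data is trimmed to the fields the rule touches.

```python
# Sample data trimmed from the issue; only the fields the rule touches are kept.
thing = {
    "event": {
        "title": "1 computer made a problem",
        "items": [
            {
                "type": "EVENT",
                "computerinfo": {
                    "resources": [
                        {"userassigments": "data1"},
                        {"companyassigned": "Yes"},
                        {"otherdata": "data2"},
                    ]
                },
            }
        ],
    }
}


def rule_matches(data):
    """Plain-Python mirror of the nested comprehension rule (illustrative only)."""
    event = data["event"]
    matching_items = [
        item
        for item in event["items"]
        # item.get(...) plays the role of item&['computerinfo']: a missing key gives None
        if item.get("computerinfo")
        and [
            resource
            for resource in item["computerinfo"]["resources"]
            if resource.get("companyassigned") == "Yes"
        ]
    ]
    # A non-empty list is truthy, which is why the bare comprehension works as a condition
    return bool(event["title"] and matching_items)


print(rule_matches(thing))  # True
```

The inner list is only used for its truthiness: an item is kept when at least one of its resources has `companyassigned` set to `"Yes"`.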

The rule above would work for your use case as is. FWIW, though, I'd recommend normalizing the data to make rules easier to write. Specifically, if items.computerinfo.resources were a dictionary instead of an array (which is possible when the keys are unique), the rules would be simpler.
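One way such a normalization step could look, as a minimal sketch: the helper below collapses a list of single-key dicts into one dict. `merge_single_key_dicts` is a hypothetical name, not something the library provides, and it assumes the keys really are unique, so it raises instead of silently overwriting when they are not.

```python
def merge_single_key_dicts(entries):
    """Collapse a list like [{"a": 1}, {"b": 2}] into {"a": 1, "b": 2}.

    Hypothetical helper for pre-processing data before rules are evaluated.
    Raises ValueError on duplicate keys rather than overwriting a value.
    """
    merged = {}
    for entry in entries:
        for key, value in entry.items():
            if key in merged:
                raise ValueError(f"duplicate key while normalizing: {key!r}")
            merged[key] = value
    return merged


resources = [{"userassigments": "data1"}, {"companyassigned": "Yes"}, {"otherdata": "data2"}]
normalized = merge_single_key_dicts(resources)
print(normalized["companyassigned"])  # Yes
```

After this step the value can be reached by key alone, with no array position involved.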

Also you can use the debug_repl module to experiment with this.

(rule-engine)   : rule-engine: 11:02:03 rule-engine cat repl_setup.py
# issue #38
thing = {
    "event": {
        "title": "1 computer made a problem",
        "startDate": "20220502",
        "endDate": "20220502",
        "created": 1651528631972,
        "creatorId": None,
        "internaldata": [
            { "type": "USER", "details": { "firstName": "first1", "lastName": "last2" } },
            { "type": "COMPUTER", "details": { "fqdn": "computer1.domain.net", "lansite": "Munich" } }
        ],
        "items": [
            {
                "type": "EVENT",
                "computerinfo": {
                    "resources": [ {"userassigments": "data1"}, {"companyassigned": "Yes"}, {"otherdata": "data2"} ]
                }
            }
        ]
    }
}
(rule-engine)   : rule-engine: 11:02:05 rule-engine PYTHONPATH=$(pwd)/lib python -m rule_engine.debug_repl --edit-file repl_setup.py --debug
executing: repl_setup.py
rule > event.title and [item for item in event.items if item&['computerinfo'] and [resource for resource in item['computerinfo']['resources'] if resource&['companyassigned'] == 'Yes']]
result: 
True
rule >

devBioS commented 2 years ago

Thanks a lot for the explanation! I did some tests and it looks like it works for this case. I hadn't gotten that far myself :)

I just cannot get the people who should write such rules to understand syntax like this :D

I already tried to normalize the data before opening this issue to get rid of the arrays, but in most cases I have the same keys with only small differences in values, which would overwrite each other during normalization.

In my real scenario I have dicts of arrays of dicts about 8-10 levels deep, and I think writing rules for this would be too complex for my users. It would be easier and more readable if they could just put a placeholder like * or # into the brackets to mean "any" for arrays.

Nevertheless thank you very much!

zeroSteiner commented 2 years ago

Let me think about it. I'll admit I like the syntax you're proposing. I think I could make it backwards compatible and relatively intuitive if I used # instead of *.

devBioS commented 2 years ago

That would be fantastic, and it would make this library the only one I'm aware of that can traverse arrays and evaluate the dicts nested inside them.

If a user could create a rule like this:

Rule('event.title =~ ".*made a problem$" and event.items[#].computerinfo.resources[#].companyassigned == "Yes" and event.internaldata[#].details.lansite == "Munich" ')

That would be very intuitive and easy to read, and non-programmers could use it easily because it resembles directory traversal syntax like

ls /mnt/*/asdf/*/test

For array elements that lack the requested keys, you could still apply the context's default value: if it is set to None, dicts inside an array that don't have the requested keys are ignored. If no default is set, an exception is raised that callers can react to.
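To make the proposed semantics concrete, here is a small plain-Python sketch of a `[#]` wildcard walk. Nothing in it is part of rule-engine: `iter_path` and its `ignore_missing` flag are hypothetical names, the flag only approximates the "default value is set" behaviour described above, and just dotted keys and `[#]` are handled (no numeric indexes).

```python
import re


def iter_path(obj, path, ignore_missing=False):
    """Yield every value reachable via `path`, where `[#]` means "any array element".

    Hypothetical helper, not part of rule-engine.
    """
    # Tokens are either the wildcard "[#]" or a run of characters without dots or brackets
    tokens = re.findall(r"\[#\]|[^.\[\]]+", path)
    yield from _walk(obj, tokens, ignore_missing)


def _walk(obj, tokens, ignore_missing):
    if not tokens:
        yield obj
        return
    head, rest = tokens[0], tokens[1:]
    if head == "[#]":
        if not isinstance(obj, (list, tuple)):
            if ignore_missing:
                return
            raise TypeError(f"[#] applied to non-array value: {obj!r}")
        for element in obj:
            yield from _walk(element, rest, ignore_missing)
    elif isinstance(obj, dict) and head in obj:
        yield from _walk(obj[head], rest, ignore_missing)
    elif not ignore_missing:
        raise KeyError(head)
    # Otherwise this branch lacks the key and is skipped silently


thing = {
    "event": {
        "internaldata": [
            {"type": "USER", "details": {"firstName": "first1", "lastName": "last2"}},
            {"type": "COMPUTER", "details": {"fqdn": "computer1.domain.net", "lansite": "Munich"}},
        ],
        "items": [
            {"computerinfo": {"resources": [{"userassigments": "data1"}, {"companyassigned": "Yes"}]}}
        ],
    }
}

path = "event.items[#].computerinfo.resources[#].companyassigned"
print(any(value == "Yes" for value in iter_path(thing, path, ignore_missing=True)))  # True
```

With `ignore_missing=False`, the same call raises `KeyError` as soon as it reaches the first resource, which has no `companyassigned` key.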

It would be cool to see this in your library. I tried to find a starting point for this myself, but my debugging environment doesn't seem to work correctly while the AST is built and operators are selected; maybe some kind of threading problem is keeping me from getting the full call stack.

Anyway, if I can help, let me know :D

vatsramkesh commented 1 year ago

@zeroSteiner This library is impressive, thanks for maintaining it. Does it support built-in math functions called on an iterable, e.g.:

data = {"a": {"b": [{"count": 19}, {"count": 18}]}}
r = rule_engine.Rule('sum([v.count for v in a.b]) == 37')

zeroSteiner commented 1 year ago

@vatsramkesh No, see #58 which is a duplicate of #32.
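Given that answer, one possible workaround is to compute the aggregate in Python before the data reaches the rule, so the rule only has to compare a plain number. The sketch below is an assumption about how that could be done, not something proposed in the thread: `add_count_total` and the `count_total` field are made-up names.

```python
import copy


def add_count_total(record):
    """Return a copy of `record` with a precomputed sum stored next to the list.

    Hypothetical pre-processing step; `count_total` is an invented field name.
    """
    enriched = copy.deepcopy(record)  # leave the caller's data untouched
    enriched["a"]["count_total"] = sum(entry["count"] for entry in enriched["a"]["b"])
    return enriched


data = {"a": {"b": [{"count": 19}, {"count": 18}]}}
enriched = add_count_total(data)
print(enriched["a"]["count_total"])  # 37
```

A rule along the lines of `a.count_total == 37` could then be evaluated against the enriched data.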

xarses commented 1 year ago

Came here also looking for array[*].

In my case, I'm looking to work with some Kubernetes and other objects. As a contrived example, I'd like to know whether any of the elements in the metadata.owner_references list has a member kind == 'ReplicaSet'.

{'metadata': {
    'owner_references': [{'api_version': 'apps/v1',
                       'block_owner_deletion': True,
                       'controller': True,
                       'kind': 'ReplicaSet',
                       'name': 'argo-rollouts'}],
}}

In this case the Kubernetes API returns objects, so I've loaded the attribute resolver (resolve_attribute), which works quite well.

>>> ctx = rule_engine.Context(resolver=rule_engine.resolve_attribute)
>>> r = Rule("metadata.owner_references[0].kind == 'ReplicaSet'", context = ctx)
>>> r.matches(p)
True

However, there may be no items or multiple items in this array, so checking them all would be valuable. Ideally something like:

r = Rule("metadata.owner_references[*].kind == 'ReplicaSet'", context = ctx)

In Rego, there is built-in support for walking all members of any collection, which would result in a similar rule:

metadata.owner_references[_].kind = 'ReplicaSet'

ewertonsantanams commented 4 days ago

Hello everyone, thinking about this same case, I put together something that builds list comprehension in the framework's syntax. It doesn’t handle cases where there are nested rules searching for a more “internal” value of an array within another array, but I believe it can still help with more straightforward cases, such as creating a rule for a value within an array, just as devBioS proposed. It’s simple, but it’s an idea of how to construct it. Here’s the snippet:

import re

def generate_list_comprehension(path_condition):
    # Split "path operator value"; two-character operators are listed first so they match before < and >
    match = re.match(r'^(.*?)\s*(==|!=|<=|>=|<|>)\s*(.*)$', path_condition)
    if not match:
        raise ValueError(f"no comparison operator found in: {path_condition!r}")

    path = match.group(1).strip()  # First group is the path
    operator = match.group(2).strip()  # Second group is the operator
    condition = match.group(3).strip()  # Third group is the condition

    # Split the path into parts using the "[]" marker
    path_parts = list(map(lambda s: s.replace('.', '') if s.startswith('.') else s, path.split("[]")))

    # Extract the condition key and value (condition can be in quotes or not)
    condition_key = path_parts[-1].strip()  # The last part before '==', e.g., 'GroupId'
    condition_value = condition.strip().strip('"').strip("'")  # Remove any surrounding quotes

    # Start constructing the comprehension
    var_stack = []  # To track variable names (e.g., reservation, instance, sg)
    comprehension = ""

    for idx, part in enumerate(path_parts[:-1]):  # Process everything except the last part (key like 'GroupId')
        var_name = f"var{idx}"  # Generate generic variable names like var0, var1, var2...
        var_stack.append(var_name)

        if idx == 0:
            # The first part (e.g., Reservations) starts the outer comprehension
            comprehension += f"[{var_name} for {var_name} in {part}"
        else:
            # Add the nested comprehensions (e.g., Instances, SecurityGroups)
            comprehension += f" if {var_stack[idx-1]}&[\"{part}\"] and [{var_name} for {var_name} in {var_stack[idx-1]}[\"{part}\"]"

    # Add the condition check (e.g., GroupId == 'sg-903004f8')
    comprehension += f" if {var_stack[-1]}&[\"{condition_key}\"] {operator} \"{condition_value}\"]" + "]" * len(path_parts[:-2])

    return comprehension

# Example usage
path_condition = "Reservations[].Instances[].VpcId == 'vpc-1a2b3c4d'"

# Generate the list comprehension as a string
list_comp_str = generate_list_comprehension(path_condition)

# Print the generated comprehension
print(list_comp_str)

# The output:
# [var0 for var0 in Reservations if var0&["Instances"] and [var1 for var1 in var0["Instances"] if var1&["VpcId"] == "vpc-1a2b3c4d"]]