zeroSteiner / rule-engine

A lightweight, optionally typed expression language with a custom grammar for matching arbitrary Python objects.
https://zerosteiner.github.io/rule-engine/
BSD 3-Clause "New" or "Revised" License
445 stars 54 forks source link

Decimal support #12

Closed iuvbio closed 3 years ago

iuvbio commented 3 years ago

Would it be possible to add support for decimal.Decimal DataTypes?

I'm happy to make a PR as well, if you agree.

zeroSteiner commented 3 years ago

It's hard to say without having a better understanding of what the use case would be. Could you provide an example? At first glance I'm not sure when it would be helpful to have a configurable precision setting.

iuvbio commented 3 years ago

Of course, I mean that when I have a python object that is a Decimal, I cannot currently apply any arithmetic operations or comparisons to it in a rule.

thing = {"isin": "us22289894235", "price": Decimal("21.3329")}
rule_engine.Rule("price > 0").matches(thing)

This raises EvaluationError: data type mismatch, which makes sense, since rule_engine doesn't know about decimal. I also cannot use a custom type_resolver for that, as that would have to return a valid rule_engine.DataType as well. Unless I'm missing something here.

My thought was that, ideally, all arithmetic operations could be applied to decimal objects, which could work by converting the RHS prior to the operation. Additionally, string operations could be useful as well to check the decimal up to a certain precision, for example Rule("price =~ '21.33'") for the above case. Although that is secondary, I'm mostly interested in arithmetic operations.

zeroSteiner commented 3 years ago

Thanks that makes a lot more sense. Let me take a closer look now that I know what you mean. I might be able to add that in pretty easily. I think the trick will be maintaining the precision.

zeroSteiner commented 3 years ago

I've looked into this and I think it's a good idea and will implement it.

Bit of background from my notes for posterity though:

I tried to have side by side support for floats and Decimals, but that seemed to introduce some problems that are counterintuitive. The idea was to update the AST operations to do the right thing when the input was a Decimal or a float, treating them both as FLOAT internally. Basically all the arithmetic operations are easy, things start to get more complicated when it comes to bitwise operations and compound data types. For bitwise, I can't think of a clear way to determine what the resulting data type should be, ie. always Decimal, always float or only Decimal if one of the inputs was a Decimal. The other larger issue is compound datatypes. From Python's perspective {0.1: 'float', Decimal('0.1'): 'decimal'} has two unique keys making the resulting dataype important.

Since that didn't go well I think the internal implementation should consistently use either Python floats or Decimals for the Rule Engine FLOAT datatype. Now because Python's float datatypes don't always behave in a way that humans would expect them to and the intention of the Rule Engine project is to allow non-Programmers to write expressions in an intuitive way, I think it makes the most sense to utilize the Decimal datatype internally for everything. Returning to the previous example, Rule Engine would then treat a value from Python of {0.1: 'float', Decimal('0.1'): 'decimal'} as having 1 unique key, whose value is undefined. This would effectively be the same as if you were to define a dict in Python as {1: 'one1', 1: 'one2'}.

If anyone was relying on the classic floating point behavior, this would be a breaking change. This seems unlikely but I've decided to bump the major version just in case anyways.

This will be the breaking change in version 3.0.

iuvbio commented 3 years ago

Thanks a lot for the update and for explaining your thought process. I do agree that for this project it makes more sense to just use Decimal for everything.

I don't quit understand why you would have to treat both float and Decimal as FLOAT internally though, if you decided to support the two. Couldn't you add an internal DataType DECIMAL and let the user decide what a value should be interpreted as with a custom type resolver? In case no type resolver is provided, you could default to treating the value as either FLOAT or DECIMAL, probably the latter for the reasons you mentioned above, although if you decided for the former, you'd maintain backward compatibility. For me personally it doesn't matter, I'm just curious why you think that would not be possible.

Also, how did you decide to handle precision? Will the user be able to pass a decimal.Context object to the rule's context? I don't actually think it matters in many cases, I can only think of some edge cases where the rounding or prec parameter could have an impact.

zeroSteiner commented 3 years ago

I don't quit understand why you would have to treat both float and Decimal as FLOAT internally though, if you decided to support the two

Because adding support for new data types is alot of work. The syntax needs to be defined, the parser then updated, and a whole bunch of AST nodes need to be edited to add support for each operation that should be supported by DECIMAL (which basically be identical to FLOAT). After all of that, unit tests need to be written to ensure consistent behavior in the future and finally, the documentation needs to be written. All of that doesn't seem worth it when I think the difference between the two types would 1) confuse rule writers and 2) facilitate an unlikely edge case involving a mixed usage of float and decimal.Decimal data types. It's definitely possible, I just don't want to do it.

Also, how did you decide to handle precision? Will the user be able to pass a decimal.Context object to the rule's context?

Yes, exactly this. There's a new decimal_context keyword argument that can explicitly define one otherwise it will default to the context of the current thread. Calls to evaluate the rule will then use this context regardless of the calling thread's context. This should make the rule evaluate consistently based on the decimal context of the thread which created the rule context and not the decimal context of the thread which evaluates the rule.

iuvbio commented 3 years ago

Makes sense! I see 3.0.0 was pushed 6 minutes ago, I look forward to trying it out.

zeroSteiner commented 3 years ago

v3.0 has been released and posted to PyPi. It contains the changes that we have discussed here. It also adds the to_str attribute for the FLOAT data type so you can apply a regex to the value as you suggested.