jwadhams / json-logic-js

Build complex rules, serialize them as JSON, and execute them in JavaScript
MIT License

REGEX support? #41

Open benneely opened 7 years ago

benneely commented 7 years ago

Curious about if/how JsonLogic might support regex. For the in operator, the following example is given:

{"in":[ "Ringo", ["John", "Paul", "George", "Ringo"] ]}

There may be situations where it is more succinct to represent a match by regex (e.g. matching ICD-9 codes). Working with the same example above, is there a way that the current specification would support regex? For example, something like:

{"regex":[ "\w+(ing)\w+", ["John", "Paul", "George", "Ringo"] ]}

Would you consider altering the specification if there isn't currently a way to support this?

jwadhams commented 7 years ago

If what you want is a simple way to test a string against a regular expression, you could use something like:

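var jsonLogic = require('json-logic-js');

// Register a custom operation: true if `subject` matches `pattern`.
jsonLogic.add_operation("regexp_matches", function(pattern, subject){
    // A rule stored as JSON can only carry the pattern as a string.
    if( typeof pattern === 'string'){
        pattern = new RegExp(pattern);
    }
    return pattern.test(subject);
});

// Backslashes are doubled because the pattern sits inside a string literal.
jsonLogic.apply({"regexp_matches": ["\\w+(ing)\\w+", "ingest"]}); // false: nothing before "ing"
jsonLogic.apply({"regexp_matches": ["\\w+(ing)\\w+", "sing"]});   // false: nothing after "ing"
jsonLogic.apply({"regexp_matches": ["\\w+(ing)\\w+", "stings"]}); // true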

(Took me longer than I'm proud to admit to figure out why the \w wasn't working.)

If you really need something that tests a regex against an array of options, you could either use an array operation like some, or write an operation that handles the foreach over the array internally.
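Both options for testing a regex against an array can be sketched roughly as follows. This is only a sketch under assumptions: the operation name regexp_matches_any and the helper regexpMatchesAny are hypothetical, and the some rule only works once a regexp_matches operation like the one above has been registered. The library calls are left as comments so the snippet runs without json-logic-js installed.

```javascript
// Option 1: the "some" array operation combined with regexp_matches.
// {"var": ""} refers to the current array element inside "some".
var someRule = {"some": [
    ["John", "Paul", "George", "Ringo"],
    {"regexp_matches": ["\\w+(ing)\\w+", {"var": ""}]}
]};

// Option 2: a hypothetical operation that loops over the array internally.
function regexpMatchesAny(pattern, subjects) {
    var re = new RegExp(pattern);
    // Accept a single subject as well as an array of subjects.
    var list = Array.isArray(subjects) ? subjects : [subjects];
    return list.some(function (subject) {
        return re.test(String(subject));
    });
}

// With the library available, registration would look like this:
// var jsonLogic = require('json-logic-js');
// jsonLogic.add_operation("regexp_matches_any", regexpMatchesAny);
// jsonLogic.apply({"regexp_matches_any": ["\\w+(ing)\\w+", ["John", "Paul", "George", "Ringo"]]});

console.log(regexpMatchesAny("\\w+(ing)\\w+", ["John", "Paul", "George", "Ringo"])); // true ("Ringo")
console.log(regexpMatchesAny("\\w+(ing)\\w+", ["John", "Paul", "George"]));          // false
```

Option 1 keeps the rule portable to any engine that has both some and a compatible regexp_matches; option 2 gives a shorter rule but adds one more custom operation that every engine has to share.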

benneely commented 7 years ago

Thanks for looking into this and spending some cycles getting it to work! One of the things that appeals to me about JsonLogic is all of the careful thought that went into the way in which rules might be stored in JSON. I was hoping that JsonLogic was not only a set of engines that know how to read and parse JsonLogic rules (e.g. JS, PHP, Python, Ruby), but also a specification. So if others want to implement the specification in different languages (Julia, Scala, etc.), it's clear how an engine might parse and use the rules. With the example above, add_operation appears to be a function that allows one to create new operators in JS; however, if the rule itself were saved and then parsed by a different engine (e.g. Python), it should throw an error, because the regexp_matches operator doesn't appear to be supported by the online 'spec'.
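One way to make that portability concern concrete is to check a stored rule against a list of operators that all target engines are known to share. The sketch below is only an illustration: SHARED_OPERATORS is a made-up list, not the actual spec, and the walker simplifies things by treating every object key as an operator name.

```javascript
// Hypothetical list of operators assumed to exist in every target engine.
var SHARED_OPERATORS = ["var", "==", "and", "or", "in", "+", "-"];

// Walk a rule and collect every operator name it uses.
// Simplification: every object key is treated as an operator.
function collectOperators(rule, found) {
    found = found || [];
    if (Array.isArray(rule)) {
        rule.forEach(function (item) { collectOperators(item, found); });
    } else if (rule !== null && typeof rule === "object") {
        Object.keys(rule).forEach(function (op) {
            if (found.indexOf(op) === -1) { found.push(op); }
            collectOperators(rule[op], found);
        });
    }
    return found;
}

// Operators the rule uses that are missing from the shared list.
function unsupportedOperators(rule, supported) {
    return collectOperators(rule).filter(function (op) {
        return supported.indexOf(op) === -1;
    });
}

var storedRule = {"and": [
    {"in": ["Ringo", {"var": "beatles"}]},
    {"regexp_matches": ["\\w+(ing)\\w+", {"var": "name"}]}
]};

console.log(unsupportedOperators(storedRule, SHARED_OPERATORS)); // [ 'regexp_matches' ]
```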

Do you envision a way to separate out the JsonLogic "spec", so that it might be used across languages? If so, could we keep this example going to see how that might be accomplished? For example, does a question like this become a request to add a new operator to the spec (much like + or - or in), or is there some alternative way?

Thanks in advance!

zhaoyao91 commented 3 years ago

+1 for regex (or some way to check a string pattern) and a spec.

This is already an old issue; are there any thoughts on this?

jwadhams commented 3 years ago

OK, couple of thoughts here, and I want to sort of break them apart and address them separately:

if others want to implement the specification in different languages (Julia, Scala, etc.)

There is a shared unit test suite, https://jsonlogic.com/tests.json, that I use to make sure the PHP and JavaScript implementations (which I maintain) both behave basically the same way. In my own project, if I need a new rule that I don't think is a slam-dunk fit for the open source library, I implement it in my project in both languages. I have a phone number string formatter that is adequate for my customers in the US and Canada but would feel hugely out of place in this lib, for example.
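A runner for such a shared suite could look roughly like the sketch below. It assumes the suite is a JSON array in which plain strings are comments and every other entry is a [rule, data, expected] triple; the inline sample, the runSharedTests helper and the toyApply stand-in are all hypothetical and only exist so the snippet runs without the library.

```javascript
// Run each [rule, data, expected] entry through `apply` and collect failures.
// Plain strings in the suite are treated as comments and skipped.
function runSharedTests(apply, suite) {
    var failures = [];
    suite.forEach(function (entry) {
        if (typeof entry === "string") { return; }
        var rule = entry[0], data = entry[1], expected = entry[2];
        var actual = apply(rule, data);
        // Compare by JSON text so arrays and objects compare by value.
        if (JSON.stringify(actual) !== JSON.stringify(expected)) {
            failures.push({rule: rule, data: data, expected: expected, actual: actual});
        }
    });
    return failures;
}

// Inline sample in the assumed format; a real run would load tests.json instead.
var sampleSuite = [
    "# Project-specific operation",
    [{"regexp_matches": ["\\w+(ing)\\w+", "stings"]}, {}, true],
    [{"regexp_matches": ["\\w+(ing)\\w+", "sing"]}, null, false]
];

// Toy stand-in for an engine: it understands only regexp_matches.
function toyApply(rule, data) {
    var args = rule["regexp_matches"];
    return new RegExp(args[0]).test(args[1]);
}

console.log(runSharedTests(toyApply, sampleSuite).length); // 0
```

With the real library, the first argument would be something like function (rule, data) { return jsonLogic.apply(rule, data); }, and the same suite file could be fed to the equivalent runner in PHP.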

it's clear how an engine might parse and use the rules

I wish I'd been more aware of this problem when I wrote the spec (such as it is): how many edge cases does the spec need to handle? The one that I'm invested in already is truthiness, because it was a huge pain even between just PHP and JavaScript. There are tons of issues even in a single language where the outcome is not "intuitive" but we're honoring the underlying language (https://github.com/jwadhams/json-logic-js/issues/68), and then a different set of issues when porting to a new language with big philosophical differences (https://github.com/jwadhams/json-logic-js/issues/93).
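To illustrate why truthiness is awkward across languages, here is a minimal sketch of the truthiness rules as I understand them from the JsonLogic documentation. The function name jsonLogicTruthy is hypothetical and the code is not copied from the library.

```javascript
// Sketch of JsonLogic-style truthiness (assumption: mirrors the documented rules).
function jsonLogicTruthy(value) {
    // Unlike plain JavaScript, an empty array is falsy (as it is in PHP).
    if (Array.isArray(value) && value.length === 0) {
        return false;
    }
    // Everything else follows JavaScript, so "0" is truthy (PHP treats "0" as falsy).
    return !!value;
}

// Left: plain JavaScript. Right: the sketch above.
console.log(Boolean([]), jsonLogicTruthy([]));   // true false
console.log(Boolean("0"), jsonLogicTruthy("0")); // true true
console.log(jsonLogicTruthy(0), jsonLogicTruthy(""), jsonLogicTruthy(null)); // false false false
```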

It seems reasonable to decide "well JsonLogic says 7 && "potato" === false" -- as a user of the library I wish it did! -- but it would dramatically change the size and complexity of every implementation. If you peek inside the library as it exists today, it is almost universally the lightest possible skin over "whatever the language wants." To give a specific example, in the JavaScript implementation of substr the code has a one-line general case and a five-line special case to port over my favorite PHP feature.
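As a rough illustration of "whatever the language wants" plus one ported special case, the sketch below shows what plain JavaScript does with 7 && "potato", followed by a substr with a PHP-style negative length. The function phpStyleSubstr is a hypothetical reconstruction for illustration, not the library's actual code.

```javascript
// Plain JavaScript: && returns one of its operands, not a boolean.
console.log(7 && "potato"); // potato

// Hypothetical substr with PHP-style negative length.
function phpStyleSubstr(source, start, length) {
    var str = String(source);
    if (length < 0) {
        // Special case: a negative length drops that many characters from the end.
        var rest = str.substr(start);
        return rest.substr(0, rest.length + length);
    }
    // General case: defer to the language.
    return str.substr(start, length);
}

console.log(phpStyleSubstr("jsonlogic", 4));     // logic
console.log(phpStyleSubstr("jsonlogic", 1, 3));  // son
console.log(phpStyleSubstr("jsonlogic", 4, -2)); // log
```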

a request to add a new operator to the spec

So regex, on top of everything else, has a sprawling spec (https://en.wikipedia.org/wiki/Regular_expression#Standards). Do we want support for Perl-compatible, or POSIX BRE, or some tightly tested subset? If we make a subset, people are going to miss features, but if we say "language's choice" then we have the incompatibility illustrated here: https://en.wikipedia.org/wiki/Comparison_of_regular-expression_engines

There's no clear win, so I've been hesitant to bulk up the core functions that ship with the library. I think it would be reasonable to add support for packages, so everyone doesn't have to write their own add_operation code, but I haven't figured out an architecture I like for that. And it still gets back to making sure all the parties that are sharing logic have compatible packages.
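Purely as a thought experiment, a "package" could be as simple as an object of named operations plus a helper that registers them all. Everything here is hypothetical (stringPackage, installPackage, the starts_with operation, the stand-in engine); the only thing taken from the thread is that the engine exposes add_operation(name, fn).

```javascript
// Hypothetical "package": a plain object mapping operation names to functions.
var stringPackage = {
    regexp_matches: function (pattern, subject) {
        return new RegExp(pattern).test(String(subject));
    },
    starts_with: function (subject, prefix) {
        return String(subject).indexOf(String(prefix)) === 0;
    }
};

// Hypothetical helper: register every operation in a package on an engine
// that exposes add_operation(name, fn).
function installPackage(engine, pkg) {
    Object.keys(pkg).forEach(function (name) {
        engine.add_operation(name, pkg[name]);
    });
    return Object.keys(pkg);
}

// Stand-in engine so the sketch runs without the library installed;
// with json-logic-js one would pass require('json-logic-js') instead.
var registered = {};
var fakeEngine = {
    add_operation: function (name, fn) { registered[name] = fn; }
};

console.log(installPackage(fakeEngine, stringPackage)); // [ 'regexp_matches', 'starts_with' ]
```

This does nothing about the harder problem raised above: every party sharing a rule still needs a compatible version of the same package in their own language.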

Alkarex commented 1 year ago

I came here hoping to find a way to store various sets of regex rules combined by boolean operations... Regarding which specific regex flavour to support, I would suggest leaving that out of the specification and instead relying on a linter or test suite that tests compliance with various languages. Anyone implementing regex support will most likely use the language's native library, e.g. preg_match() for PHP, RegExp.test() for JavaScript and so on. And it would be up to the developer of the rules to write regular expressions in a way that is compatible with the needed languages.
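A compliance check along those lines might be sketched as a small fixture file that each language runs through its native regex engine. The fixture format, the checkFixtures helper and the loosely ICD-9-shaped pattern below are all assumptions made for illustration.

```javascript
// Hypothetical fixture format: a pattern, a subject, and the expected result.
var fixtures = [
    {pattern: "^\\d{3}(\\.\\d{1,2})?$", subject: "250.01", expected: true},
    {pattern: "^\\d{3}(\\.\\d{1,2})?$", subject: "25", expected: false},
    // Possessive quantifiers are valid in PCRE but a syntax error in JavaScript.
    {pattern: "a++b", subject: "aab", expected: true}
];

// Run every fixture through the native engine and report incompatibilities.
function checkFixtures(list) {
    var problems = [];
    list.forEach(function (f) {
        var actual;
        try {
            actual = new RegExp(f.pattern).test(f.subject);
        } catch (e) {
            problems.push({pattern: f.pattern, reason: "does not compile"});
            return;
        }
        if (actual !== f.expected) {
            problems.push({pattern: f.pattern, reason: "unexpected result"});
        }
    });
    return problems;
}

console.log(checkFixtures(fixtures)); // one problem: "a++b" does not compile
```

The same fixture list could be run through preg_match() in PHP; a pattern is only safe to store in a shared rule if it passes in every language that needs to evaluate it.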