newlawrence / Calculate

Math Expressions Parser Engine
MIT License
34 stars 7 forks source link

Boolean expression support #94

Closed Manu343726 closed 5 years ago

Manu343726 commented 5 years ago

Hi,

I would like to know if adding support for parsing and AST generation/evaluation of boolean expressions could be in the scope of this library. My use case is arbitrary boolean expression evaluation, where the evaluation of the expression would work much like a C++ expression template, but done at runtime:

// the model

class Filter
{
    ...
};

Filter operator&&(Filter a, Filter b)
{
    return Filter{[=](const auto& value) {
        return a(value) && b(value);
    }};
}
...

// the parser

namespace calculate {
class FilterParser : public BaseParser<Filter> {
public:
    IntegerParser() : BaseParser<Type>{lexer_from_defaults<Type>()} {
        using namespace defaults;
        operators.insert({
            {"and", {and_<Type>, Precedence::low, Associativity::FULL}},
            {"or", {or_<Type>, Precedence::low, Associativity::LEFT}},
            {"not", {not_<Type>, Precedence::normal, Associativity::FULL}},
            {"xor", {xor_<Type>, Precedence::normal, Associativity::LEFT}},
        });
    }
};
newlawrence commented 5 years ago

Well, I'm afraid that the way the parser is programmed right now is too tied to the philosophy of parsing mathematical expressions so, depending on what you want, it might be possible to achieve it or not using the tools already available.

First of all, my apologies for the unnotice changes on the interface; currently, the only documentation available is a talk of mine (in Spanish) that is a bit dated (for example, lexer_from_defaults is no longer used). I'm planning writing a proper documentation as soon as possible.

Following with your question, the parser is already able to adapt to boolean expressions as long as you provide a custom lexer to allow the serialization and deserialization from string type. The provided default lexer returned from lexer_from_defaults (now, make_lexer) is only suitable to work with numeric data types (both, integral and floating point). Here it is an example that implements a lexer that works with strings, but since it is also dated and doesn't conform the current interface, here it is how you can define a lexer that works with bools:

class BoolLexer final : public calculate::BaseLexer<bool> {
public:
    BoolLexer() :
            BaseLexer<std::string>{
                R"(^(?:true|false)$)",                          // numbers
                R"(^(?!true|false|and|or|xor|not)[^\s(),]+$)",  // function and variable names
                R"(^(?:and|or|xor)$)",                          // operators
                "(", ")", ","                                   // parenthesis and separator
            }
    {}
    BoolLexer(const BoolLexer&) = default;

    std::shared_ptr<BaseLexer<std::string>> clone() const noexcept override {
        return std::make_shared<BoolLexer>(*this);
    }

    bool to_value(const std::string& token) const override {
        return token == "true" ? true : false;
    }

    std::string to_string(const bool& value) const override {
        return value ? "true" : "false";
    }
};

There are some not documented restrictions like capturing groups not being allowed, because the internal mechanics of the lexer will generate and use the next regular expression to tokenize the expression: ((?:true|false))|((?!true|false|and|or|xor|not)[^\s(),]+)|((?:and|or|xor|not))|(\()|(\))|(,)

As you can test, this will tokenize fairly well expressions like: "true and false", but it will also tokenize properly "trueandfalse" instead of considering it a single token. This kind of behavior is desired in mathematical expressions (for example, "1 + 2" or "1+2"). Another not documented restriction is that none of the regexes must match the space character as it is used by the postfix parser ("1 + 2" is "1 2 +" in postfix notation).

To sum up, you could try and provide your own lexer for your Function objects and see if you can come up with some regular expressions that fit your needs, but as you can see, the design of the library is too mathematical oriented.

I'm planning a new version of it to circunvent some of its flaws in the future, like get rid of the tokenizer regular expression, be more careful with the dynamic memory management (as you can guess, this looks and feels like Python too much) and get rid of some remaining newbie errors (noexcept and a make_shared as if bad_alloc errors won't ever ocur...); but I can't assure that it will ever be able to parse any kind of abstract syntax tree.

newlawrence commented 5 years ago

Since I think everything looks like support for this is not planned in the short term, I think I'll close this issue. Maybe I'll reopen it again in future iterations of the library.