letscontrolit / ESPEasy

Easy MultiSensor device based on ESP8266/ESP32
http://www.espeasy.com
Other
3.26k stars 2.21k forks source link

Rules code #3164

Open guestisp opened 4 years ago

guestisp commented 4 years ago

Simple question: for a small project, i was developing something very similiar to espeasy rules.

As reinventing the wheel is not smart, any hint on which parts should i backport without dismounting the whole firmware?

TD-er commented 4 years ago

Depends on what you need.

The code is sadly not that well split as I would like it to be, so it is not like "just use the Rules.ino" and you're done.

guestisp commented 4 years ago

Thats why i've asked Im looking to the code parsing the rules evaluating the conditions and taking actions

Conditions and actions would be different, so i don't need that code

Something like being able to parse:

IF sensor1 = A OR sensor2 = B THEN serial.println() ENDIF

TD-er commented 4 years ago

Start looking at this function:

String rulesProcessingFile(const String& fileName, String& event)

As you can see, it uses a number of flags to keep track of what it's doing. (e.g. branch depth, whether it is a command, etc.)

The rules we use are mainly parsed per event trigger. Something like a variable update, a timer, etc.

So first we try to find the matching event and if found, enter the do ... endon scope of the call. In there you can have several if ... else ... endif parts.

See parseCompleteNonCommentLine and processMatchedRule.

N.B. while reading a line of the rule, you may encounter variables which may need to be replaced. For example %systime% or [bme#temp] must be replaced with their current values. That's also being done in there.

I think that should be enough to get you started.

guestisp commented 4 years ago

Thank you. I'll look deeply at it tomorrow at work

happytm commented 4 years ago

@guestisp I am too interested in light weight simple rules engine. Please keep us posted.

Thanks.

CurlyMoo commented 3 years ago

I'm developing a rule systems myself for the ESP. Maybe it's something that can work for ESPEasy as well, since there effort is being made to improve the current one. I started this so an ESP addon board for the Panasonic Heatpumps can be made smart with custom rules.

My current code consists of:

This example rule (with doesn't consists of functions yet):

if (1 == 1 && 1 == 0) || 5 >= 4 then
    $a = 1;
    if 6 == 5 then
        $a = 2;
    end
    $a = $a + 3;
    $b = (3 + $a * 5 + 3 * 1) * 2;
    @c = 5;
elseif 2 == 2 then
    $a = 6;
else
    $a = 7;
end

In this case the @c is meant to set the Heatpump parameters.

The function code does exists, but hasn't made it through to my non-recursive version yet. An example:

$b = (3 + max($a * 5, 10) + 3 * min(5, random(0, 10 * $c))) * 2;

See below for the output after the rule has been parsed.

On the ESP the performance is:

rule #0 was prepared in 3603 microseconds
rule #0 was parsed in 3276 microseconds
rule #0 was executed in 2924 microseconds

Is this what you'd expect from a rule system? As in, is this performant enough for ESPEasy compared to the current rules system? If there is any interest, i will do my best to make it into an easy to use module. It would be great if it can be used in ESPEasy, because it would make my current rules a lot simpler 😄


afbeelding

TD-er commented 3 years ago

I would love to see how you did it. Do you also have a grammar of the things you can parse?

If it can be used to improve the current rules parser and also extend it, then that would be great. I'm a bit hesitant to have 2 rules syntaxes, as that's really hard to maintain and takes a lot of resources. (not only in the compiled bin) So if we can get to use the best of both worlds, then I guess it will be a win-win.

Unlimited nesting does raise some red flags regarding timing, but maybe I'm a bit too pedantic about timing... You made me very curious :)

CurlyMoo commented 3 years ago

Do you also have a grammar of the things you can parse?

No, not in the way you expect. The example i gave is a good example. It generally just follows common programming syntax as we expect from C / PHP.

I'm a bit hesitant to have 2 rules syntaxes, as that's really hard to maintain and takes a lot of resources. (not only in the compiled bin)

True, but the current experimental version don't suffice currently, so if my version does, only one switch is necessary. I also don't think the differences will be that big like for functions, e.g.:

Unlimited nesting does raise some red flags regarding timing, but maybe I'm a bit too pedantic about timing...

Can you elaborate this?

You made me very curious :)

Nice, although you haven't yet answered by question. What performance would make a new parser acceptable?

TD-er commented 3 years ago

taskValueSet 1,1,0 => taskValueSet(1,1,0);

If we can convert a line like this, we can support both, so that should not be a problem.

Unlimited nesting does raise some red flags regarding timing, but maybe I'm a bit too pedantic about timing... Can you elaborate this?

Rules act on events, so the processing of a single event should not take too long to process or else it may block other things the ESP should do. For example if rules handling takes 1 second, you will miss 50 scheduled "FIFTY_PER_SECOND" calls and thus a number of tasks may perform incorrect measurement data. You may also miss incoming data, etc.

When you support an infinite depth of if statements etc, you may encourage users to try and perform it all in a single event and thus have rules parsing that may take a long time and thus affecting performance of all other parts in ESPEasy

CurlyMoo commented 3 years ago

If we can convert a line like this, we can support both, so that should not be a problem.

By supporting both you mean some adapter logic that convert taskValueSet(1, 1, 0) back to taskValueSet 1,1,0?

Rules act on events, so the processing of a single event should not take too long to process or else it may block other things the ESP should do.

Of course

When you support an infinite depth of if statements etc, you may encourage users to try and perform it all in a single event and thus have rules parsing that may take a long time and thus affecting performance of all other parts in ESPEasy

True, but is that the responsibility of ESPEasy of that of the user?

For example if rules handling takes 1 second

That's why i asked for some metrics. Do you have rules that you unittest and know the performance of so i can reference to that. I think the current rules parsed quite fast, but then again, i don't have any reference.

TD-er commented 3 years ago

By supporting both you mean some adapter logic that convert taskValueSet(1, 1, 0) back to taskValueSet 1,1,0?

The other way around.

True, but is that the responsibility of ESPEasy of that of the user?

Experience has tought me that when you allow it (or not limit it), it will be used. So there is a fine (almost filosophical) line determining who's responsibility it is. In order to keep it "easy" and simple, the firmware should make clear when it will cause issues.

That's why i asked for some metrics. Do you have rules that you unittest and know the performance of so i can reference to that. I think the current rules parsed quite fast, but then again, i don't have any reference.

Let's try to keep every blocking call less than 100 msec. See timing stats page. All highlighted (and bold) items took > 100 msec

CurlyMoo commented 3 years ago

Just to give you an impression. Due to development easyness i used a struct based kinda stack tree in the previous version. However, that took too much memory. The low memory footprint was still under development 😜

However, the new version now replaces the structs with a simple bytecode representation. This is how that looks.

if 12 == 1 then $a = 1 + 2 + 3; end

9 4 12. 1 0 4 1. 11 13 $a. 14 4 1. 1 1 4 2. 1 1 4 3. 15 12 . . 18 35 9 33 41 . 4 41 1 1 35 5 38 47 50 4 41 7 16 41 53 13 50 11 69 4 60 16 1 53 19 57 66 . . 4 60 21 1 53 24 60 75 . . 4 69 26

Let me tell you the dots are actually \0. The double dots are the end of the actual rule but then represented as small as possible. The part from the two dots to the end contains the route to parse the rule. That can be easily represented in a tree as i posted before. I won't tell too much about the decoding of the bytecode, because that would ruin the puzzle 😉

Still needs some improvements though, because just using one byte per step only allows 255 positions, and this simple rule already uses position 75.

In this case, it only takes 78 bytes to prepare the rule for (hopefully) fast enough execution.

TD-er commented 3 years ago

Hmm I'm not clear enough right now to parse this in my head.

9 4 12. looks something like "on true go to 9th position, predicate at 4th position" But to be honest, I find it quite hard to read like this. For debugging, there must be some kind of interpreter to explain these byte streams.

And just curious, you precompile the rules first and then store it like this? How do you account for system variable replacement, or [taskname#varname] like replacements?

Edit: Ah no, now I see it (just got myself a beer, helps apparently)

12. is a floating point value 12.0. The same for 1. etc. (every value with a .)

9 4 = if ??? Maybe 4 is if, 9 is position? 1 0 4 = == 11 13 = then $a. Variable a, but I don't understand the . 14 4 = = 1 1 4 = +

CurlyMoo commented 3 years ago

Ah no, now I see it (just got myself a beer, helps apparently)

Almost. You forgot to read that i said the dots are actually \0.

9 = IF 4 = NUMBER NUMBER + 1 until \0 is the value so 12 1 = OPERATOR OPERATOR + 1 is operator position in operator list (in this case ==) 4 = NUMBER NUMBER + 1 until \0 is the value so 1 11 = THEN 13 = VAR VAR + 1 until \0 is the value so $a 14 = Assignment = 4 = NUMBER NUMBER + 1 until \0 is the value so 1 1 = OPERATOR OPERATOR + 1 is operator position in operator list (in this case +) 4 = NUMBER NUMBER + 1 until \0 is the value so 2 1 = OPERATOR OPERATOR + 1 is operator position in operator list (in this case +) 4 = NUMBER NUMBER + 1 until \0 is the value so 3 15 = SEMICOLON 12 = END .. = END OF RULE

18 = START / 35 = GOTO position 35 9 = IF / RET to first operator 33 / GOTO position 41 4 = INTEGER / RET to operator 41 / GOTO value position 1 1 = OPERATOR / RET to if position 35 / GOTO operator value position 5 / GOTO left value position 38 / GOTO right value position 47 / GOTO true position 50 4 = INTEGER / RET to operator 41 / GOTO value position 7 16 = TRUE / RET to operator 41 / GOTO first operation position 53 13 = VAR / RET to true 16 / GOTO value position 11 / GOTO first operator 69 4 = INTEGER / RET to operator 60 / GOTO value position 16 1 = OPERATOR / RET to if position 53 / GOTO operator value position 19 / GOTO left value position 57/ GOTO right value position 66 4 = INTEGER / RET to operator 60 / GOTO value position 21 1 = OPERATOR / RET to if position 53 / GOTO operator value position 24 / GOTO left value position 60 / GOTO right value position 75 4 = INTEGER / RET to operator 69 / GOTO value position 26

The operator precedence logic wasn't implemented in this bytecode version yet so those routes are not fully correct.

The positions are actual positions in the bytecode. So when we jump to position 1 in the bytecode we see 4 12. That means we can simply parse &rule->bytecode[1+1] which returns 12. The value of an number can always be found on the next position of the number position. Because all hardcoded values are 0 terminated we don't have to know how many characters those values are.

So, as soon as the interpreter runs, it just has to follow the different positions through the bytecode, with minimal to no additiona lookups.

And just curious, you precompile the rules first and then store it like this?

Yes, because that's a lot faster.

How do you account for system variable replacement, or [taskname#varname] like replacements?

Depends where you store those values. You can define a SYSVAR type (e.g. 20). Then have a list of SYSVAR functions, e.g. 1) getSysTime 2) getIPAddress 3) getSSID1 etc. You can parse the rule like this: 20 1 So %systime% just translates to the first function in the sysvar function list. That immediatly tells you what function to run at this specific rule position.

The same counts for the taskname#varname. If those are stored in a predefined list, you can make pointers to those indexes in the list. If not, you have to store the actual names and then have a lookup function in the interpreter.

But to be honest, I find it quite hard to read like this. For debugging, there must be some kind of interpreter to explain these byte streams.

True, that's why i output two things for debugging. The actual rule, the bytecode and the bytecode translated to dot:

"0x564dde413ae1"[label="33" shape=square]
"0x564dde413ae1-1"[label="START"]
"0x564dde413ae2"[label="35" shape=square]
"0x564dde413ae1" -> "0x564dde413ae1-1"
"0x564dde413ae1-1" -> "0x564dde413ae2"
"0x564dde413ae3"[label="35" shape=square]
"0x564dde413ae3-1"[label="IF"]
"0x564dde413ae3-1" -> "0x564dde413ae5"
"0x564dde413ae3-1" -> "0x564dde413ae4"
"0x564dde413ae4"[label="33" shape=diamond]
"0x564dde413ae5"[label="41" shape=square]
"0x564dde413ae3" -> "0x564dde413ae3-1"
"0x564dde413ae6"[label="38" shape=square]
"0x564dde413ae6-1"[label="12"]
"0x564dde413ae6" -> "0x564dde413ae6-1"
"0x564dde413ae7"[label="41" shape=diamond]
"0x564dde413ae6-1" -> "0x564dde413ae7"
"0x564dde413ae9"[label="41" shape=square]
"0x564dde413aec"[label="38" shape=square]
"0x564dde413aed"[label="47" shape=square]
"0x564dde413aee"[label="50" shape=square]
"0x564dde413ae9-1"[label="=="]
"0x564dde413ae9" -> "0x564dde413ae9-1"
"0x564dde413aea"[label="35" shape=diamond]
"0x564dde413ae9-1" -> "0x564dde413aea"
"0x564dde413ae9-1" -> "0x564dde413aec"
"0x564dde413ae9-1" -> "0x564dde413aed"
"0x564dde413ae9-1" -> "0x564dde413aee"
"0x564dde413aef"[label="47" shape=square]
"0x564dde413aef-1"[label="1"]
"0x564dde413aef" -> "0x564dde413aef-1"
"0x564dde413af0"[label="41" shape=diamond]
"0x564dde413aef-1" -> "0x564dde413af0"
"0x564dde413af2"[label="50" shape=square]
"0x564dde413af2-1"[label="TRUE"]
"0x564dde413af2-1" -> "0x564dde413af4"
"0x564dde413af2-1" -> "0x564dde413af3"
"0x564dde413af3"[label="41" shape=diamond]
"0x564dde413af4"[label="53" shape=square]
"0x564dde413af2" -> "0x564dde413af2-1"
"0x564dde413af5"[label="53" shape=square]
"0x564dde413af5-1"[label="$a"]
"0x564dde413af5-1" -> "0x564dde413af8"
"0x564dde413af5-1" -> "0x564dde413af6"
"0x564dde413af6"[label="50" shape=diamond]
"0x564dde413af8"[label="69" shape=square]
"0x564dde413af5" -> "0x564dde413af5-1"
"0x564dde413af9"[label="57" shape=square]
"0x564dde413af9-1"[label="1"]
"0x564dde413af9" -> "0x564dde413af9-1"
"0x564dde413afa"[label="60" shape=diamond]
"0x564dde413af9-1" -> "0x564dde413afa"
"0x564dde413afc"[label="60" shape=square]
"0x564dde413aff"[label="57" shape=square]
"0x564dde413b00"[label="66" shape=square]
"0x564dde413afc-1"[label="+"]
"0x564dde413afc" -> "0x564dde413afc-1"
"0x564dde413afd"[label="53" shape=diamond]
"0x564dde413afc-1" -> "0x564dde413afd"
"0x564dde413afc-1" -> "0x564dde413aff"
"0x564dde413afc-1" -> "0x564dde413b00"
"0x564dde413b02"[label="66" shape=square]
"0x564dde413b02-1"[label="2"]
"0x564dde413b02" -> "0x564dde413b02-1"
"0x564dde413b03"[label="60" shape=diamond]
"0x564dde413b02-1" -> "0x564dde413b03"
"0x564dde413b05"[label="69" shape=square]
"0x564dde413b08"[label="60" shape=square]
"0x564dde413b09"[label="75" shape=square]
"0x564dde413b05-1"[label="+"]
"0x564dde413b05" -> "0x564dde413b05-1"
"0x564dde413b06"[label="53" shape=diamond]
"0x564dde413b05-1" -> "0x564dde413b06"
"0x564dde413b05-1" -> "0x564dde413b08"
"0x564dde413b05-1" -> "0x564dde413b09"
"0x564dde413b0b"[label="75" shape=square]
"0x564dde413b0b-1"[label="3"]
"0x564dde413b0b" -> "0x564dde413b0b-1"
"0x564dde413b0c"[label="69" shape=diamond]
"0x564dde413b0b-1" -> "0x564dde413b0c"

That output can be posted on http://viz-js.com/ inside the accolades to have a graphical output of the route like i posted before:

afbeelding

TD-er commented 3 years ago

Ah OK, that's clear to me. Do you store the precompiled byte code in memory?

The way I see it now is that you do propose 2 different features/changes:

The first is something I do see a lot of potential for. The second one I'm a bit reluctant to do as it may be confusing to users and hard to maintain. But on the other hand if they both end up in the same pre-compiled byte code, I guess it could be done.

Another big advantage on pre-compiled byte code is that it may help to reduce the build size if you also allow to read the pre-compiled byte code from a file on the file system or maybe even included hard coded in the binary.

Right now all system variables are already being parsed by a single function using enums, so that's already being dealt with. Only thing is, it now returns String as it is done in the preprocessing step of the rules parsing. This does allow for some nice tricks by the way, like [var#%v1%] (the sole reason for having a %-notated variable too).

Like I said, I do really see the potential for pre-compiled rules. And one thing the current rules parsing lacks (as it was a bit difficult to do) is function notation with 2 (or more) parameters, like atan(y,x) So that may be an added benefit. Also the current rules parsing is lacking the ability of doing complex statements as the functions like sin(..) are done in the pre-processor so they can only handle things that are already computed. Thus sin([task#var1]*2) does not yet work.

CurlyMoo commented 3 years ago

Do you store the precompiled byte code in memory?

Yes

Another big advantage on pre-compiled byte code is that it may help to reduce the build size if you also allow to read the pre-compiled byte code from a file on the file system

Storing the bytecode on the filesystem is a benefit indeed, but i don't see what you mean by the first part?

And one thing the current rules parsing lacks (as it was a bit difficult to do) is function notation with 2 (or more) parameters, like atan(y,x)

That's indeed is supported in my code as well as nested functions. Another thing is miss now is being able to do (nested) math in the conditions.

TD-er commented 3 years ago

Storing the bytecode on the filesystem is a benefit indeed, but i don't see what you mean by the first part?

As a build with intent to make it as small as possible, we can maybe strip out most of the rules parsing as we then only need to run the byte code. So you implement the rules on a node with full rules parsing and store the byte code on the file system. Then copy that byte code to the "as small as possible" build and you can run the rules on a number of nodes that are deployed running the same rules.

CurlyMoo commented 3 years ago

Then copy that byte code to the "as small as possible" build and you can run the rules on a number of nodes that are deployed running the same rules.

But you'll lose the editor functionality on that new nodes.

TD-er commented 3 years ago

Then copy that byte code to the "as small as possible" build and you can run the rules on a number of nodes that are deployed running the same rules.

But you'll lose the editor functionality.

Yep, but for builds that really need the size to be as compact as possible, that could be a trade off that may be worth it. What I intent to do (in the future) is allow to let unit A act as the web UI for unit B, so unit B can have a really small build size. For this the units need to be able to transfer settings among each other (or at least accept get/set calls for some settings). I don't see any reason why pre-compiled rules could not be part of this exchange.

CurlyMoo commented 3 years ago

I don't see any reason why pre-compiled rules could not be part of this exchange.

True

CurlyMoo commented 3 years ago

The same rule as https://github.com/letscontrolit/ESPEasy/issues/3164#issuecomment-770883330 but now in a fully working bytecode version (with one pending optimalization).

rule #0 was prepared in 1217 microseconds
rule #0 was parsed in 8728 microseconds
bytecode is 604 bytes
rule #0 was executed in 1323 microseconds
bytecode is 628 bytes

--- print values ---
$a = 4
$b = 52
@c = 5
--- print values ---

rule #0 was executed in 698 microseconds
bytecode is 628 bytes

--- print values ---
$a = 4
$b = 52
@c = 5
--- print values ---

rule #0 was executed in 797 microseconds
bytecode is 628 bytes
TD-er commented 3 years ago

Looks promising and quite a potential in speed improvement.

uzi18 commented 3 years ago

Maybe we can use .bc extension for it?

CurlyMoo commented 3 years ago

@AloisKlingler @tonhuisman @TD-er @uzi18 @guestisp @happytm @Heronimonimo @ehoutsma @IgorYbema Here is the first working version that i found good enough to share in public: https://github.com/CurlyMoo/rules

uzi18 commented 3 years ago

thx will look at it :)

TD-er commented 3 years ago

For use via PlatformIO, you may need to add a library.json and a library.properties file. That way you simply can use the library in a PIO project by mentioning the library in the lib_deps list including a version nr.

TD-er commented 3 years ago

Just a thought.... If we can make rules parsing like this so incredibly fast and perhaps also add some specific functionality like:

Then we can also implement new plugins in this rules language. This allows for unlimited plugins without the need for firmware updates. Just store the plugin rule on the file system and you're done. (or fetch it from a host)

About the function call/event wrapping idea. In ESPEasy we act on events, which can also be considered as a function name with some function parameters. However we currently lack a return value from event handling.

The core of ESPEasy does interact with plugins via calls like PLUGIN_READ and PLUGINS_TEN_PER_SECOND etc. You can also consider those as events with as a parameter the task index (and some more parameters) So why not allow such hooks in this rules parsing?

CurlyMoo commented 3 years ago

For use via PlatformIO, you may need to add a library.json and a library.properties file. That way you simply can use the library in a PIO project by mentioning the library in the lib_deps list including a version nr.

That would be a great first PR 😉

Just a thought....

If we can make rules parsing like this so incredibly fast

Don't over exaggerate thing

Then we can also implement new plugins in this rules language.

That would be a possibility if the functionality doesn't rely on critical timing such as the DS18B20 library or updating displays.

So why not allow such hooks in this rules parsing?

Indeed, why not.

The whole event trigger is something i was thinking about. Should that be handle inside the library or should some additional logic be implemented that calls the rule_interpret function based on incoming events. The latter would keep the rules library more generic and that's how i currently implemented it.

TD-er commented 3 years ago

I haven't looked at your code in great depth as of now, but I was thinking maybe I can add some wrapper to call the rules parsing with some range/location parameters. For example an offset in a file, or maybe you can add some jump condition based on a hash of some event name?

And indeed timing critical plugins should not be done in this rules approach, but to be honest most of the plugins aren't that time critical. And even so, we can take out the timing critical stuff and wrap it in some protocol handler thingy. For example reading from serial may be time critical, but almost all plugins read from serial until some character and then re-schedule themselves to process it. So this does sound like something that can be abstracted into a protocol handler thingy. The same for Dallas 1Wire stuff. The protocol is timing critical, but interpreting the data isn't.

CurlyMoo commented 3 years ago

For example an offset in a file, or maybe you can add some jump condition based on a hash of some event name?

What i did was the following to deal with that 'issue':

    int len = strlen(content);

    while(pos < len) {
        if(strncmp(&content[pos], "if", 2) == 0 && (pos == 0 || (pos > 4 && strncmp(&content[pos-4], "elseif", 5) != 0))) {
            count++;
        }
        if(strncmp(&content[pos], "end", 3) == 0) {
            count--;
        }

        if(count == 0) {
            if(pos + 3 < len) {
                content[pos+3] = '\0';
            }

            rules = REALLOC(rules, sizeof(struct rules_t *)*(nrrules+1));
            rules[nrrules] = MALLOC(sizeof(struct rules_t));
            memset(rules[nrrules], 0, sizeof(struct rules_t));
            rules[nrrules]->text = &content[oldpos];
            rules[nrrules]->nr = 0;

            nrrules++;

            pos += 3;

            while(pos < len && strncmp(&content[pos], "if", 2) != 0) {
                pos++;
            }
            pos -= 1;
            oldpos = pos+1;
        }
        pos++;
    }

Not saying that was the best approach, but again, providing the rule to the rule parser was something the rule library didn't deal with. We can discuss if we want to impement the provision of the rule to the parser into the rule library.

The protocol is timing critical, but interpreting the data isn't.

True

CurlyMoo commented 3 years ago

To encompass these discussions better, i've enable the discussion feature in the rules repository. I would suggest we continue these discussions there in various topics.

About providing rules to the library:

https://github.com/CurlyMoo/rules/discussions/2

tonhuisman commented 1 year ago

~Extracting the rules engine from ESPEasy doesn't seem feasible, so this issue can be closed.~ Let's keep this open, see below ⬇️

CurlyMoo commented 1 year ago

Just for the record. The rules library has been in use by many users of the HeishaMon. An open source Panasonic Heatpump interface. See for example this german topic with many examples: https://www.haustechnikdialog.de/Forum/t/259929/Steuerung-der-Jeisha-mit-Heishamon-und-Rules

tonhuisman commented 1 year ago

Aha, I by mistake classified this issue being obsolete as per the original message/request, but it is a still very useful discussion on an alternative rules processing engine. Let's keep this open 😃