kareltucek / firmware

This is fork of UHK's firmware featuring extended macro engine.
Other
82 stars 9 forks source link

Add supports for named register #87

Closed soraxas closed 11 months ago

soraxas commented 2 years ago

Brief

This PR enables using declaring of named registers and use them across all macros, such that rather than using magic numbers as register index across all macros, we can centralise the declaration of registers.

The idea is to store the declaration of named registers within a config file. When we encounter user providing register index as string rather than int, we will then perform a lookup on the named register.

This approach implements a poor man's "hash table" that requires no runtime memory consumption (only uses storage space), and has no extra runtime overhead for existing users (only one extra isNum call for those who uses numbered register).

Usage

This uses a special macro named as $confRegMappings (similar to $onInit, $onKeymap.. which are special). You can declare the named registers as:

declareReg char_counter 12
declareReg os_mode 30
...

For example, I have a macro key that toggles between different OS mode:

ifRegEq os_mode 1 goTo win
linux:
    setReg os_mode 1
    setLedTxt 3000 WIN
    break
win:
    setReg os_mode 0
    setLedTxt 3000 LNX
    break

Then, in other macro I can easily reference the same register to perform different actions based on the register. For example:

ifRegEq os_mode 0 final tapKeySeq C-right right
ifRegEq os_mode 1 final tapKeySeq C-right

Analysis

This adds virtually no overhead for existing numbered register users, and tradeoff runtime performance (scanning through config) for runtime memory consumption (zero memory usage as it is on-demand and we do not store the declaration). The complexity is O(n) where n is the number of lines within $confRegMappings.


Related note

Later on I think it could also be beneficial to add declaration of constants (thought I'm not too sure what should be the syntax). For example, storing constant declaration in $confConstants:

declareConst WINDOWS 1
declareConst LINUX 0

such that I can update the previous example to:

ifRegEq os_mode !LINUX final tapKeySeq C-right right
ifRegEq os_mode !WINDOWS final tapKeySeq C-right
kareltucek commented 2 years ago

PR feedback

Let me give a quick feedback regarding the implemented solution. Please do correct me if I seem to misunderstand anything.

I find it elegant and generally like it. Assuming that we want to use this approach, following things could be improved:

However, it is not ideal, as it is going to cause confusion when (if) autocompletion and syntax highlight gets implemented in Agent.

Counterproposal

Having call-by-name variables is definitely a desirable and well justified cause, so I think we could actually allocate memory for your hash table in RAM. Implementation details are open to discussion. I am thinking of something like:

typedef struct {
    uint16_t offset; //points at some occurence of the identifier in a macro text
    unsigned int len : 5; //length of the token, at most 31
    unsigned int idx : 5; //points into `regs` 
    unsigned int type : 6; //reserved for type of the variable or other information
} register_reference_t

    static int32_t regs[MAX_REG_COUNT];
    static register_reference_t regLookupTable[MAX_REG_COUNT];
    static uint8_t regLookupTableSize; //don't go through entire table if nothing is in there

Variable name would be allocated when encountered for the first time. The lookup table could be sorted and a binary search used for lookup, although I don't find it strictly necessary, as it would add clutter and complexity.

(The token length is a bit of a nuisance - the strings written in config segment are not '\0' trailled.)

Admittedly this is quite costly solution, and although it allows you to use registers to store "constants", you are likely to run out quickly if you use them for such usecase.


What do you think about it? Is the counterproposal worth your time implementing, or should we go with your current implementation for now?

soraxas commented 2 years ago

Thanks for the quick feedback! Appreciate the detailed feedbacks/reply.

You are completely correct in both of your points about my intentions for this implementation.

The above improvements are definitely worth implementing if the current proposal is to be used.

Re: counterproposal

This is definitely an interesting idea and I liked the approach of storing identifiers by pointing into the partial text in config.

I don't think of declareReg as a regular command, but more like the C/C++ #define VAR VALUE statement. As such, I think it might be a bit ill-formed to use them within conditional branches (in C, using #define within conditional branch would also not work). Granted that this would allow them to be used as like a (global state) pointer. For use-cases like:

ifShortcut 70 71 declareReg my_delay_const 20
ifGesture 085 077 declareReg my_delay_const 21
ifGesture 52 53 declareReg my_delay_const 22
...

delayUntil #my_delay_const 
...

where these are used for like an indirection into "constants". They should instead use something like my second proposal of defining constant:

declareReg tmp 30
declareConst slow_delay 20
declareConst mid_delay 50
declareConst long_delay 300
ifShortcut 70 71 setReg tmp $slow_delay 
ifGesture 085 077 setReg tmp $mid_delay
ifGesture 52 53 setReg tmp $long_delay
...

delayUntil #tmp 
...

I also your hashtable idea, but i think it might be an overkill (and introduce extra memory consumption). The current implementation would only be as costly as like performing a goTo statement (by scanning through the first token on each line).

Reformed idea

Perhaps we should instead only implement the idea of declaring constant, as it can directly be used for register as well? Say if we use the $ token to denote lookup on constant (btw $ is currently useless and should be deprecated?). Then, we can do something like having these in $confConstants:

declare tmp 30
declare slow_delay 20
declare mid_delay 50
declare long_delay 300

and inside macro:

ifShortcut 70 71 setReg $tmp $slow_delay 
ifGesture 085 077 setReg $tmp $mid_delay
ifGesture 52 53 setReg $tmp $long_delay
...

delayUntil #$tmp 
...

As said, the declaration statement should not be treated as regular command. As such, maybe following how C uses #define foo bar, we can use some special token at the beginning of the line as well? Heck, perhaps we might as well use

#define tmp 30
#define slow_delay 20
#define mid_delay 50
#define long_delay 300

as the actual syntax. All these belong to $confConstants, and only #define statements (or comments) can exist inside the config.

(Oh! And since # are treated as comments at the beginning of a line, this solves the issue of syntax highlight or using them in other macros right? These define statements are then like special comments that only has effect inside $confConstants. If you want to include comments inside $confConstants, you can use // instead.)

Re: potential user confusion

I can understand this point, but if we clearly state the usage of #define in the docs of how it is only useful in $confRegMappings/$confConstants, then perhaps they should clearly see how this is different than a regular command when they first come across this feature?

kareltucek commented 2 years ago

Sorry for so late reply!

Re: counterproposal

With the counter-proposal, there would be no $confRegMappings and neither declare nor declareConst. Mappings would be assigned fully automatically on first use of the identifier.

Reformed idea

That is one way to do it. Another is to simply discriminate them by the declaration. I mean, - if you encounter #foo, you could simply look up foo in $confRegMappings and:

I do not have a strong opinion on which way is better. Either way, I think you can directly use the symbol #, which already means basically the same thing as the proposed $.

(Oh! And since # are treated as comments at the beginning of a line, this solves the issue of syntax highlight or using them in other macros right?)

Does it? In order to provide relevant autocomplete, we would still need (to implement a way to) to distribute different grammars for different macro types.

(Just a sidenote: # as a comment character will be probably removed at some point in favor of //.)

soraxas commented 2 years ago

Sorry for so late reply!

That's alright, no problem!

... Mappings would be assigned fully automatically on first use of the identifier

I'm not against that idea, that would be workable. I don't mind implementing it but I think if we use named register to implement the usage of constants, it might be quite limiting with only 32 registers. Perhaps we can use the dynamic table approach with named register, and use the current special macro text approach for declaring constants?

Btw I want to point out that it is unlikely to use hashtable sorting and binary search for speeding up searching, as dynamic declaration means that the set of named registers is not fixed at OnInit. (otherwise, we need to re-sort the table on every declaration). At the end of the day, I think we might still need to scan through an array of char pointers whenever we encounter a named register?

I think you can directly use the symbol #, which already means basically the same thing as the proposed $

Yes true. If we go for the aforementioned approach, should we use a different token for constants?

Does it? In order to provide relevant autocomplete, we would still need (to implement a way to) to distribute different grammars for different macro types.

I meant it in a sense where #define would be recognised as comments in normal macro. Hence, it won't be recognised as a normal command. Making it work with autocomplete is unlikely. However, I reckon autocomplete in agent is unlikely to be context-sensitive in either way. Probably we can provide autocomplete of reserved keywords.

kareltucek commented 2 years ago

I don't mind implementing it but I think if we use named register to implement the usage of constants, it might be quite limiting with only 32 registers.

Indeed.

Perhaps we can use the dynamic table approach with named register, and use the current special macro text approach for declaring constants?

Maybe this is just a premature optimisation. Lately, I have been thinking about following (points 2 and 4 are relevant in this discussion):

At the end of the day, I am fine with any of the three proposals. Each of them has its pros and cons.

Btw I want to point out that it is unlikely to use hashtable sorting and binary search for speeding up searching, as dynamic declaration means that the set of named registers is not fixed at OnInit. (otherwise, we need to re-sort the table on every declaration). At the end of the day, I think we might still need to scan through an array of char pointers whenever we encounter a named register?

We need to deal with allocating them only when lookup fails, which happens at most 32 times (or generally constant-number times).

Yes true. If we go for the aforementioned approach, should we use a different token for constants?

If you decide to go with the hybrid approach, then yes please.


I meant it in a sense where #define would be recognised as comments in normal macro. Hence, it won't be recognised as a normal command.

I see that, but I have somewhat higher aims.

Making it work with autocomplete is unlikely. However, I reckon autocomplete in agent is unlikely to be context-sensitive in either way. Probably we can provide autocomplete of reserved keywords.

I think making it context-sensitive (in the sense of context-free grammar) is workable. A bit of a nuisance since we will need to either find or write parser that accepts EBNF and is willing to parse partial expressions, but doable.

(As for grammars, they will be queried from github by repo and tag, so that Agent can always retrieve grammar for the specific firmware version which is flashed on UHK.)

soraxas commented 2 years ago

Interesting discussion!

Just a quick reply on something notable:

... It would be nice if we could convert it into tokens. (Aka "precompile".) ...

That is definitely a valid use case (and since we always need to save the macro via agent, precompiling adds little to no operational overhead from the user perspective; along with runtime performance). One possible concern is that we need to always go in a bidirectional manner (meaning if we save the pre-compiled version inside the device, we need to de-compile the tokens back into a user-editable form in agent in an approach that is lossess). Comments, for example, would be things that we always need to store in raw format.

My proposal which reserves only 5 bits for index is outright idiotic because it is not scalable. Maybe better candidate would be a dynamic hash table of offset: 16, len: 8, reserved:8, registerData: 32 (all data in one structure), and assumption that we can allocate more space whenever needed.

Dynamic as in malloc whenever we had reached the capacity of the hashtable?

I think making it context-sensitive (in the sense of context-free grammar) is workable. A bit of a nuisance since we will need to either find or write parser that accepts EBNF and is willing to parse partial expressions, but doable.

I have actually started writing a context-free lexer with Ragel to tokenise keywords after you had mentioned the current limitation with string whitespace ".... " in another issue. I have chosen Ragel because it seems to be one (out of other popular lexer) that can avoid using malloc in its generated code. There are also other current maco language syntax that I want propose to adjust for the potential new parser. I will need to clean it up a bit and put it somewhere for discussion. Question: What is your view of being backwards compatible with the current setup (in terms of macro-language syntax)? Introducing a full parser means that there are language choices that had to be adjusted for it to become a context-free language

kareltucek commented 2 years ago

That is definitely a valid use case

I meant doing this thing in the firmware when the config is pulled from EEPROM into RAM. Agent would not be affected at all, just as would not be the config in EEPROM.

Dynamic as in malloc whenever we had reached the capacity of the hashtable?

Sorry, by "dynamic" i meant just "placed in RAM".

As to the malloc, I am not saying that it is not possible, but I meant that we could set it to some size that would be likely to cover usual needs (128 for the moment?), and be able to adjust manually if needed (down or up).

(Then best case scenario is that we can maintain the memory consumption and no further solution is needed, worst case that we will find out that we cannot afford the memory in the long run and that the hybrid approach is needed.)

Question: What is your view of being backwards compatible with the current setup (in terms of macro-language syntax)? Introducing a full parser means that there are language choices that had to be adjusted for it to become a context-free language.

Well, lets take that apart, because the implication in the last sentence seems to suggest some dreadful misunderstanding somewhere.

As to my general attitude to backward compatibility, I am trying not to break compatibility when it is not needed, but I am open to possibility of refactoring the language without keeping backward compatibility. There is number of areas where the language needs refactor just to become more user friendly as well as easier to maintain.

Specifically regarding parser-friendliness, I do expect that some changes will be needed in order to make the language more LL0-parser friendly, and lexer friendly.

But apart from that, I believe this already is a context-free language, and if Ragel has any troubles with it, it raises a question whether it truly is a context-free lexer. (Just the word lexer suggests that it is very likely designed for regular languages.)

(Edit: Actually, this surely is a context-free language, the only question is whether it is unambiguous.)

soraxas commented 2 years ago

I meant doing this thing in the firmware when the config is pulled from EEPROM into RAM. Agent would not be affected at all, just as would not be the config in EEPROM.

Make sense to me, that approach should work pretty well

Sorry, by "dynamic" i meant just "placed in RAM".

As to the malloc, I am not saying that it is not possible, but I meant that we could set it to some size that would be likely to cover usual needs (128 for the moment?), and be able to adjust manually if needed (down or up).

Trying to avoid malloc is good I reckon, better to be safe than sorry. I completely agree with this

But apart from that, I believe this already is a context-free language, and if Ragel has any troubles with it, it raises a question whether it truly is a context-free lexer. (Just the word lexer suggests that it is very likely designed for regular languages.)

My apology, I don't mean to say this is not a context-free language from the formal perspective. And perhaps I didn't gave it too much thought when I said that. What I want to point out is that certain syntax, such as

CONDITION = {ifGesture | ifNotGesture} [IFSHORTCUTFLAGS]* [KEYID]+

requires a cleverly crafted argument placement (placing the INT token at the end to distinguish FLAGS from KEYID) to work around the issue of scanning through a var args input, which might not always be possible for future commands

kareltucek commented 2 years ago

I am wondering if when saying "context-free", you actually mean context-free in the formal sense of Chomsky's hierarchy, or in the sense of natural language.

As far as tokenization goes, I definitely agree that in an ideal case, parsing strategy should not rely on knowledge of a complex grammar, as well as it should not rely on a highly-nontrivial finite automaton (meaning automaton which carries state information across multiple tokens). I.e., I agree that the language needs a refactor so that the problem of tokenization becomes (easily) solvable with a regular (and not context-free) language.

In this sense I am definitely looking forward to your refactor proposal!

kareltucek commented 2 years ago

There are also other current maco language syntax that I want propose to adjust for the potential new parser. I will need to clean it up a bit and put it somewhere for discussion.

Just a reminder - I am still eager to read the proposal!

soraxas commented 2 years ago

Hey thanks for the kind reminder!! Life had got in the way and didn't realise that I've put this aside for a while now. I will surely find some time to get things back on track and let you know:)

kareltucek commented 11 months ago

Named variables are now implemented in official firmware. Thanks for your ideas!