purpleidea / mgmt

Next generation distributed, event-driven, parallel config management!
https://purpleidea.com/tags/mgmtconfig/
GNU General Public License v3.0
3.47k stars 308 forks source link

Solve issue with variable names like $in or $struct #748

Open ffrank opened 4 months ago

ffrank commented 4 months ago

Make variables like $foo and $in and $this.smells.funny into single lexical tokens. This avoids lexing errors when the variable name without the dollar sign looks like a reserved word.

(Frankly, I believe this is the more correct way to do this, and if not the only way, it is probably the easiest.)

ffrank commented 4 months ago

As an aside, I may have found a race in https://github.com/ffrank/mgmt/actions/runs/8247933356/job/22557152150

(I'm fairly sure I didn't introduce that.)

purpleidea commented 4 months ago

Neat! I wasn't sure this was possible while also not blocking some other parsing changes I have. I appreciate the patch, but I will have to delay merging it to make sure it doesn't conflict with some other parsing stuff I have pending. Of note, if you look at the git history, you'll see I had something similar in until I realized it was causing a conflict.

Lastly, there's a good chance I get rid of the $foo style naming entirely (not yet) and moving to just foo. TBD.

The race: yes there is one that I know about that needs fixing.

ffrank commented 3 months ago

So I suppose you are referring to 9e7b7fbb3a3.

That was some funky stuff. The dichotomy of "identifier" vs. "dotted identifier" makes things a bit tricky. The logic you removed in 9e7b7fbb3a3 was complicated indeed, and I can see some ambiguities for the parser resulting from this.

The proposed patch avoids some (all?) of this by allowing the lexer to consume every variable name as a single token, and provide the information that the parser needs to make proper decisions (i.e., is it a dotted or undotted name token).

purpleidea commented 3 months ago

The proposed patch avoids some (all?) of this by allowing the lexer to consume every variable name as a single token, and provide the information that the parser needs to make proper decisions (i.e., is it a dotted or undotted name token).

Yes this is very clever, thank you. I'm slightly worried that doing this might make it more difficult in the future to add parser features without being ambiguous. I will look through my WIP feature branches and have a look.

ffrank commented 3 months ago

FWIW I think it is really really awkward to make the dots and particles of qualified names into first class citizens on the parser level.

This is currently a valid resource

print "it" {
        msg => fmt.printf("%d", $c . a),
}

as is this

print "it" {
        msg => fmt.printf("%d", $c

.

                                                a),
}

I argue that $c.a should always be a token, as should fmt.printf.

Yes even this is accepted right now (this patch only addresses the variable):

import "fmt"

class container {
        $a = 1
}
include container as c

print "it" {
        msg => fmt . printf("%d", $c . a),
} #              ^^^^^^