xsawyerx / guacamole

Guacamole is a parser toolkit for Standard Perl. It provides fully static BNF-based parsing capability to a reasonable subset of Perl.
https://metacpan.org/pod/Guacamole
20 stars 8 forks source link

Suggestion: Assigning a scalar to an array without parens should be a warning #99

Closed scottchiefbaker closed 3 years ago

scottchiefbaker commented 4 years ago

I ran in to syntax today I haven't seen before:

my @a = "one";

For the purposes of Guacamole I don't think you should be able to assign a single scalar to an array like this. It should require parens, or qw() or something.

my @a = ("one"); # Preferred

This is a non-intuitive syntax, and we should encourage people to avoid this.

xsawyerx commented 4 years ago

Guacamole parses and tokenizes this. An interpreter would have to validate whether it is correct to do this.

With Gaucamole, you should write a linter that checks for it. Maybe we should edit the topic to write a linter policy for it?

scottchiefbaker commented 4 years ago

Well I'm confused then...

Guacamole already makes several judgements about valid Perl that is not considered "standard". My opinion is that this syntax should not be considered "standard".

It's not a huge deal to me, it just seems like something that would confuse a novice (it confused me), and something we would want to flag as non-standard.

xsawyerx commented 3 years ago

I think there are two different things being confused here. I'm not sure how best to explain them:

  1. Valid syntax, runnable syntax
  2. Correct syntax, non-buggy syntax

my @x = "one" is valid syntax because it has my, @ + identifier, = assignment, and a value on the right-hand side. my @x = "one"two is invalid syntax.

While it's valid, it's not correct. It could work, but you probably shouldn't do that. That's a more nuanced perspective which should be done by an interpreter or a static analyzer built on top of the parser tokenizer that Guacamole provides.

scottchiefbaker commented 3 years ago

I think perhaps I'm the one confused here. Clearly this is both valid, and correct syntax:

my $x = [1,2,3];
foreach (@$x) { say $_; }

but it's not considered "standard" per Guacamole. What distinguishes something as being non-standard then? My argument for the original syntax above is that it is valid syntax, but it should not be considered correct. Assigning a scalar value to variable with a list sigil should not be considered correct.

I'm not losing sleep over this, it's not a huge deal. It just feels like syntax that would trip up a novice. For the purposes of "standard" I think we should be as explicitly clear as we can in our syntax.

xsawyerx commented 3 years ago

I think perhaps I'm the one confused here. Clearly this is both valid, and correct syntax:

my $x = [1,2,3];
foreach (@$x) { say $_; }

but it's not considered "standard" per Guacamole. What distinguishes something as being non-standard then? My argument for the original syntax above is that it is valid syntax, but it should not be considered correct. Assigning a scalar value to variable with a list sigil should not be considered correct.

I see what you're saying. Essentially, you're right. Let me break down why I rather we don't do it.

First, to clarify why you're right, using another example. The code you provided above fails because it's using @$x instead of @{$x}. We have indeed made a judgment call here, because things like @$$foo->[0]{'bar'}{'baz'} are very hard to parse in our head, even if perl knows how, and it will likely not do what we want.

It's fairly simple to say that a sigil can only follow a brace or an identifier. No additional context necessary.

In the case of my @a = "one" we break it up to the left-hand-side of an = and the right-hand-side of it. Then each one is parsed independently. That assures that the parsing is simple and consistent and works for all inputs, but it does skip things like this being a clear bite-in-the-ass. You're definitely right about that.

Unfortunately, to make it unsupported, the parsing of = will need to be very specific. We will need to check if the right-hand-side is a single value (which is a literal value or a variable) and then check if the left-hand-side is storing this into a single scalar (or maybe into a hash entry or a single array entry, and then fail the parsing. That's quite nuanced and complicated.

Instead, I prefer we stick with "an equal sign has a right-hand-side and a left-hand-side. Each must be correct independently."

I'm not losing sleep over this, it's not a huge deal. It just feels like syntax that would trip up a novice. For the purposes of "standard" I think we should be as explicitly clear as we can in our syntax.

While we want the best syntax, we also need a reasonable parser to write and maintain. The above example shows how very complicated it can get for just a single use-case. Normally, the BNF would allow this to work and then the parser would say "stop, I see what you're trying to do and we don't want you to do it." This can be part of the parsing, but not using a BNF, usually. We're missing the post-lexing "stop, don't do this" part and maybe we'll add it, but not right now.

scottchiefbaker commented 3 years ago

Thanks for the detailed write-up.