peggyjs / peggy

Peggy: Parser generator for JavaScript
https://peggyjs.org/
MIT License
933 stars 65 forks source link

Add support for picking fields with `@` #38

Closed hildjj closed 3 years ago

hildjj commented 3 years ago

as discussed at https://github.com/pegjs/pegjs/issues/11,

foo = @bar _ @baz
bar = $"bar"i
baz = $"baz"i
_ = " "*
parse('barbaz') // returns [ 'bar', 'baz' ]

I use this syntax frequently when it's available.

phpnode commented 3 years ago

Like the idea in principle. Will this syntax clash if we decide to use @decorators or @annotations for rules?

hildjj commented 3 years ago

Oh, great question. Let's mock those in when we first implement this, and make sure the grammar doesn't clash. In peg.js, it's pretty straightforward, and applies only on the RHS of rules: https://github.com/pegjs/pegjs/blob/b7b87ea8aeeaa1caf096e2da99fd95a971890ca1/src/parser.pegjs#L148

Mingun commented 3 years ago

I personally prefer to reserve @ sign for the picking operator, because it is very short and take attraction to itself, and for annotations/decorators/attributes use another syntax, for example, Rust-like: #[attribute]. Symbol # now is unused in the grammar so no clash.

Having @ both as picking operator and as annotation operator definitely will create clash on the grammar until semicolons for delimiting rules are optional

hildjj commented 3 years ago

Are there any other ASCII-7 characters available? I'd like to make sure we're not misaligned with https://github.com/tc39/proposal-decorators as it goes to stage 3, if possible.

phpnode commented 3 years ago

# is used to indicate private in JS, maybe that's a better fit? it would imply inverting the syntax though:

foo = bar #_ baz
bar = $"bar"i
baz = $"baz"i
_ = " "*
TymurGubayev commented 3 years ago

How about "any rule starting with an underscore".

Since this could potentially break existing code, hide this behind an opt-in.

Next step is auto-generating such rules, i.e.

foo = bar _ws baz
bar = $"bar"i
baz = $"baz"i
ws = " "*

where _ws isn't explicitly defined.

phpnode commented 3 years ago

@TymurGubayev syntax changes should be purely additive imo. It's really important that we don't break the grammars that exist in the wild. _ is a valid identifier character so we can't use it for this. If we do decide that we want to introduce breaking changes then we should first add a syntax directive (rather than a compiler flag) in the same way protobuf does it.

Mingun commented 3 years ago

Another possible way of evolution -- change how the rule rule itself is defined and thus make it impossible to create clash.

At least two options can be suggested:

hildjj commented 3 years ago
* the other way is just make the optional semicolon in the end of the rule definition required. 
Personally I think that is easest and right way to solve this problem, because 
[modern JS rules](https://eslint.org/docs/rules/semi#require-or-disallow-semicolons-instead-of-asi-semi) 
also requires semicolons to make syntax consistent.

I'm -1 on this. A more nuanced reading of the link above might be "There are problems with ASI no matter which way you choose. Make sure to use lint rules to protect yourself either way." I really don't want us to pick a side on this, because it will alienate a bunch of people either way we pick for not enough benefit.

StoneCypher commented 3 years ago
  • the other way is just make the optional semicolon in the end of the rule definition required.

This has been repeatedly suggested throughout the years. This breaks javascript and must not occur.

StoneCypher commented 3 years ago

because modern JS rules also requires semicolons to make syntax consistent.

No they don't. Your preferred eslint config is not "modern JS rules."

StoneCypher commented 3 years ago

@hildjj - In the linked issue, you suggested that this could be used to ignore productions. At this time I do not yet understand how.

The ability to ignore productions would be huge for me. I maintain a compiler behind peg mostly just for this.

Would you show me how this syntax enables that, or did I misunderstand?

StoneCypher commented 3 years ago

Another possible way of evolution -- change how the rule rule itself is defined and thus make it impossible to create clash.

If there are breaking changes, this is a meaningless anti-fork, and I will just resurrect my fork.

No, we're not going to throw away the entire community on a whim to get a minor feature that can be had other ways. Don't be irresponsible.

hildjj commented 3 years ago

@hildjj - In the linked issue, you suggested that this could be used to ignore productions. At this time I do not yet understand how.

It would allow it by solving a slightly different problem: marking the productions you want to save. If the syntax is small enough, it seems like an approach you might like just as well as marking the ones you want to ignore.

hildjj commented 3 years ago

No, we're not going to throw away the entire community on a whim to get a minor feature that can be had other ways. Don't be irresponsible.

Let's all assume good intent from everyone involved, please. We'll make decisions together, but ideas that we don't take may kick off new ideas that we do like.

StoneCypher commented 3 years ago

We'll make decisions together, but ideas that we don't take may kick off new ideas that we do like.

Any significant breaking change means this library is useless to me, and the original must be re-saved. This will almost certainly be true of every other user of the old library.

I'm letting you know, right now, that people who clung to the old library for a year and a half after its old maintainer murdered it through negligence are doing so because they don't have a choice.

If you make breaking changes, you're creating the exact same extermination choice.

It's important to get this through everyone's head, early, or else you'll lose the whole thing over whimsy.

The way to fork a dying library to save the old community is well understood. Don't re-murder it.

StoneCypher commented 3 years ago

It would allow it by solving a slightly different problem: marking the productions you want to save. If the syntax is small enough, it seems like an approach you might like just as well as marking the ones you want to ignore.

Well, okay, that's fine, maybe even useful

But please understand that it's critically different than the issue you tagged it from, and doesn't do anything at all for people who need to ignore productions

hildjj commented 3 years ago

I'm committed to backward compatibility. I'm committed to doing good semantic versioning to ensure that people can make good decisions about when to upgrade. I'm committed to being a steward for the community's consensus (see my work at the IETF for examples of how much I value consensus over my own opinions).

Over time, hopefully we can build enough trust in those commitments.

StoneCypher commented 3 years ago

I'm committed to backward compatibility

❤️

 

I'm committed to doing good semantic versioning to ensure that people can make good decisions about when to upgrade.

👍

 

my work at the IETF

🦾

Mingun commented 3 years ago

This has been repeatedly suggested throughout the years. This breaks javascript and must not occur.

I'm confusing: how is slight change of the PEG syntax can break (generated) js? Furthermore, requiring semicolons doesn't really introduce new syntax, but reuses existing one. I can't imagine any situation where you won't be able to switch from the optional semicolons to the required ones. That is why I think that this is the most safe way to resolve ambiguity (except using the other symbol, of course).

Your preferred eslint config is not "modern JS rules."

That is not my preferred config, honestly, I'm even don't use JS regularly. It is just ESLint default. And I think that people had some good reasons to make defaulted exactly that variant. So I think, that choice could be called "modern rules".

hildjj commented 3 years ago

Let's not get bogged down in any of the syntax wars here, please. Any valid JS should be able to go into an action block.

Mingun commented 3 years ago

Any valid JS should be able to go into an action block.

Of course, of course, that is holy cow... My suggestion doesn't touch any code blocks at all, it is about semicolons in the grammar, not in the code blocks. Code blocks are always were opaque things and that remains unchanged.

hildjj commented 3 years ago

Nod, that's a helpful distinction.

hildjj commented 3 years ago

PR #89 is pretty close at this point. Now it's time to decide if we like it enough to land it.

I do like it; I use @ enough that i've gone out of my way in a couple of non-production projects to rely on odd versions of pegjs in the past. The implementation is adequate (if a little confusing because of the way the stack manipulation has to happen). Tests gave adequate coverage. I've prototyped ahead the few other places I might like us to use @ and am not concerned about overlaps from a grammar perspective.

Note that the example I added to the end of the documentation is:

list = head:word tail:(_ "," _ @word)* { return [head].concat(tail); }
word = $[a-z]i+
_ = [ \t]*

which shows off how @ can pluck from parens, which is its killer feature, imo.

Mingun commented 3 years ago

I think, that feature is ready to merging, clash with the proposed annotations syntax actually not so big and I think is easy solvable (if not by requiring semicolons, then by using, for example @@annotation, or rust-style annotations)

StoneCypher commented 3 years ago

Needs a version bump for a release

hildjj commented 3 years ago

Yep, I'm going to do that on #101 since this isn't the only feature.