pomsky-lang / pomsky

A new, portable, regular expression language
https://pomsky-lang.org
Apache License 2.0

Allow using _ (underscore) in group name #76

Closed jck closed 1 year ago

jck commented 1 year ago

Current behavior:

❯ pomsky ":grp_name('test')"
error P0102(syntax): 
  × Group name contains illegal code point `_` (U+005F). Group names must be ASCII only.
   ╭────
 1 │ :grp_name('test')
   ·     ┬
   ·     ╰── error occurred here
   ╰────
error: could not compile expression due to previous error

It would be nice if we could use underscores in the group name.

Aloso commented 1 year ago

Support for underscores was dropped when I discovered that Java doesn't support group names containing underscores. The error message could be improved: _ is ASCII, so the real problem is that it is neither a letter nor a digit.

We could still support underscores in other regex engines, since Java seems to be the only flavor which doesn't support them. However, I would prefer to have consistent parsing rules across all flavors. And since Java recommends camelCase instead of snake_case, the Java devs don't have a strong incentive to add underscore support.
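A small sketch of the Java restriction described above, using only the standard java.util.regex API. Java's Pattern documentation says a named group's name must start with an ASCII letter and may then contain only ASCII letters and digits, so Pattern.compile is expected to throw PatternSyntaxException for a name containing an underscore. The class name JavaGroupNames and the helper compiles are illustrative, not part of Pomsky.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class JavaGroupNames {
    // Returns true if java.util.regex accepts the pattern, false if compiling it throws.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // camelCase group name: ASCII letters only, accepted by Java.
        System.out.println(compiles("(?<grpName>test)"));
        // snake_case group name: '_' is not allowed in Java group names, so this is rejected.
        System.out.println(compiles("(?<grp_name>test)"));
    }
}
```

Running main should print true and then false, which is why Pomsky would have to reject the underscore (or handle it per flavor) if it wants its output to compile in Java.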

jck commented 1 year ago

I see. That is unfortunate. Using camelCase in Python doesn't feel right, but I'm not sure it's worth introducing a parsing inconsistency.

If you do decide that group names shouldn't allow underscores, the documentation needs to be updated.

The formal grammar specifies that a group name is a Name token, and Name is described like this:

Names (or identifiers) consist of a letter or underscore (_), followed by any number of letters, digits and underscores.
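To make the mismatch concrete, here is a hypothetical checker contrasting the documented Name rule quoted above with the stricter rule Java applies to group names. It assumes an ASCII-only reading of "letter" for both rules; this is an illustration, not Pomsky's actual parser.

```java
public class GroupNameRules {
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Documented Name rule (ASCII reading assumed): a letter or '_',
    // followed by any number of letters, digits and underscores.
    static boolean isDocumentedName(String s) {
        if (s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!isAsciiLetter(first) && first != '_') return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isAsciiLetter(c) && !isAsciiDigit(c) && c != '_') return false;
        }
        return true;
    }

    // Java group-name rule: an ASCII letter, followed by ASCII letters and digits only.
    static boolean isJavaGroupName(String s) {
        if (s.isEmpty() || !isAsciiLetter(s.charAt(0))) return false;
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!isAsciiLetter(c) && !isAsciiDigit(c)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"grpName", "grp_name", "_grp"}) {
            System.out.println(name + ": documented=" + isDocumentedName(name)
                    + ", java=" + isJavaGroupName(name));
        }
    }
}
```

Names like grp_name and _grp satisfy the documented Name rule but not the Java rule, which is exactly the gap that made the documentation misleading.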

Aloso commented 1 year ago

Good point. I'll update the documentation.

Aloso commented 1 year ago

The documentation has been updated. I am closing this to declutter the issue tracker, since the proposal is not actionable until Java supports underscores in group names.