Closed olson-sean-k closed 3 years ago
I've started experimenting on the `repetition` branch (edited to refer to the final commit on `master`). So far, globs like `<[!.]*/><0,>[!.]*` work as expected, only matching paths that do not contain components with a name beginning with `.`.
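To make the expected behavior concrete, here is a rough Python sketch that approximates this glob with a regular expression. The translation is hand-written for illustration and is not how Wax compiles globs. It assumes that `[!.]` matches one character that is neither `.` nor the separator `/`, and that `*` matches zero or more characters within a single component. The helper name `has_no_hidden_component` is hypothetical.

```python
import re

# Hand-written approximation of the glob `<[!.]*/><0,>[!.]*`.
# Assumptions (not taken from Wax itself): `[!.]` never matches the
# separator `/`, and `*` stays within a single path component.
COMPONENT = r"[^./][^/]*"
NO_DOT_COMPONENTS = re.compile(rf"(?:{COMPONENT}/)*{COMPONENT}")


def has_no_hidden_component(path: str) -> bool:
    """Return True if no component of `path` begins with a dot."""
    return NO_DOT_COMPONENTS.fullmatch(path) is not None


if __name__ == "__main__":
    for path in ["src/lib.rs", "README.md", ".git/config", "src/.cache/data"]:
        print(path, has_no_hidden_component(path))
```

The repeated group `(?:.../)*` plays the role of the repetition token: it may match any number of leading directories, each of which must satisfy the sub-pattern.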
> paired with an optional repetition also delimited by `<` and `>`

Unfortunately, this isn't possible, as it creates an ambiguity. Instead of making the entirety of the repetition bounds optional, the bounds within the brackets can be optional. This means the shortest form of the above example would be `<[!.]*/><>`. That's ugly, but still saves a couple characters.
> In fact, it could be the ultimate representation of a tree token, which is a specific instance of such a pattern.

That likely won't work too well, because tree tokens use some non-trivial logic to determine how they are encoded based on their position within an expression. Representing them as repetition tokens would move this logic into the parser, where it will likely be harder to implement and maintain. Tree tokens will probably remain as-is. Note that repetitions require manual management of component boundaries (separators) that is automatic with tree tokens. There is no single repetition token that can represent a tree token.
There's been more progress on the aforementioned branch and things are nearly complete (at this point, invariant prefixes and rules need a bit more attention). I think I'd like to experiment with an alternative syntax though.
The colon character `:` is typically prohibited in file and directory names and so may be a good and natural choice for delimiting a bounds specification within the angle brackets `<` and `>`. Rather than `<[!.]*/><0,>`, the complete pattern becomes `<[!.]*/:0,>`. Moreover, there is no ambiguity if the bounds are completely omitted, so this could be shortened to `<[!.]*/:>` or even `<[!.]*/>` for the zero-or-more case. Contrast that with `<[!.]*/><>`. I find the latter a bit awkward.
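As a rough illustration of why the colon form is unambiguous, the following Python sketch splits a single repetition token into its sub-glob and bounds. It is not Wax's parser. The function name `parse_repetition` is hypothetical, and the sketch assumes the sub-glob contains no literal `:` and that only the forms mentioned above (`<expr>`, `<expr:>`, `<expr:n,>`, `<expr:n,m>`) need to be handled.

```python
from typing import Optional, Tuple


def parse_repetition(token: str) -> Tuple[str, int, Optional[int]]:
    """Split `<expr:n,m>` into (expr, lower, upper); upper is None if unbounded."""
    if not (token.startswith("<") and token.endswith(">")):
        raise ValueError(f"not a repetition: {token!r}")
    inner = token[1:-1]
    # Assumption: the sub-glob has no literal `:`, so the last colon (if any)
    # introduces the bounds specification.
    expr, colon, bounds = inner.rpartition(":")
    if not colon:
        # No colon at all: the whole body is the sub-glob.
        expr, bounds = inner, ""
    if not bounds:
        # `<expr>` and `<expr:>` both default to zero-or-more.
        return expr, 0, None
    lower_text, comma, upper_text = bounds.partition(",")
    if not comma:
        # Other spellings are outside the scope of this sketch.
        raise ValueError(f"unsupported bounds: {bounds!r}")
    lower = int(lower_text)
    upper = int(upper_text) if upper_text else None
    return expr, lower, upper


if __name__ == "__main__":
    for token in ["<[!.]*/:0,>", "<[!.]*/:>", "<[!.]*/>", "<[0-9]:1,3>"]:
        print(token, parse_repetition(token))
```

Because the bounds live inside the same pair of brackets, the three spellings of the zero-or-more case all reduce to the same result without any lookahead past the closing `>`.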
This has landed on `master` in cbeada0 with the `<expr:n,m>` syntax. 🎉
Globs tend to be easier to read than regular expressions, and Wax attempts to straddle the goals of a simple and familiar syntax suited for paths and an expressive and flexible syntax. Unix-like glob syntax has some severe limitations. Globs do not use the more flexible pattern-repetition form that many regular expression engines support and instead conflate these ideas into concepts like the zero-or-more wildcard `*`, which matches a pattern of anything zero or more times. This is simple and handles most common use cases, but is inflexible.

Notably, Wax supports a limited form of character classes, but these patterns are rendered mostly useless by always using an implicit exactly-one repetition. Also note that some expressions are simply impossible, such as rejecting a variable number of directories that are prefixed with a dot `.`.

As an escape hatch, Wax could provide an explicit repetition mechanism for more advanced usage. It would function somewhat like alternatives and allow arbitrary nesting, but would importantly allow crossing arbitrary component boundaries. In fact, it could be the ultimate representation of a tree token, which is a specific instance of such a pattern.
One possible syntax could use `<` and `>` as delimiters around a sub-glob paired with an optional repetition also delimited by `<` and `>`. For example, `<[!.]*/><0,>` would match zero or more directories that do not begin with a dot `.`. This could be shortened with a lack of a repetition specification defaulting to zero-or-more: `<[!.]*/>`. A complete specification could include an upper bound, where `<[0-9]><1,3>` would match between one and three instances of the digits zero through nine. Today, this can only be expressed as `{[0-9],[0-9][0-9],[0-9][0-9][0-9]}`. Yikes.

As with alternatives, care would be needed to detect and reject adjacent boundary tokens and zero-or-more wildcards. Additionally, it could be useful to detect and reject nonsense sub-globs, such as singular `*` and `**`. Such nonsense patterns are unfortunate, but it may be worth the rough edges since this would greatly increase the expressive power of globs while, for most use cases, keeping things mostly familiar and simple.
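The one-to-three digit example can be checked with regular expressions standing in for the two glob spellings. This is only an illustrative translation in Python, not Wax code, and the function names `matches_bounded` and `matches_alternatives` are hypothetical.

```python
import re

# Regex stand-ins for the two spellings (an illustrative translation only).
BOUNDED = re.compile(r"[0-9]{1,3}")  # like <[0-9]><1,3>
ALTERNATIVES = re.compile(r"[0-9]|[0-9][0-9]|[0-9][0-9][0-9]")  # like {[0-9],...}


def matches_bounded(text: str) -> bool:
    """Match one to three digits using an explicit repetition bound."""
    return BOUNDED.fullmatch(text) is not None


def matches_alternatives(text: str) -> bool:
    """Match one to three digits by spelling out every length."""
    return ALTERNATIVES.fullmatch(text) is not None


if __name__ == "__main__":
    for text in ["", "7", "42", "123", "1234", "1a"]:
        # Both spellings should agree on every input.
        assert matches_bounded(text) == matches_alternatives(text)
        print(repr(text), matches_bounded(text))
```

The alternatives form grows with the upper bound, while the bounded form stays the same length, which is the verbosity the proposed repetition syntax is meant to remove.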