mawww / kakoune

mawww's experiment for a better code editor
http://kakoune.org
The Unlicense
9.87k stars 712 forks source link

Support for intersection and complement in regular expressions #2123

Open shachaf opened 6 years ago

shachaf commented 6 years ago

Regular languages support intersection and complement, in theory. Most regex engines don't support these -- maybe because they can cause multiplicative or exponential NFA size -- though vim supports intersection with \&.

These would be useful to have in Kakoune. The exponential case isn't a big problem because the maximum number of regex VM instructions is bounded anyway.

lenormf commented 6 years ago

This would typically be implemented using selections ("intersection" is implemented in alt-z).

What is your use case for having that implemented in regular expressions directly?

shachaf commented 6 years ago

I don't think that's the kind of intersection I'm talking about? I mean conjunction and negation of patterns, so you can write e.g. (P&~Q) to mean something that matches one pattern but not another. When these are outermost, you can get the same effect using editor features -- e.g. <a-K>P<ret> just means <a-k>~P<ret> -- but you can't do that when they're inside a larger regex.

shachaf commented 6 years ago

Full intersection/complement admittedly might be complicated to implement, but even complement only in outermost position -- which doesn't require any fancy transformations, just a negation after the match -- would be quite useful. Things liks <a-K> already exist, but e.g. recently I wanted an InsertKey hook that triggers for all except a certain set of keys, and ended up writing something pretty awkward. GNU less supports outermost negation, for some precedent.

Delapouite commented 2 years ago

Looks like such a feature has good chances to be added to ECMAScript regular expressions: https://github.com/tc39/proposal-regexp-set-notation#readme

shachaf commented 2 years ago

Ah, that's interesting! It looks like they're this proposal is just for character classes, which is much easier to implement than for arbitrary regular expressions. (The latter may not be worth it, honestly.)