Closed minad closed 9 months ago
@oantolin This PR should be ready from my side. I think we want these improvements independently of how we proceed on #162.
Hi @minad! Thanks for working on this. These are pretty big changes, so it might take me a while to go through them. Right off the bat, one thing I'm concerned about is replacing the use of rx
's or
with a simple (string-join regexps "\\|")
; I used rx
because of the optimizations it performs, which may not be very sophisticated, but at least it sometimes recognizes shared prefixes and suffixes among the alternatives in an or
and that for single characters x|y
is [xy]
. For example:
(rx (or "altar" "alter")) == "\\(?:alt\\(?:[ae]r\\)\\)"
Now, it may be the case that these optimizations don't really matter, or they might make things worse, or maybe in most uses of orderless
they don't actually trigger, but I think it's a good idea to use them unless we have benchmarks saying they make things worse. Also, by continuing to use rx
we might benefit from future improvements to its compiler.
Thanks for looking at it. I had a similar concern. As far as I understand the optimizations don't set in since we are combining regexp
s with or
:
(rx (or (regexp "altar") (regexp "alter"))) -> "altar\\|alter"
So right now I think using rx is just an inefficient way of doing string concatenation.
If we would want the full power of rx optimizations to unfold we could generalize the style compilers to also return rx expressions themselves and then only in the final step convert the expression to a string. With #162 matching style functions could then return three things:
Changes to orderless-filter
are breaking changes which would affect any users that haven't abandoned Selectrum. Maybe it's OK to break things for them, I don't know. That can't be many people.
Changes to orderless-filter are breaking changes which would affect any users that haven't abandoned Selectrum. Maybe it's OK to break things for them, I don't know. That can't be many people.
There are no real breaking changes. Except saving the match data. Not sure if this matters. I made sure that both orderless-filter and orderless-highlight-matches stay intact as is (also in #162). However both functions are not used internally anymore, so we could also consider deprecating them.
There are no real breaking changes. Except saving the match data. Not sure if this matters.
What I meant is that, if I recall correctly, Selectrum users directly set some Selectrum variable to orderless-filter
. That function has changed its name and its parameter list.
Oh, sorry, I didn't realize you kept orderless-filter
in addition to adding an orderless--filter
. I'll go over things more carefully.
What I meant is that, if I recall correctly, Selectrum users directly set some Selectrum variable to orderless-filter. That function has changed its name and its parameter list.
Yes, they did use that function directly. But I didn't change orderless-filter. The function is still there, it just uses orderless--filter internally. I took great care to preserve the public API. (EDIT: I saw that you just noticed.)
As far as I understand the optimizations don't set in since we are combining
regexp
s withor
This unfortunately seems to be correct. I like your suggestion of adding the option to return rx
expressions from matching styles.
This unfortunately seems to be correct. I like your suggestion of adding the option to return rx expressions from matching styles.
I am not 100% convinced. Maybe it is cheap enough, but it seems to me that using rx is more intended to be used at compile time via macros. Do you think this generalization is worth it? Will some matchers get simpler if we do that? Can we profit from additional rx optimizations?
One thing I am wondering about now is if my string concatenation with string-join instead of rx-to-string breaks orderless--anchored-quoted-regexp. I have to check.
EDIT: I've implemented rx support in https://github.com/oantolin/orderless/pull/161/commits/674a1c75d5861b36de29b2fa1957146a3ee34163. This seems to make things a little bit simpler. I am not sure about potential performance improvements or regressions because of that.
Another nice improvement of using rx is that we can use (literal string) instead of regexp-quote. See https://github.com/oantolin/orderless/pull/161/commits/40f75b6bdf1574b1d50d04a482c529050ed37d2e. Now this change will definitely yield optimization potential for rx.
I agree that it's not clear what effect rx
will have on performance. I always thought it would be worth it, which is why I had used (rx-to-string (or ...))
, which was apparently doing nothing. I think if we also change orderless-literal
to use rx
, then some common cases, like using both literal and flex, will at least factor out the starting letter.
Another nice improvement of using rx is that we can use (literal string) instead of regexp-quote. See 40f75b6. Now this change will definitely yield optimization potential for rx.
Oh, I didn't see this comment nor the commit before I wrote my previous comment.
One small downside is that we have to drop 26.1 support, see the previous commit. But probably one can also write (or "literal")
instead of (literal "literal")
.
I was going to suggest or
or seq
for literals just because I am not entirely sure what "runtime" means in the description of literal
. 😅
I was going to suggest or or seq for literals just because I am not entirely sure what "runtime" means in the description of literal. 😅
It just means that FORM in (literal FORM) is evaluated. Since we use a string, it evaluates to itself. The description says the same about (regexp FORM) btw.
Oh, I understand what "runtime" means now, it is the thing I first thought: that rx
would compile to an expression that evaluates the argument to literal
. What confused me is an optimization: if that argument is directly a string rather than a more complicated expression evaluating to one, then rx
can expand to a string. Compare the macroexpansions of (rx "oh " (literal "hi"))
and (rx "oh " (literal (concat "h" "i")))
.
EDIT: better example.
I get the same result? This is expected, isn't it?
(rx "oh " (literal "hi")) -> "oh hi"
(rx "oh " (literal (concat "h" "i"))) -> "oh hi"
I get the same result? This is expected, isn't it?
No, don't compare the values, compare the macroexpansions.
Oh, interesting! It compiles to a runtime expression. In that case, it would be pretty bad if the rx compiler wouldn't special case literal strings!
(concat "oh "
(regexp-quote
(concat "h" "i")))
Also, only the first of those works with rx-to-string
:
(rx-to-string `(seq "oh "(literal "hi")))
(rx-to-string `(seq "oh "(literal (concat "h" "i"))))
Also, only the first of those works with rx-to-string:
I think that's a bug. rx-to-string
is at runtime, so it should just evaluate directly. But this problem shouldn't concern us I think.
I think that's a bug.
rx-to-string
is at runtime, so it should just evaluate directly. But this problem shouldn't concern us I think.
I agree it could be considered a bug, but it does sort of concern us, because the pattern compiler uses rx-to-string
. So we can only use literal
and regexp
with actual strings, not other kinds of expressions, which is of course, what your PR does. Now that I say that, I remember figuring this out for regexp
a while back, but I had forgotten.
OK, this looks fantastic. I especially like the new orderless--compile
: even from just the name orderless--prefix+pattern
you could tell I was doing something ugly. 😄
I think I've learned all I can by reading and thinking and now I should test the code for a bit, but pending that it definitely looks like a great improvement, ready to merge.
I like the rx
stuff but maybe we should be cautious about possible performance regressions (I mean, we do compile regexps on every keystroke), so for now I'm just testing what the optimize-predicate
reference points to, which is the last commit before the rx
stuff. (I also don't know enough git to know how to get your more recent commits on this branch 😄 .)
I agree it could be considered a bug, but it does sort of concern us, because the pattern compiler uses rx-to-string. So we can only use literal and regexp with actual strings, not other kinds of expressions, which is of course, what your PR does. Now that I say that, I remember figuring this out for regexp a while back, but I had forgotten.
I meant that we are not affected as long as we don't generate runtime expressions. Thinking more about this, I retract my statement. This is not a bug since if you generate an rx expression at runtime you should also just evaluate the string right away and not expect rx to call the less efficient eval.
OK, this looks fantastic. I especially like the new orderless--compile: even from just the name orderless--prefix+pattern you could tell I was doing something ugly. 😄 I think I've learned all I can by reading and thinking and now I should test the code for a bit, but pending that it definitely looks like a great improvement, ready to merge.
Thanks. Yes, please test this for a while. For me it seems work well so far. I have installed the Orderless version from #162 which goes a little bit further than what we have here.
I like the rx stuff but maybe we should be cautious about possible performance regressions (I mean, we do compile regexps on every keystroke),...
As far as I know rx is developed by an expert Lisper and we can be pretty sure that it will be efficient enough. They also care about details like rx-to-string issue with runtime evaluation. So I wouldn't be worried too much about this. Also note that my patch makes sure that we call the pattern compiler only once. If compilation suddenly takes twice as long, we would still be good. ;)
Things look good so far. I've been testing up to commit d536e44f37715b1cd56c1121ef4f5744c83ed182 because that's what minad/optimize-predicate currently points to. Is there a way for me to get the following commits? Or do you need to force-push before I can check them out?
Things look good so far. I've been testing up to commit https://github.com/oantolin/orderless/commit/d536e44f37715b1cd56c1121ef4f5744c83ed182 because that's what minad/optimize-predicate currently points to. Is there a way for me to get the following commits? Or do you need to force-push before I can check them out?
I think you can pull the newest version git pull minad optimize-predicate
? minad/optimize-predicate points to https://github.com/minad/orderless/commit/9e515ee9a3e0c5accd87f41326d144e1ddd1e1a5.
minad/optimize-predicate points to 9e515ee9a3e0c5accd87f41326d144e1ddd1e1a5.
It does now! When I first pulled it gave me d536e44f37715b1cd56c1121ef4f5744c83ed182 even though I thought I pulled after you had written and we discussed the rx stuff. I must be confused about when I ran git pull.
Mmh, rx
doesn't actually seem to optimize many cases I would have liked: not even literal + flex:
(rx (or "abc" (seq "a" (* any) "b" (* any) "c"))) ; result is "abc\\|a.*b.*c"
And for orderless-flex
it is worse: even if future enhancements optimized the above example, we still wouldn't see optimization for this:
(orderless-pattern-compiler "abc" '(orderless-literal orderless-flex))
;; result is ("\\(?:abc\\|\\(a\\)[^b]*\\(b\\)[^c]*\\(c\\)\\)")
because of the groups used for highlighting the matches.
Even so, I like the rx
generalization and think we should keep it.
Yes, I am also happy with the change. I think the generated rx can still be better in some special scenarios (e.g. multiple literals combined with or) and I also like the code simplification.
If there are obvious optimizations which are missing in rx, we can also report them as a bug upstream. Many recent rx.el commits were made by Mattias Engdegård. He also maintains relint.el and made many performance-related improvements in the Emacs byte compiler and runtime system. I think he is interested in performance-related improvement proposals or patches.
And for orderless-flex it is worse: even if future enhancements optimized the above example, we still wouldn't see optimization for this:
Regarding the grouping, maybe it is possible to compile two different regexp representations. One for matching only and one for highlighting. But at this point the added complexity may not be worth it. Before doing anything in that direction we should check with benchmarks.
Oh, good idea about separate matching and highlighting regexps! I also agree that we shouldn't do it unless there is a convincing performance benefit.
OK, I can't think of any more tests. :) Thanks a lot for this, maybe especially for only calling the compiler once.
Oh, I forgot to say: I decided to stick with literal
for literals, because it reads better, but if any 26.1 users complain, I'll switch to seq
.
This turned into a slightly larger refactoring than I had expected: