Couple of odd consult regex/search issues

jdtsmith commented 3 years ago

With orderless-matching-styles orderless-regexp:

Issue 1: The $ special regex does not work in consult-buffer (or outline, possibly others):

M-x switch-to-buffer el$ : only buffers with names ending in el are shown. M-x consult-buffer el$: no buffers are shown

Issue 2: In consult-imenu, certain characters do not result in search in some buffers.

Invoke consult-imenu on a file with only functions (make one if necessary), then try entering a single lower case character. For most of these, search is immediate. Some characters (or any combination of them) do nothing: c, f, i, n, o, s, t, u. E.g. I could search for ion to find things with the word region. Some of these but not all are narrowing characters. Other narrowing multi-part commands like consult-buffer don't have this issue.

oantolin commented 3 years ago

Issue 1 is due to consult-buffer adding a hidden character at the end of its candidates. You can match buffers ending in el with el.$ (the dot matches the hidden character).

oantolin commented 3 years ago

In issue 2, the characters that seemingly do nothing spell "function". 😉

The reason they seem to do nothing is that they match the word "Function" preprended to the function names in the candidate list. You should be able to see the matching character change color.

minad commented 3 years ago

@jdtsmith You are quite thorough in your exploration and already discovered two special cases :)

@oantolin is correct with his explanations. A few more details:

There is a workaround documented for the orderless matching style dispatcher in the Consult wiki. I consider this an acceptable solution to repair the problem with the disambiguation suffixes, which are needed by certain commands (consult-line, consult-buffer etc)
You are probably using Vertico with the group function enabled. In the case of consult-imenu, the input matches against the group title. Therefore you may not see the matches. Disable vertico-group-format=nil to see what is going on.

If you have good idea on how to improve the status quo please let me know. I considered to highlight the matches in the group titles (https://github.com/minad/vertico/issues/39), but this is difficult/impossible to do correctly in general, since the behavior depends highly on the completion style in use.

jdtsmith commented 3 years ago

@jdtsmith You are quite thorough in your exploration and already discovered two special cases :)

Sorry for the barrage. Really enjoying the new consult+embark+orderless+vertico+corfu ecosystem! Thanks both of you for your efforts. Modular, interoperable, non-monolithic, flexible, leveraging the oft-hidden power of emacs builtins.... everything that's kept me away from the helm/ivy's/company's of the world. You are working in a really distinct space here!

@oantolin is correct with his explanations. A few more details:

There is a workaround documented for the orderless matching style dispatcher in the Consult wiki. I consider this an acceptable solution to repair the problem with the disambiguation suffixes, which are needed by certain commands (consult-line, consult-buffer etc)

I see. So the issue is that multi-part completes could have duplicated names (buffer and file, say), so you append some de-duplications suffix yourself? Or is it primarily for the grouping? Because why would duplicate candidates be a concern? Presumably orderless would just match both of them. Or is the real issue that you need consult to be able to discriminate after the fact which "flavor" of match was returned by orderless/etc.?

You are probably using Vertico with the group function enabled. In the case of consult-imenu, the input matches against the group title. Therefore you may not see the matches. Disable vertico-group-format=nil to see what is going on.

A good guess! So the prepended group title is matched, but actual candidates are not matched. I found that surprising. I turned off group format, and see the other underlying issue: orderless highlights only the first match, which in this case is the (invisible) "ion" in Function, not (as hoped for) the unfill-region function — surprising to the user to say the least (and not just because orderless highlights only first "hidden" match; too many candidate remain!):

here without group formatting:

So you use an invisible unicode suffix for consult-imenu, but a prepended full word (like "Functions") for consult-buffer to disambiguate and group? Why the difference? I presume you don't think matching the invisible "Functions" is a feature, since consult narrowing exists for just that purpose?

If you have good idea on how to improve the status quo please let me know. I considered to highlight the matches in the group titles (minad/vertico#39), but this is difficult/impossible to do correctly in general, since the behavior depends highly on the completion style in use.

There probably isn't any "hidden modification of candidate text" that wouldn't lead to these types of surprising results. I'm sure you already thought of moving the special flavor information per candidate out of band of the candidate text itself, so I presume that must mean that text properties don't survive a full round trip of the (generic) completion framework? compilation-mode makes in-depth use of stored metadata with a 'compilation-message text property, for example.

Without having looked closely at this, it feels like there must be a better solution!

EDIT: Found this in the manual: "Note that the completion functions discard text properties unconditionally, regardless of the value of this variable." :/

minad commented 3 years ago

Or is the real issue that you need consult to be able to discriminate after the fact which "flavor" of match was returned by orderless/etc.?

For example consult-line searches through the lines of a file. There may be duplicate lines. After your selection Consult jumps to the selected line, so it has to distinguish the duplicates. As you found out, text properties cannot be used for disambiguation. There is a deeper reason why text properties do not survive completion: The point is that the completion system builds up (completes) the string step by step from user input (e.g. the minibuffer-complete command used by the default completion UI or Icomplete). During completion, identical candidates, which only differ by their text properties cannot be distinguished.

The invisible disambiguation suffix is a bit ugly and it gave us quite a few problems along the way. However with the orderless style dispatcher from the wiki, there is essentially no problem anymore. I consider this issue to be solved.

So you use an invisible unicode suffix for consult-imenu, but a prepended full word (like "Functions") for consult-buffer to disambiguate and group? Why the difference? I presume you don't think matching the invisible "Functions" is a feature, since consult narrowing exists for just that purpose?

No. consult-imenu does not use an additional invisible disambiguation suffix. The "Functions" prefix is part of the nested imenu structure returned by the imenu backend, in this case the elisp mode. In org buffers you will see the nested headlines for example. So disambiguation suffix and the "Functions" prefix are something totally different.

(There is a small detail - the java imenu backend still returns duplicate candidates for overloaded functions. Therefore consult-imenu will still add a visible disambiguation suffix (1), (2), (3), etc. But I am not using invisible disambiguation suffixes in this case. I think I would go with visible disambiguation suffixes now for most commands, where duplicates are rare. The commands consult-buffer/consult-line/mark/global-mark/isearch are the only ones where I am using invisible disambiguation suffixes such that we get a clean UI.)

Without having looked closely at this, it feels like there must be a better solution!

Look closer, check out minad/vertico#39 regarding group title highlighting. If you have an idea let me know :)

And don't confuse the grouping prefix, e.g., "Functions", with the disambiguation suffix. These things are different. There are also quite a few issues where the tofu/disambiguation prefix/suffix are discussed in case you are interested in reading these discussions.

jdtsmith commented 3 years ago

Thanks for your thoughts, I'm sure this is ground you've covered thoroughly (and here I come wandering in, pointing out obvious features of the landscape :).

Look closer, read through minad/vertico#39. If you have an idea let me know :)

(And don't confuse the grouping prefix, e.g., "Functions", with the disambiguation suffix. These things are different.)

But there is some relationship, no? Both are in the category of "useful information about candidates that we don't show the user, and don't really want completion to be matching on, but need to somehow survive a round trip through a completion system which strips all text properties". So imenu prepends "Functions", etc., but we don't want to match on that. Grep prepends filename-line position, but we don't want to match on that.

So really, what is needed is a way to smuggle extra meta-data about individual candidates around any generic completion styles + UI framework. And to do so without the luxury of preserved text properties. Oy. Will have to think on that.

BTW, an orderless-specific work-around to take out the invisible metadata consult uses under the hood is not really an ideal solution, since it sort of undercuts the "these are all interchangeable parts" spirit of things, which is such an advantage. What happens when some young hot-shot invents a new even more powerful and flexible completion style ("formlessness?").

jdtsmith commented 3 years ago

OK crazy idea (probably unworkable). Use ORDER within the original candidate list as additional information which can survive the lossy completion system:

Collect all the candidates like normal (running grep, assembling buffers/files, calling out to imenu, etc.)
Trim candidates of any irrelevant info you don't want to match on (removing "Function" etc.).
Record a metadata mapping from "trimmed" candidate text to their useful metadata (this one is a function, this one is a line match on line 18, this one is a grep result from file "foo.txt" on line 44, etc.). A hash with an ordered list in each cell could work well for this.
Capture (wrap) the display sort function (if any).
Run the completion.
When the "trimmed" matching candidates are returned to your wrapper sort function, you can reliably expect them to be in the same order they appeared in the original completion table.
Look up their metadata you saved above. Duplicates may exist among the trimmed candidates. No worry: consume the list in that hash cell in order. If one member of a set of duplicate set matches, all will.
NOW (and only now) attach some hidden identifying chars, and hand off to the original display sort function. OR (maybe) display sorters don't strip text properties, so you can go that (simpler) route.
With the now-sorted matched list of candidates in hand, recover their metadata, and display accordingly.

End result: only sorting is (maybe) affected by encoded hidden candidate info. Never matching.

minad commented 3 years ago

But there is some relationship, no? Both are in the category of "useful information about candidates that we don't show the user, and don't really want completion to be matching on, but need to somehow survive a round trip through a completion system which strips all text properties". So imenu prepends "Functions", etc., but we don't want to match on that. Grep prepends filename-line position, but we don't want to match on that.

I want to match on "Functions" and the grep file name. The information is just moved from the candidate to the group title to give a more concise completion UI, it is not about matching or not matching.

But of course your preference may differ, see also https://github.com/minad/consult/issues/342. The problem is that it is hard to cater for both perspectives. This is not easily solved by adding a simple customization option, it will incur major complications.

So really, what is needed is a way to smuggle extra meta-data about individual candidates around any generic completion styles + UI framework. And to do so without the luxury of preserved text properties. Oy. Will have to think on that.

I proposed adding an option to completing-read such that text properties are preserved. I've even implemented a patch. But after rethinking this a bit longer, I reconsidered. I don't think that text property preservation is a good solution, mainly because it is conceptually incompatible with completion. The correct solution imho are disambiguation suffixes - visible ones are trivial. Invisible ones may introduce problems, but it is manageable.

BTW, an orderless-specific work-around to take out the invisible metadata consult uses under the hood is not really an ideal solution

As I see it there is no severe issue here. The built-in completion styles all work with disambiguation suffixes (e.g. basic matches from the beginning). Originally I had disambiguation prefixes, which even broke the basic completion style. This wasn't a good idea, so I switched to disambiguation suffixes. For external completion styles (or filtering systems) one can always find an easy workaround. For example Prescient (to be used with Selectrum) also has a trivial workaround documented in the wiki.

since it sort of undercuts the "these are all interchangeable parts" spirit of things, which is such an advantage.

Yes, this introduces problems at the package boundaries. Unfortunately these are unavoidable. Consult also comes with a bit of integration code for different completion UIs, which is unavoidable (the refresh issue we discussed for example), since the generic API is not powerful enough. There is still some value in trying to cut down the amount of required integration code as much as possible - the consult-vertico.el seems to be pretty ideal given that we don't have a standard API to request the current candidate and to trigger refreshing. Also Embark needs integration code - see for example the Vertico integration code. The goal for me is to make it as easy as possible to integrate different packages, but this does not mean that we can get away without any integration code.

Another package you may wonder about is embark-consult.el, which integrates consult with embark. This package is optional but strongly recommended. I don't see it as an integration package, but more as a package which provides genuine Embark-Consult-specific functionality, like exporting from consult-line to an occur buffer.

jdtsmith commented 3 years ago

I want to match on "Functions" and the grep file name. The information is just moved from the candidate to the group title to give a more concise completion UI, it is not about matching or not matching.

Well that does change matters. Perhaps it's because you understand that "Function" is hidden there underneath it all. For the non-developer user coming from the outside, that wouldn't at all be obvious, and you'd simply not expect some hidden text to represent a valuable matching target. Rather, if I type "ion" expecting to match unfill-region, and the result is ALL MATCHES without any highlight, that screams "buggy!".

I guess another approach is to have a small "consult-orderless" and "consult-prescient" and ... package for users?

minad commented 3 years ago

OK crazy idea (probably unworkable). Use ORDER within the original candidate list as additional information which can survive the lossy completion system:

This does not seem like a good idea. Sorting is highly dependent on the completion UI you use, so there is no easy generic way to retrieve the order from the lossy completion system. We already keep the original candidate strings around and recover them after completion, see the :lookup property of consult--read. The lookup relies on the candidates being unique though.

Instead of using disambiguation suffixes one could generally write integration code for each UI to figure out the actual candidate which was selected. This would at least work for UIs like Vertico, Selectrum, Icomplete and Embark live completion. But it would not work for default completion which does not have a notion of a selected candidate. The selected candidate is always the input string or the candidate you clicked on in the *Completions* buffer, where duplicates are even filtered out.

minad commented 3 years ago

Well that does change matters. Perhaps it's because you understand that "Function" is hidden there underneath it all. For the non-developer user coming from the outside, that wouldn't at all be obvious, and you'd simply not expect some hidden text to represent a valuable matching target.

Yes, it may be good to resolve this by adding highlighting to the group titles. Unfortunately I have not found a generic solution to do this, see https://github.com/minad/vertico/issues/39. Feel free to make your own experiments. (EDIT: See https://github.com/minad/vertico/commit/32706ace5c118b3ed0c6b40fdd6dee19a5c62708 for the current solution, which works well for the Consult use cases.)

I guess another approach is to have a small "consult-orderless" and "consult-prescient" and ... package for users?

These packages could provide special completion styles which work out of the box with Consult (e.g. the $ suffix or ignoring the grep-filename/Functions prefix). But given that these packages would be tiny it makes more sense to apply such configurations on the level of the user configuration or integrating it directly with the corresponding packages. Usually we try to avoid adding huge amounts of integration code, in particular if there is not a severe issue. For me there is no issue here, it is just that one has to understand how the matching works. Of course this may not seem like the most user friendly take on this problem, but there are hundreds of such design problems. Sometimes people come and want to have the system match their expectations or their former experiences precisely.

minad commented 3 years ago

I have added a solution to Vertico which improves on the group title highlighting, see https://github.com/minad/vertico/commit/32706ace5c118b3ed0c6b40fdd6dee19a5c62708. It is not a general solution but it will cover the main use cases in Consult, where the group title is the prefix of the candidate string. It should resolve the confusion regarding how grouped candidates are matched. Please give it a try.

jdtsmith commented 3 years ago

OK crazy idea (probably unworkable). Use ORDER within the original candidate list as additional information which can survive the lossy completion system:

This does not seem like a good idea. Sorting is highly dependent on the completion UI you use, so there is no easy generic way to retrieve the order from the lossy completion system.

Do you mean "sorting occurs even within try-completion, etc?" Obviously every system has their own display-sort-function... but the idea was to wrap that to pass through the order info... Maybe the assumption that "completion systems do all their sorting in their sort functions" isn't robust.

minad commented 3 years ago

Sorting is complicated, there are too many cooks involved. The completion UI does sorting by default, the table may provide a display-sort-function, the completion style may compose the display-sort-function with its own sorting function (e.g. flex does that). Additionally third party packages like Prescient introduce (additional?) sorting. The completion UI can do whatever it wants. So given this background, your idea does not seem like a good idea at first sight, but I didn't think about it thoroughly. Furthermore I am confident that the disambiguation suffix is a good solution.

jdtsmith commented 3 years ago

Sometimes people come and want to have the system match their expectations or their former experiences precisely.

Definitely familiar with that problem (from both directions :), and I definitely work to adapt myself to existing behavior. But I'd strong suspect that in a panel of 100 competent users, all 100 will think:

represents a bug.

Will give the new group highlighter a try (but even then, I'd prefer NOT to match on group, since it will "shadow" more specific matches, and narrowing is perfectly good for that already). But at least it will be obvious then what is happening.

Thanks.

minad commented 3 years ago

Will give the new group highlighter a try (but even then, I'd prefer NOT to match on group, since it will "shadow" more specific matches, and narrowing is perfectly good for that already). But at least it will be obvious then what is happening.

Yes, it will be obvious then. I think the group title highlighting is a clear improvement - I had also been bothered by this as you can see in https://github.com/minad/vertico/issues/39. But not matching on the group is your preference, which is different from mine. So with high likelihood we won't see a change in behavior as long as nobody comes along proposing a patch.

Note that #342 actually presents a solution, see https://github.com/minad/consult/issues/342#issuecomment-868380340. I think this is the correct approach - if you want to change the way the completion system filters the candidates, write a completion style. A completion style is better than truncating the candidates early on during candidate generation, since this would make it impossible to match against the untruncated candidates (which is the behavior I prefer).

You are right that narrowing already offers another possibility to restrict the candidate set, which is orthogonal to the matching (This is the in-band vs out-of-band state for filtering). However narrowing is less general than grouping, for example consult-grep does not support narrowing but groups the candidates by file name. Both grouping and narrowing somehow came into existence during the development of Consult - if I would start all over again, maybe I wouldn't even add narrowing anymore since I prefer the in-band state for filtering. However I still think narrowing is convenient for consult-buffer, consult-imenu, despite the overlap with grouping.

jdtsmith commented 3 years ago

I tried out the new restored group highlighting, and it does work well to demonstrate what's happening; a clear improvement indeed. It's still, IMO, not ideal that you can't narrow down to any candidates using any part of the word "Function", for example, but at least you see it happening. Thanks for implementing.

jdtsmith commented 3 years ago

if I would start all over again, maybe I wouldn't even add narrowing anymore since I prefer the in-band state for filtering. However I still think narrowing is convenient for consult-buffer, consult-imenu, despite the overlap with grouping.

I can see that for sure. The ideal combo would be an orderless dispatcher character that instructs it to "only search this in the group name", and otherwise only search in the "rest". So I could say Func# and up would come all functions; expanding to Func# ion would narrow to my specific one. But that would require knowing the difference between the "parts" of a candidate — not easy to do generically!

Thanks again for these great tools and discussion, I'll let you get back to your day.

minad commented 3 years ago

It's still, IMO, not ideal that you can't narrow down to any candidates using any part of the word "Function", for example, but at least you see it happening.

Yes, that's indeed a problem. This it is not a problem of Consult, but of Orderless or the completion style you use. If one has a candidate "the-word-twice-occurs-twice" and one enters "twice twice" one may expect that only the candidates match where "twice" occurs twice.

As you suggest one could of course strip the "Function" of the candidates. I tried to explain this above - this is not a good solution, since Imenus are provided by the Imenu backends and I want to preserve the structure as provided by the backends. We could hack this on the frontend layer or you could even write your own Imenu backend, which "fixes" this and returns the candidate as a flat list.

But as I mentioned the right way to approach the problem is by writing a custom completion style. For example one could write a completion style (or an orderless style dispatcher), which supports filter words like "@f", which then translate to "\\`Function". If the candidate strings would be truncated right away, this possibility would be gone. There is some value in preserving the full candidates.

minad commented 3 years ago

I can see that for sure. The ideal combo would be an orderless dispatcher character that instructs it to "only search this in the group name", and otherwise only search in the "rest". So I could say Func# and up would come all functions; expanding to Func# ion would narrow to my specific one. But that would require knowing the difference between the "parts" of a candidate — not easy to do generically!

Yes, this is impossible to do generically. Fortunately you can configure completion styles per completion category, you can write a completion style which is only used for the imenu completion category. Personally I think this is overdoing it, but it may make sense to do this for a handful of commands you use very often and where you have special requirements. As a general solution, Orderless with the style dispatcher documented in the Consult wiki works well enough.

minad commented 3 years ago

@jdtsmith You are confusing/mixing multiple things here. Let's look at this on a higher level. You are bugged by the asymmetry regarding the searchability of the imenu/buffer type? Which behavior do you prefer, both searchable or both non-searchable?

jdtsmith commented 3 years ago

I realized it might be better to move this to a new issue to avoid confusing it with the other things I pointed out. See #385; I'll clarify there.

minad / consult

Couple of odd consult regex/search issues #365