Intentionally selecting query method

noctuid commented 4 years ago

For example, if I type sl as a query with the intention of it being an intialism, I'd rather all matches (or at least the first ones) be those matched by an initialism. This could be achieved by sorting initialisms first, but this package doesn't do sorting, and that would be overly complicated and slower.

What might be nice if there was some way to mark a query as something specific. I was thinking you could potentially customize different separators for different query types, which would allow doing this without any extra keypresses. For example - could match an initialism, could match an initialism or a regexp, etc. There could be a default if you haven't typed a separator yet. Even better would be if you could set a default per-query number (and as an alist per command). I guess this could potentially have speed improvements as well.

For example, I might set the default for query 1 to be initialism and the default for query 2 to be a regexp. as a separator would not change things from the default. Normally, I would start with an initialism and then add a regexp if needed. If I wanted to start with a regexp instead, I would just add a space at the beginning to start the second query. For some commands, I would probably want to reverse that order (or maybe I could just do query 1 regexp, query 2 initialism, query 3 regexp instead of having different defaults for different commands; I'd have to play around with it).

In another example, maybe I realize I didn't type the initialism correctly (e.g. included too many letters). Instead of deleting the query, I could change it to fuzzy by using . (or whatever) as a separator after it.

I guess this would require adding more commands for configuring separators. It's a little more complicated, but it would be completely optional, and it seems like it could be worth it to me. Most of the benefit for me would come from having different default matching types for different queries, so I think having different separators for different query types would be the less often useful part. Something similar (but not quite as flexible) could be achieved by just initially hitting the separator key enough times to get to the query type you want. I guess you could add even more rules to also be able to pick a specific query type beforehand, like you could specify two separators in a row... but that makes things even more complicated. I think it's probably enough to be able to change the query type afterward to handle the "oops, I actually need fuzzy" case. And even then, if it was the first query, you could just go to the beginning of the line and add the necessary number of separators to get it to be in the fuzzy query position. So maybe using separators to specify the query type isn't useful enough to justify the extra complexity.

What do you think? Do you have any other ideas?

oantolin commented 4 years ago

Is it OK if I say all of those ideas sound like they'd complicate things for not enough benefit? I'm not saying I definitely wouldn't do something like that, but how about doing this instead: a dispatching matching style, where the last character of a component determines the type of matching?

(defcustom orderless-dispatcher-alist
  '((?- . orderless-initialism)
    (?, . orderless-prefixes)
    (?. . orderless-flex)
    (t . orderless-regexp))
  "Alist mapping component suffixes to matchers.
To specify a default matcher, use the key t."
  :type '(alist :key-type (choice (character :tag "Char")
                                  (const :tag "Default" t))
                :value-type function)
  :group 'orderless)

(defun orderless-dispatcher (component)
  "Match COMPONENT dispatching on last character.
See `orderless-dispatcher-alist' to configure the dispatching."
  (let* ((n (length component))
         (type (cdr (assq (aref component (1- n))
                          orderless-dispatcher-alist))))
    (if type
        (funcall type (substring component 0 (1- n)))
      (funcall (cdr (assq t orderless-dispatcher-alist))
               component))))

(setq orderless-component-matching-styles '(orderless-dispatcher))

With that setup, each component matches as a regexp by default, adding a - makes it an initialism, . makes it flex, etc. The components are still space-spearated. I picked suffixes because that way you can change the type of the last component easily.

Would this be good enough? If it is good enough, we can decide whether or not to add it to orderless, if not, you can certainly stick this in your config.

noctuid commented 4 years ago

Is it OK if I say all of those ideas sound like they'd complicate things for not enough benefit?

No, you're fired.

With that setup, each component matches as a regexp by default, adding a - makes it an initialism, . makes it flex, etc.

Though it's an extra keypress, it's a lot more straightforward since it doesn't mess with the separators. This seems like much a better alternative to my suggestion. Maybe there's a case where it would be useful to specify multiple styles, but I can't think of one. However, the fallback/default should be able to use multiple styles. I would use this if added to conditionally enable flex. Maybe something similar to look at the first letter could be added too.

I don't think that implementation as-is would integrate well with my other suggestion though. Maybe if you could customize the function used for dispatching to styles instead of creating a new style this functionality could be supported without too much complexity (e.g. function that given a a component and component index returns the styles to use). With that, I could implement all of this easily even if none of it gets into orderless. If there's not an elegant way to do this, I agree with your hesitance to add this magic to the queries. It shouldn't be done in this package if it's going to be a significant complication.

As for my other suggestion, it seems pretty simple to implement to me. Just add component indices to the loop in orderless--component-regexps and check if the index has an entry in a variable. If there's a match, use it instead of orderless-component-matching-styles. split-string already handles things correctly for empty components, but the user would have to set the separator to a single space for what I described to work (but that wouldn't require extra code). Am I missing something?

As for how to integrate these two, one way (instead of combining the logic into a single function) would be to make the style dispatcher variable a list. If the first function returned nil, it would try the second. For my setup, I could have the first function look for prefixes and suffixes (or they could be separate functions), the second would look at a variable to see if an index was matched, and the third would check orderless-component-matching-styles.

I think it would be pretty easy to additionally do it based on the current command, but that's a separate issue. I think it would be nice for orderless-component-matching-styles regardless of whether this functionality is implemented. Some commands are far more suited to initialism. Really good for describe-variable. Really bad in general for something like swiper-isearch.

If you don't want to add any of this directly, what do you think about adding a customizable list of style dispatchers to enable implementing it easily outside of orderless?

One edge case: empty queries. Either all styles should be able to handle them (maybe you could do something special for an empty query if you really want to), or they should be ignored.

oantolin commented 4 years ago

First about empty queries. I think I'd like to keep things as they are. Currently the pattern is split on orderless-component-separator and any empty portions are discarded (split-string is called with omit-nulls set to t).

Now, how about this proposal, which I think is the simplest one that is backward compatible and allows you to implement all the complicated dispatching in your own configuration:

Currently orderless-component-matching-styles is a list of matching styles that get or-ed together, and a matching style is a function from strings to strings that maps a component to the regexp used to match it.

I propose to generalize the concept of matching style to the following: it is a function of one or two arguments, the first argument is a string, the second if present is an integer, the result should be nil or a string. If a matching style returns nil it means it cannot handle that component and it doesn't contribute to the final regexp. If a matching style takes two arguments, it is called with the component index as second argument.

The point of allowing the second argument is clear: that way you can do your positional discrimination.

The point of allowing nil is to make rewriting the dispatcher I wrote above easier: you now write little matching styles like "initialism if the last character is -" that just return nil if they don't see the suffix you want.

My plan would be to just generalize the definition of matching style as above, implement the corresponding changes in orderless--component-regexps and let you implement whatever prefix, suffix and positional handling you want in your config.

Does that sound good enough?

oantolin commented 4 years ago

Just to be clear, the new orderless--component-regexps would be as follows:


(defun orderless--component-regexps (pattern)
  "Build regexps to match PATTERN.
Consults `orderless-component-matching-styles' to decide what to
match."
  (let ((components (split-string pattern orderless-component-separator t)))
    (if orderless-component-matching-styles
        (cl-loop for component in components and index from 1
                 collect
                 (rx-to-string
                  `(or
                    ,@(cl-loop for style in orderless-component-matching-styles
                               for regexp = (if (eq (func-arity style) 1)
                                                (funcall style component)
                                              (funcall style component index))
                               when regexp collect `(regexp ,regexp)))))
      components)))

Also,I forgot to mention, when you implement your matching styles you, you can examine this-command.

I think doing it this way let's us off the hook on designing exactly how to map prefixes, suffixes, component index, this-command, etc. to regexps. I get the feeling that mapping might be intensely personal, but any such combination should be implementable with this interface.

noctuid commented 4 years ago

one that is backward compatible and allows you to implement all the complicated dispatching in your own configuration:

My suggestion is also backwards compatible and only a couple minor changes. I'm not sure I understand the benefit over my suggestion. It works to use orderless-component-matching-styles, but it's a little non-intuitive to me that it's a style when it's actually dispatching to other styles. That's not a huge deal. More importantly, it would be less work for a user to implement an alternate dispatcher. You could just return a style function or a list of style functions instead of trying to run one or more yourself. It's a bit of extra work that could be handled by orderless. That seems to be the main difference between the two. Maybe I'm missing a downside of my suggestion or explained it poorly. Let me know if I can clarify anything.

oantolin commented 4 years ago

Maybe I'm missing a downside of my suggestion

No, you're right, your suggestion is better even though it involves an additional variable, which I think I'd call orderless-positional-overrides. It would be a sequence and its nth item, if non-nil, would be used in place of orderless-component-matching-styles for the nth component of the input pattern, right?

But I don't think I understood this part of your suggestion:

split-string already handles things correctly for empty components, but the user would have to set the separator to a single space for what I described to work (but that wouldn't require extra code).

I don't see what the separator has to do with anything.

oantolin commented 4 years ago

Wait, no, on second thought I think I just don't understand your proposal at all. Would you mind explaining it to me again (but this time with the knowledge that I wasn't smart enough to understand the first explanation!)?

oantolin commented 4 years ago

OK, how about this proposal (which may or may not be what you had in mind)?

A style spec is either:

a function, which just gets applied to the component and should produce a regexp or nil.
a list (s1 s2 ...), which tries each style spec on the component to get a regexp or nil and combines the non-nils with (rx (or ...)).
a vector [s1 s2 ...], which tries each style spec on the component to get a regexp or nil one at a time and keeps the first non-nil result.

So basically, a list means or, and a vector means "first one that actually returns a regexp wins".

There would be two configuration variables: orderless-component-matching-styles and orderless-positional-overrides. The first is, as now, the default style spec to use for a component. The second variable is used to override for components at a certain index (as I described above).

oantolin commented 4 years ago

I implemented my previous proposal in a branch called "positional" (3d2fb9c). (Documentation still needs updating in certain parts.)

I maybe should have waited until I understood your proposal, but I think it has to be at least somewhat like this.

oantolin commented 4 years ago

Under my proposal say you wanted the first component to be an initialism and each successive component to be either flex if it end in ~ or regexp otherwise, except the fourth component which should match both as prefixes or initialisms, you'd configure as follows:

(defun flex-if-twiddle (pattern)
  (when (string-suffix-p "~" pattern)
    (orderless-flex (substring pattern 0 (1- (length pattern))))))

(setq orderless-positional-overrides
      '(orderless-initialism nil nil (orderless-prefixes orderless-initialism))
      orderless-component-matching-styles
      [flex-if-twiddle orderless-regexp])

noctuid commented 4 years ago

My proposal is similar, but instead of having the user-defined function return a regexp, it would return a list of styles and a (possibly updated) query. It would take the first non-nil style list and use that. It requires slightly less work in the user-defined code, since the user doesn't have actually have to generate the regexp:

(setq orderless-style-dispatchers
      '(flex-or-regexp-if-twiddle
          my-positional-styles-dispatcher
        ;; just uses orderless-component-matching-styles
        orderless-component-matching-styles-dispatcher))

(defun flex-or-regexp-if-twiddle (pattern _index)
  (when (string-suffix-p "~" pattern)
    (list '(orderless-flex orderless-regexp)
          (substring pattern 0 (1- (length pattern))))))

The difference is not huge if orderless provides a helper function for this:

(rx-to-string
                  `(or
                    ,@(cl-loop for style in orderless-component-matching-styles
                               collect `(regexp ,(funcall style component)))))

One small advantage of returning an updated query to pass to future style dispatchers is that I could, for example, have flex-if-twiddle ignore ~~ and replace it with ~.

I don't like the idea of orderless-positional-overrides because I'd rather not use the positional logic until after the logic for checking the query contents. I could still implement the same functionality, but it would require duplicating the flex-if-twiddle style for every entry in orderless-positional-overrides.

Overall the two implementations aren't much different, but I think having an orderless-style-dispatchers would be a little less confusing (no "meta" styles) and a little simpler to work with. Go with whichever you think is better. If you stick with the current one, I'd ask that you add a helper function for generating regexps from a style list and that styles be passed the component index as well instead of having the extra variable.

I don't see what the separator has to do with anything.

Regardless of which implementation is used, it would be nice if orderless--component-regexps didn't pass OMIT-NULLS as t. Instead it would use them only for the purpose of incrementing the component index. I could then set orderless-component-separator to " " instead of " +" to switch to the next component index by typing " " at the query start.

oantolin commented 4 years ago

I think I got it! First, the stuff I hadn't understood about wanting null components was cleared up by this:

it would be nice if orderless--component-regexps didn't pass OMIT-NULLS as t. Instead it would use them only for the purpose of incrementing the component index. I could then set orderless-component-separator to " " instead of " +" to switch to the next component index by typing " " at the query start.

I can keep the default separator " +" so that you don't get null components by default, but can opt in if you want.

Now, for the main part of your proposal, I think this convinced me your way is better:

I don't like the idea of orderless-positional-overrides because I'd rather not use the positional logic until after the logic for checking the query contents. I could still implement the same functionality, but it would require duplicating the flex-if-twiddle style for every entry in orderless-positional-overrides.

Also, I think I went overboard with the whole recursive structure for matching style, I doubt anyone would need something like (A [B (C D) (E F)] [H (I J)]). So I guess it's simpler that your proposal basically only supports one layer of recursion, if we feel it does cover all the cases.

Let me just repeat your proposal to see if I got it right:

There are two types of function used:

Matching styles, which, as now are functions that take a string and return a regexp (in particular they cannot return nil).
Style dispatchers, which take a string and an index, and return either nil or a pair of a list of matching styles and an updated string to apply them to.

How about instead of (list styles new-string) as in your code, we allow (cons new-string styles) or just styles (if the first element is a string, it is the new string, if not, the new string is the same as the old one)?

The dispatch logic is then: for each component try each of the dispatchers and use the updated query and styles for the first one returning non-nil; if all return nil, use the original orderless-component-matching-styles variable.

(I added that last part so that we don't need orderless-component-matching-styles-dispatcher.)

That sounds good and simpler than what I was suggesting with the tree of lists and vector.

I'll take a crack at implementing it later today.

oantolin commented 4 years ago

Here's a working first version (f30a3e9). Please test, and please read the docstring to make sure they are clear (try to approach them with fresh eyes, as if you hadn't designed the feature :P).

oantolin commented 4 years ago

The example configuration in this comment, would be done as follows in the setup:

(defun flex-if-twiddle (pattern _index)
  (when (string-suffix-p "~" pattern)
    (cons 'orderless-flex (substring pattern 0 (1- (length pattern))))))

(defun positional-dispatcher (pattern index)
  (pcase index
    (0 'orderless-initialism)
    (3 '(orderless-prefixes orderless-initialism))))

(setq orderless-style-dispatchers
      '(flex-if-twiddle positional-dispatcher)
      orderless-component-matching-styles
      'orderless-regexp)

Notice that for convenience if you only want one matching style, you don't need to put it in a list.

noctuid commented 4 years ago

How about instead of (list styles new-string) as in your code, we allow (cons new-string styles) or just styles (if the first element is a string, it is the new string, if not, the new string is the same as the old one)?

The only reason I did a list instead of a cons was thinking it might make more sense if the style dispatcher ever needed to return another argument in the future. I can't think of anything else it would return, so probably a cons is good.

(I added that last part so that we don't need orderless-component-matching-styles-dispatcher.)

Yes, that's a better idea.

Please test, and please read the docstring to make sure they are clear (try to approach them with fresh eyes, as if you hadn't designed the feature :P).

Thanks a ton. I will try this when I get a chance (probably Monday). I'll need to play with things to see how restrictive I can get for more precise filtering without excluding candidates I actually want. If you enable the wiki, I can can add whatever I end up with on there (e.g. example for implementing command based dispatch). With this you can pretty much do anything you want with queries. You could add negative matching for queries prefixed with ! like a lot of completion frameworks allow for (I don't think I would use that personally but it's great that it's possible). Probably I will try out every crazy possibility I can think of and then settle on something more reasonable. I can't believe I didn't even know about completion-styles a couple of weeks ago.

noctuid commented 4 years ago

Looking at the docstrings, I think they explain it well. One additional feature that might be nice to have is if a dispatcher could decline to handle a component but still alter the query to clean up after itself. For example, if I wanted f: to signify a fuzzy query normally, I could do \f: or whatever to tell the fuzzy dispatcher to ignore the component but update it to f: for future queries.

noctuid commented 4 years ago

Tried it out briefly, and it is working as expected. Tried to make a slight improvement for M-x, describe-variable, describe-function, etc. where I know a package prefix but not the full name of what I'm looking for (so I can't do initialism). Added a dispatcher that translates a pattern to match a full word:

2020 04 26_11:05:12_-__Minibuf1

2020 04 26_11:05:24_-__Minibuf1

I'm using orderless-strict-leading-initialism by default, so I've also added an index dispatcher to relax that to orderless-strict-initialism for later components which allows for this:

2020 04 26_11:20:34_-__Minibuf1

2020 04 26_11:14:26_-__Minibuf1

oantolin commented 4 years ago

Thanks for testing! And your dispatchers look really cool! I'd like to take you up on your kind offer to write about them on the wiki (eventually, no rush, of course).

Looking at the docstrings, I think they explain it well. One additional feature that might be nice to have is if a dispatcher could decline to handle a component but still alter the query to clean up after itself. For example, if I wanted f: to signify a fuzzy query normally, I could do \f: or whatever to tell the fuzzy dispatcher to ignore the component but update it to f: for future queries.

EDIT: THIS IS WRONG: What is currently possible is for a dispatcher to return (cons nil new-component), it signals that you want to change component to new-component and have orderless-component-matching-styles applied to it.

I take it you mean you'd like to update the component and continue running the subsequent dispatchers?

noctuid commented 4 years ago

I'd like to take you up on your kind offer to write about them on the wiki (eventually, no rush, of course).

Will do. I think you may need to manually open the wiki in settings. I can't access it currently.

But I take it you mean you'd like to update the component and continue running the subsequent dispatchers?

Yeah, another example of what this would allow would be potentially combining multiple regexp shortcuts. For example, there could be a base style dispatcher that looks for regexp: or r:, flex:, etc. as a prefix. Regexp shortcut dispatchers could then expand the query, mark it as a regex by prefixing it with regex: (if it wasn't already prefixed with it), and then and pass it onto future dispatchers without finalizing it.

oantolin commented 4 years ago

OK, I made it so returning a string from a dispatcher indicates you want to replace that component and continue with dispatch (7e57a54). I'll update the docstrings later.

Unless further testing while I'm updating the docstrings and README reveals some problem, I think I'll bump the version number and merge into master once the documentation is finished.

I'll look into enabling the wiki.

oantolin commented 4 years ago

I've enabled the wiki.

What should it mean if a style dispatcher returns (cons nil new-string)? It currently means "this component never matches anything" which seems wasteful. How about making that mean the same as (cons orderless-component-matching-styles new-string)? (As I erroneously claimed it did above.) Or should that be the way you signal you want the string replaced but dispatching to continue (instead of returning just the string)?

oantolin commented 4 years ago

I went with (nil . new-string) specifying that you want the default styles (i.e., (or orderless-component-matching-styles 'orderless-regexp)) applied to new-string. Add I added light documentation to the README. Consider this commit a prerelease version: f8efbda. I'll wait a little bit in case you have further comments.

Thanks so much for designing this functionality. I think this really enables you do whatever you want while being pretty user-friendly.

clemera commented 4 years ago

I don't have the time currently to fully dig into this but this all sounds very nice! As an additional input: I would favour a matching style where matching is done by strict initialism first and as soon as I type a space switch to literal and on addtional input switch to regexp any order for the whole input (at least I think this would be great, I haven't tried it).

Sorry if this might be somewhat out of context I only skimmed this thread. Just want to add another possibly relevant consideration.

oantolin commented 4 years ago

That isn't possible with the current system. It would be possible if the style dispatchers received an additional argument indicating the total number of components.

But then someone will come up with something else we can't do. I think that, like all DSLs, this one is being pushed towards Turing completeness. Probably it would be better to just let users provide a function to turn the query into a list of regexps, but include a bunch helper functions to make it easy to do so. I'll try that approach in another branch, and if it looks like it supports the stuff in the dispatch branch comfortably, I'll go with that instead.

clemera commented 4 years ago

That sounds great. As an additional note, I think it would be great to have a setting to force the result order to be according to the order or matching styles or in your new approach possibly the order of returned regexps. AFAIK it is currently not possible to sort initialism before regexp matches for example. Probably the candidates would need to be filtered for each style separately and the results concatenated?

oantolin commented 4 years ago

I'd rather not do any sorting at all, for two reasons. The main reason is speed and simplicity of implementation: there is a function all-completions that returns all completions in a given completion table matching all regexps in the completion-regexp-list. This function is written in C! This entire package is a glorified wrapper to that function. Emacs Lisp is pretty slow and I don't want an option whose use makes everything ten times slower.

The second reason is that there are two many cooks in the sorting kitchen! A completion source does sorting, a completion UI typically also does sorting. Why should the middle layer, the completion style also do sorting? It would wreck any sensible sorting done by the source (I guess we could check and only do sorting if the source does not specify an order) and it would get overruled by the sorting the UI does. In fact, it might only be visible at all if you manage to disable the comlpetion UI's sorting.

That said, Icomplete, my favorite completion UI, does only mild sorting: I think it only sorts by length keeping the order it was given between items of the same length, and only if the source doesn't specify an order. So Icomplete will probably be respectful enough of any sorting we did. And selectrum does agressive sorting, but I think it can easily be turned off. I don't remember about Ivy off the top of my head, but presumably sorting can be disabled too.

So maybe the too many cooks objection isn't that important.

Also, instead of sorting by matching style, the goal of having the completion you want pop up sooner might be better served by using fewer completion styles for any given completion operation, which is what this issue is about. If you set things up so that, say, ".ifs" matches "ifs" as an initialism (and terms without a leading dot match as regexps, say), you just get fewer matches overall --and there's no need to sort "ifs"-as-initialism before "ifs"-as-regexp, because ".ifs" only uses initialism. I'd experiment with this feature before deciding if sorting is really necessary, this has the potential to be much faster and simpler than sorting at the expense of a little more personal configuration. (I'm actually really tempted to use initialism for the first component, regexp for the rest --I think that covers most cases without requiring any extra symbols in the input.)

clemera commented 4 years ago

This entire package is a glorified wrapper to that function. Emacs Lisp is pretty slow and I don't want an option whose use makes everything ten times slower.

Given how fast orderless is and that the sorting according to completion styles could be optional it may not be that bad.

That said, Icomplete, my favorite completion UI, does only mild sorting: I think it only sorts by length keeping the order it was given between items of the same length, and only if the source doesn't specify an order.

I think icomplete does sort using minibuffer history via completion-all-sorted-completions. Good point that sorting might be destroyed by the completion UI, I did not thought of that.

Also, instead of sorting by matching style, the goal of having the completion you want pop up sooner might be better served by using fewer completion styles for any given completion operation, which is what this issue is about.

I agree, probably not worth to include this functionality one could always manually filter multiple times with one completion style if really needed.

I'm actually really tempted to use initialism for the first component, regexp for the rest --I think that covers most cases without requiring any extra symbols in the input.

That sound really great and would probably a good fit for the most common use case.

noctuid commented 4 years ago

I would favour a matching style where matching is done by strict initialism first and as soon as I type a space switch to literal and on addtional input switch to regexp any order for the whole input (at least I think this would be great, I haven't tried it).

That isn't possible with the current system.

I'm not sure I understand the ask. Is it to change matching for previous components or just to be able to type spaces to change to literal and then to regexp? The latter is currently possible.

Probably it would be better to just let users provide a function to turn the query into a list of regexps,

I'd rather this be an alternative to the current system and not replace it. I think the current system is simple and handles more than I think I would ever need (and I'm already using it in my config). If someone really needed something more, they could just override orderless--component-regexps, or you could make it variable and let them write their own version if they wanted total control.

As an additional note, I think it would be great to have a setting to force the result order to be according to the order or matching styles or in your new approach possibly the order of returned regexps.

The main reason I made this issue is as an alternative to sorting some query types higher (instead I'm restricting the query types used for components and making the query types more limiting). So far I haven't really had issues where lack of sorting was a problem. I'm not sure if the time required to sort is worth the trade off; I'd have to try it. I'm happy enough without it though.

oantolin commented 4 years ago

@clemera

I agree, probably not worth to include this functionality one could always manually filter multiple times with one completion style twice if really needed.

Let's get a version of this issue in the package, use it for a while, and you can always open an issue on sorting later if you wish.

That sound really great and would probably a good fit for the most common use case.

We could make it an example in the README or wiki

@noctuid

I'm not sure I understand the ask. Is it to change matching for previous components or just to be able to type spaces to change to literal and then to regexp?

I answered assuming @clemera meant that the first component starts off matching as an initialism, but as soon as there is a second component the first component now matches as a literal. That's what I said isn't possible.

I'd rather this be an alternative to the current system and not replace it.

Well, with enough helpers, it's pretty comparable to the dispatch system, just a little wordier. I'm almost finished with it, so I propose to finish and let you see what your configuration would look like under this new system. If the new system is much uglier we can go the dispatch branch.

If someone really needed something more, they could just override orderless--component-regexps, or you could make it variable and let them write their own version if they wanted total control.

That's certainly a good option too. Which variant is better? Documenting that people seeking total control should override orderless--component-regexps? Or adding an orderless-to-regexps-function that defaults to orderless--component-regexps?

clemera commented 4 years ago

I answered assuming @clemera meant that the first component starts off matching as an initialism, but as soon as there is a second component the first component now matches as a literal.

Yes that is what I meant, to be more specific as soon as I type space. And then when I continue typing after the space switch to regex out of order matching.

noctuid commented 4 years ago

I answered assuming @clemera meant that the first component starts off matching as an initialism, but as soon as there is a second component the first component now matches as a literal.

Yes that is what I meant, to be more specific as soon as I type space. And then when I continue typing after the space switch to regex out of order matching.

It doesn't seem useful to to retroactively change a previous component's style since everything you saw while typing it would now be invalid. Not sure I understand.

That's certainly a good option too. Which variant is better? Documenting that people seeking total control should override orderless--component-regexps? Or adding an orderless-to-regexps-function that defaults to orderless--component-regexps?

Probably it should be a variable to make it explicit that it's okay/allowed to be customized.

clemera commented 4 years ago

Not sure I understand.

For example: If I use initialism and type "cus" in describe-function I get custom-unloaded-symbol-p so if I would new the initials it would be very fast to get it. Now if I actually don't search for that but for ps-print-customize I can type a space and continue the pattern "cus print" and I get that. I think using this technique would allow me to get matches by initials very fast and if I don't know the exact name get that with out of order matching and it would all be kind of automatic.

oantolin commented 4 years ago

This branch (8fb9c34) is my attempt at making the "pattern compiler" customizable but providing enough helper functions to make the stuff in the dispatch branch convenient.

I ... don't really like it. The previous example of:

twiddle means flex, everywhere
first component is an initialism,
fourth is by prefixes or initialism
all others are literal

would be:

(defun flex-if-twiddle (pattern)
  (when (string-suffix-p "~" pattern)
    (cons 'orderless-flex (substring pattern 0 (1- (length pattern))))))

(defun positional-dispatcher (pattern index)
  (pcase index
    (0 'orderless-initialism)
    (3 (orderless-or 'orderless-prefixes 'orderless-initialism))))

(setq orderless-component-matching-styles
      (orderless-seq
       'flex-if-twiddle
       'positional-dispatcher
       (lambda (_) 'orderless-literal)))

(I can add helper for constant handlers, so the lambda could be replaced with (orderless-unconditionally 'orderless-literal) or something like.)

The thing is that functions, specially ones returned by higher-order functions, are much less inspectable than lists of symbols. This way, if you used the customize interface to inspect the values of variables you'd get a lot of byte-code gobbledygook or long (un-byte-compiled) sexps.

I think I'm not going with this, but will instead improve the dispatch branch in two ways: (1) allowing dispatchers to have any of the following argument lists: (component), (component index), (component index total), and (2) adding a variable to completely replace the pattern compiler for people doing something very different.

Even just (1) allows for this "component 1 changes matching style depending on whether there are more components or not" that @clemera wants. Also, even if you never use the total, being able to take just a component will save on unused _index arguments.

Maybe I'll refactor the dispatching pattern compiler a bit, to introduce some of these helper functions, for people that want to completely replace the pattern compiler.

(By the way, this is the first time I've ever used git branches to try out different ideas and I love it!)

oantolin commented 4 years ago

OK, here's the updated dispatch branch (74947d0).

I introduced a new variable orderless-pattern-compiler that defaults to #'orderless--component-regexps, and I now call the dispatchers with 3 arguments: the component, the index and the total number of components.

I wanted to allow dispatcher of any arity between 1 and 3 and only call them with as many arguments as they could take. I was using func-arity to do this, but that function was introduced in Emacs 26.1 and I don't want to depend on it.

So, @noctuid, would you mind updating your config to add a _total argument to your dispatchers?

noctuid commented 4 years ago

For example: If I use initialism and type "cus" in describe-function I get custom-unloaded-symbol-p so if I would new the initials it would be very fast to get it. Now if I actually don't search for that but for ps-print-customize I can type a space and continue the pattern "cus print" and I get that. I think using this technique would allow me to get matches by initials very fast and if I don't know the exact name get that with out of order matching and it would all be kind of automatic.

Okay, so you have the first component as both initialism and regex by default. Right now I would almost always use initialism for the first query. If I didn't know the initials of a function, I would know its prefix and do something like o,s (for orderless, wouldn't give any initialism matches) and then use initialism for the second query once I see the options.

If I wanted to explicitly mark the first component as regexp, I'd probably prefer to do something like <space>cus, r;cus, or cus;r. cus only gives one initialism match in the first place, and I think in most cases there wouldn't be enough initialism matches to make marking the previous query as a regexp useful for me. I have keys bound to immediately select any of the first 10 candidates.

I think I'm not going with this, but will instead improve the dispatch branch in two ways: (1) allowing dispatchers to have any of the following argument lists: (component), (component index), (component index total),

Don't think anymore arguments will be necessary, but for forward compatibility, you could just require that dispatchers have &rest _ (or use keyword args).

For reference, here is what choosing a style for a query looks like with the dispatch branch:

  ;; style dispatchers
  (defconst noct-regexp-to-styles-alist
    `((,(rx (or (and bol "r;")
                (and bol "regexp;")))
       orderless-regexp)
      (,(rx (or (and bol "f;")
                (and ";f" eol)))
       orderless-flex))
    "Alist of regexp to list of styles to use for an orderless component.")

  (cl-defun noct-orderless-specified-styles (pattern)
    "Return a list of styles and a PATTERN without the style specifier."
    (dolist (style-specifier noct-regexp-to-styles-alist)
      (let ((re (car style-specifier))
            (styles (cdr style-specifier)))
        (when (string-match re pattern)
          (cl-return-from noct-orderless-specified-styles
            (cons styles
                  (let ((start (match-beginning 0))
                        (end (match-end 0)))
                    (concat (substring pattern 0 start)
                            (substring pattern end)))))))))

  (defun noct-orderless-style-dispatcher (pattern &rest _)
    "Set style for PATTERN if it matches `noct-regexp-to-styles-alist'."
    (noct-orderless-specified-styles pattern))

Will put this and some other stuff on the wiki once I've messed around with different command-based default styles.

clemera commented 4 years ago

@noctuid Maybe your matching method would also work for me, currently I'm just guessing what I would prefer, I have to experiment a bit. If I think I have a method worth to share I will propose it for the wiki.

oantolin commented 4 years ago

@noctuid Am I right in thinking that you are already testing the updated dispatch branch? (I thought you might because you already have the &rest _ parameter in your noct-orderless-style-dispatcher.)

If you are and everything seems to be working fine, let me now so I can merge into master.

clemera commented 4 years ago

This looks like a great flexible solution to let each component control its own matching style. But as far as I understand there is no easy way to change the matching style for the whole input at once.

One way I think this could be supported which seems to nicely fit together with the current approach would be to have an additional function the user could configure which gets called with the whole input. It can return the matching styles to use for the input and only if it returns nil (or the function for that variable is not set) component wise matching dispatch gets applied (when the user has configured orderless-style-dispatchers) . That might be a convenient addition for this issue, what do you think?

oantolin commented 4 years ago

I already added a variable orderless-pattern-compiler that you set to a function that gets called with the entire input and returns the list of regexps to use (for the entire input). You can use this pretty easily to implement the mechanism you suggested or any other.

clemera commented 4 years ago

I see, but for this the user has to dig into the details of how orderless works and needs to write quite some code. The idea to have component wise matching is nice but it's very sophisticated. It's of course useful if you want to have such a query language but I think for many use cases having a simple way to toggle orderless-component-matching-styles is enough.

The function I proposed would have the benefit to have an easy way to configure what matching styles to use without doing much work (just map input to matching styles). Here is a simple example:

(defun orderless-component-matching-styles-dispatch (input)
  (cond ((string-suffix-p "," input)
         '(orderless-flex))
        ((string-match " " input)
         '(orderless-regexp))
        (t
         '(orderless-strict-leading-initialism))))

oantolin commented 4 years ago

Just to make sure I understand, you still want splitting into components and such, right? So the effect that function is supposed to have is the following (assuming the default separator, to fix ideas)?

If the input ends in comma, match each component via flex (including the last component, which contains a comma).
If there is more than one component but no comma at the end, match each component as a regexp.
If there is a single component and it does not end in comma, match it as a strict leading initialism.

I doubt you'd get many flex matches because of the comma. You probably wanted a trailing comma to mean: "remove the trailing comma, then match the new components via flex", right?

At any rate, rather than adding a third style selection mechanism (which would probably have to be more complex than you are suggesting to avoid the comma problem), I think I'd add helper functions to make writing the kind of dispatcher you are suggesting easy. I guess with the current version of the dispatch branch it is already pretty easy:

(defun orderless-component-matching-styles-dispatch (input)
  (let ((orderless-component-matching-styles
         (cond ((string-suffix-p "," input)
                (setq input (substring input 0 -1)) ; remove if you really wanted the ,
                '(orderless-flex))
               ((string-match " " input)
                '(orderless-regexp))
               (t
                '(orderless-strict-leading-initialism))))
        orderless-style-dispatcher)
    (orderless--regexp-components input)))

That seems already pretty close to what you wrote. A helper function could make it even closer.

clemera commented 4 years ago

I doubt you'd get many flex matches because of the comma. You probably wanted a trailing comma to mean: "remove the trailing comma, then match the new components via flex", right?

Yes, I did not give the function much thought, just used it as a quick demo.

That seems already pretty close to what you wrote. A helper function could make it even closer.

Great! So the variable I am suggesting should be bound to function which takes the input and returns a cons of new input and styles.

clemera commented 4 years ago

To be more clear I'm suggesting that the user is able to put something like the following in his config:

(setq orderless-matching-style-dispatch-function
      (defun my-orderless-style-dispatcher (input)
        (cond ((string-suffix-p ","  input)
               (cons (substring input 0 -1) '(orderless-flex)))
              ((string-match " " input)
               (cons input '(orderless-regexp)))
              (t
               (cons input '(orderless-strict-leading-initialism))))))

and by setting this variable overrides orderless-style-dispatchers which could be renamed to orderless-component-style-dispatchers which would probably make the distinction more clear.

oantolin commented 4 years ago

As I said, you can already do this:

(setq orderless-pattern-compiler
      (defun my-orderless-style-dispatcher (input)
        (let ((orderless-component-matching-styles
               (cond ((string-suffix-p "," input)
                      (setq input (substring input 0 -1))
                      '(orderless-flex))
                     ((string-match " " input)
                      '(orderless-regexp))
                     (t
                      '(orderless-strict-leading-initialism))))
              orderless-style-dispatcher)
          (orderless--regexp-components input))))

If that's not good enough, I can certainly write a helper function orderless-splitter so that you can use exactly the definition of my-orderless-style-dispatcher that you want:

(setq orderless-pattern-compiler
      (orderless-splitter
       (defun my-orderless-style-dispatcher (input)
         (cond ((string-suffix-p ","  input)
                (cons (substring input 0 -1) '(orderless-flex)))
               ((string-match " " input)
                (cons input '(orderless-regexp)))
               (t
                (cons input '(orderless-strict-leading-initialism)))))))

What I don't want to do is add another variable, orderless-matching-style-dispatch-function, since I feel its purpose is adequately served by the more general orderless-pattern-compiler.

Do we really need another variable for this?

A possible definition of orderless-splitter is:

(defun orderless-splitter (global-dispatcher)
  (lambda (input)
    (pcase-let ((`(,string . ,styles) (funcall global-dispatcher input)))
      (let ((orderless-component-matching-styles styles)
            orderless-style-dispatchers)
        (orderless--component-regexps string)))))

oantolin commented 4 years ago

And I hope you don't mind me saying that I think this strategy of "I want exactly the same set of styles for every single component, but I want that set to depend on the input string" seems pretty niche. Isn't being able to change the style for individual components more likely to be what people want?

clemera commented 4 years ago

If that's not good enough, I can certainly write a helper function orderless-splitter so that you can use exactly the definition of my-orderless-style-dispatcher that you want

You are thinking from an elisp hacker perspective, I personally don't want/need exactly such a definition. I'm always trying to think from the user perspective which is typically not knee deep into elisp ;)

And I hope you don't mind me saying that I think this strategy of "I want exactly the same set of styles for every single component, but I want that set to depend on the input string" seems pretty niche.

Could be, to my mind this seems to be the less niche than wanting to use the component wise query language, hard to tell, for that we would need to ask more people. I wouldn't have made the suggestion if I thought it would be niche but maybe I'm in the niche and don't know it.

oantolin commented 4 years ago

I think my question is exactly this:

If a user can write:

(setq orderless-pattern-compiler
      (orderless-splitter
       (defun my-orderless-style-dispatcher (input)
         (cond ((string-suffix-p ","  input)
                (cons (substring input 0 -1) '(orderless-flex)))
               ((string-match " " input)
                (cons input '(orderless-regexp)))
               (t
                (cons input '(orderless-strict-leading-initialism)))))))

would you still want to add another variable so the user can instead write this?

(setq orderless-matching-style-dispatch-function
      (defun my-orderless-style-dispatcher (input)
        (cond ((string-suffix-p ","  input)
               (cons (substring input 0 -1) '(orderless-flex)))
              ((string-match " " input)
               (cons input '(orderless-regexp)))
              (t
               (cons input '(orderless-strict-leading-initialism))))))

To be honest they seem equally difficult for the user to me: the user still needs to write the my-orderless-style-dispatcher function either way. Are you saying they are not of equal difficulty and it is worth shaving off the call to orderless-splitter (or whatever we decide to call it)?

oantolin commented 4 years ago

(Add it's OK if your answer is "Obviously yes! Having to call orderless-splitter is superconfusing for most people! You are clueless!", because I totally might be clueless.)

clemera commented 4 years ago

I feel convinced that keeping a single entry point for any other matching needs is more desirable. This also avoids the risk of adding a variable for a potentially niche concern. I would suggest to mention something like the following in the docstring of orderless-pattern-compiler:

For your convenience there are helper functions to create pattern compiler functions named
`orderless-pattern-compiler-....-constructor`.

And then call orderless-splitter something like orderless-pattern-compiler-global-dispatch-constructor. Mentioning this in the docstring (maybe inline a simple example too) and using the prefix orderless-pattern-compiler should make this discoverable enough and new helpers using this naming scheme can be added any time.

clemera commented 4 years ago

Oh and it should probably be mentioned that pattern compile function has highest precedence and can override orderless-style-dispatchers and orderless-component-matching-styles. Feel free to load this task off to me I can provide a PR later if you like.

oantolin / orderless

Intentionally selecting query method #10