scop / bash-completion

Programmable completion functions for bash
GNU General Public License v2.0
2.93k stars 380 forks source link

add mechanism to filter out known hostnames #720

Open calestyo opened 2 years ago

calestyo commented 2 years ago

Hey.

It would be nice to have a simple way to filter out known hostnames that bash-completion gathers from files like /etc/hosts or SSH’s various config files.

My use case is that I have some rather complex SSH config with special settings for many single hosts. Typically I don't use/need these hostsnames so I wouldn't want them to show up in completions as they break fast completions of those that I really use all the time, e.g. consider:

The idea would be something like a regular expression that is stored in some file (which is considered by bash-completion) and that can be used to remove any unwanted names, e.g. ^.+\.rarelyused\.example\.org$ would clear all the unwanted one.

A simple script could then add/remove that regexp and act as a switch,... so when I want to use these host names, I switch them on, otherwise off.

Such file could go into .config/bash-completion/ which may perhaps even be used for further purposes, e.g. a subdir .config/hosts.d/*.conf which contains files with further hostnames (one per line per file),... which shall be completed but do not occur elsewhere.

Cheers, Chris.

calestyo commented 2 years ago

The following is a (it seems) working proof-of-concept... main problem is that it's terribly slow.

Adding that to _known_hosts_real():

i=${#COMPREPLY[@]}
if [ -f ~/.bcexlusion ]; then
        while [ $i -ge 0 ]; do
                # `grep`’s `-q`-option is not used as it may cause an exit status of `0` even
                # when an error occurred.
                if printf '%s\n' "${COMPREPLY[$i]}" | grep -E -f ~/.bcexlusion >/dev/null 2>/dev/null ; then
                        unset -v COMPREPLY[$i]
                fi
                i="$((i-1))"
        done
fi

With a ~/.bcexlusion like:

^.+\.boring\.example\.org$
www

the above would filter out any hostnames that below .boring.example.org (RE is anchored, and the . escaped) and any hostnames containing www (RE is non-anchored).

Possible idea for extension would be to check two config files, one in /etc and one from the user. Or to make the config file(s) configurable via some XDG env vars or so.
Any ideas?

Anyway... in that form it's unusable because it's terribly slow.
I fear that's not only because of the gazillion forks of grep, but also just iterating through the array (I made a test run, where I only checked for one string to match in an if [ ...) seemed to have a noticeable penalty. But grep of course made it really bad.

Any ideas to get that fast?

calestyo commented 2 years ago

Okay I think I found something to speed this up considerably:

if [ -f ~/.bcexlusion ]; then
        COMPREPLY=( $( printf '%s\n' ${COMPREPLY[@]} | grep -E -v -f ~/.bcexlusion 2>/dev/null ) )
fi

using the assumption (is this true?) that the completions may never contain newlines.

No more looping, just printing the whole current array of completions, each hostname in one line, greping the ones to ignore and making it an array again.

Help needed:

calestyo commented 2 years ago

@scop Perhaps you could have a look at this...

For the latter I could imagine to either use e.g. something like: "${XDG_CONFIG_HOME:-$HOME/.config}/bash_completion/hostname-exclusion-patterns" but you already use.config/bash_completion` as a single file, AFAICS, and I assume you wouldn't want to change that (though I'd think it would be better if bash-completion simply had it's own dir there, as it's "domain")?

Alternatively, I could read the files to be used from an env var (e.g. BASH_COMPLETION_HOSTNAME_EXCLUSION_PATTERN_FILES) which could then be an array and be set in .config/bash_completion (or somewhere else)?
Kinda dislike this, cause ideally it should run more out of the box... and yet another var that would "spoil" the shell env ;-) The good thing about it would however be, that there are no stats whether some file exits or not, unless explicitly configured by the user.

akinomyoga commented 2 years ago

FYI, Ville's reply to the customizability: https://github.com/scop/bash-completion/discussions/539#discussioncomment-830724.

I'm not sure if we should unlimitedly add new configuration variables for these kinds of requests one by one. What if another user requests a configuration for the file that contains the list of the hostnames that should be added to completions? What if another user requests a configuration for the file that contains the list of words that should be excluded from the results of another function? There are many similar helper functions in bash-completion, and it is not realistic to add a bunch of excluding/including pattern files for every relevant helper function. Even if we would finally provide some configuration variables, I feel we should consider solving the problem in a more general way.

Actually, I guess the current proposed way to allow users such detailed control of the behavior is to overwrite the function in the user configuration file at the user's discretion. You can define your version of _known_hosts or _known_hosts_real after sourcing bash-completion. It is not at all elegant but at least it provides the maximal customizability.

If you do not want to maintain such functions, you might instead consider adding -X pattern options in the completion settings of ssh, scp and other relavant commands. Something like __load_completion ssh && eval "$(complete -p ssh) -X \"\$excluded_pattern\""

calestyo commented 2 years ago

FYI, Ville's reply to the customizability: #539 (comment).

Well.. what exactly does that say in the end? It seems his point was, accept such customisations if it makes sense?!

I could of course change my code to work without any config vars and just check each time for some pattern files, but I think that would be much more invasive (checking for those files every time a completion is made).

I guess one shouldn't just throw out new BASH_COMPLETION_* en masse, but I'd say here it makes sense.
I could probably relinquish the part that makes the RE language configurable and simply allow just ERE,... that would already drop one option.

But I guess the other two (pattern from env var + pattern list files from env var) makes sense to keep, cause it gives people the chance to completely work without pattern files (and thus without extra interactions with the filesystem).
Merging the two into one, might possible (e.g. every array element that starts with / or ~ is a file, and everything else is a pattern... but seems a bit hacky.

What if another user requests a configuration for the file that contains the list of the hostnames that should be added to completions?

I had thought about that, too, but I think it's already rather easy to get with current means, just add an empty

Host name name name

with no further rules to your .ssh/config. That doesn't change any logic (unlike when one would add it to /etc/hosts or so), but make bash-completion pick it up.

What if another user requests a configuration for the file that contains the list of words that should be excluded from the results of another function?

Well... anything might happen. But was there ever such a request or well described use case?
I don't think it makes sense to keep out features which make sense now, because of possible similar things which may never come up in reality.
And if they do, one could still think about some more generalised framework and deprecate the one specifically made for host.

Even if we would finally provide some configuration variables, I feel we should consider solving the problem in a more general way.

Any concrete proposals?
I mean what one could do, is something like a general exclusion helper function that may be called by other helper functions with an array of current completions and a namespace (e.g. the name of the calling function).
The general exclusion helper function could then select it's pattern lists based on the namespace.
But still, that sounds like taking a sledgehammer to crack a nut.

Actually, I guess the current proposed way to allow users such detailed control of the behavior is to overwrite the function in the user configuration file at the user's discretion. You can define your version of _known_hosts or _known_hosts_real after sourcing bash-completion. It is not at all elegant but at least it provides the maximal customizability.

But that's really ugly... and not really maintainable for normal users, I'd say. On every upgrade of bash-completion, people would need to update their own version.
I want something that people can actually use.

If you do not want to maintain such functions, you might instead consider adding -X pattern options in the completion settings of ssh, scp and other relavant commands. Something like __load_completion ssh && eval "$(complete -p ssh) -X \"\$excluded_pattern\""

Which would still require people do override stuff managed by the distribution... and would require them update every command that may complete hosts.
Not really maintainable from the user's side.

akinomyoga commented 2 years ago

It seems his point was, accept such customisations if it makes sense?!

Ah, yes, I think so (though that sounds like a tautology, i.e., accept acceptable customizations). I think Ville wanted to tell something more. As in the first sentence, he thinks we could add customizations only when there are no other ways. At least, we need to think hard whether there are any other solutions before adding extra customizability.

I could of course change my code to work without any config vars and just check each time for some pattern files, but I think that would be much more invasive (checking for those files every time a completion is made).

I don't think the customizability in that discussion is not just the config vars but also includes the config files such as the pattern files that you are suggesting. I mean we should consider first the solution without something like pattern files.

I'm actually not sure what would be the exact situation in your case, but if there are some clear and objective reasons that some part of the hostnames shouldn't be included in completions for general users, then we can actually just explicitly implement the filtering in _known_hosts_real. In this way, other users also take the advantage of the improved _known_hosts. But of course, this is not the case if your needs are purely coming from your personal preferences.

Merging the two into one, might possible (e.g. every array element that starts with / or ~ is a file, and everything else is a pattern... but seems a bit hacky.

I don't either think that is a good solution. Anyway, we are not discussing the number of config variables, but minimizing the resulting freedom of customizability. For example, we do not need to provide the pattern files if we provide a pattern variable and vice versa, so at least one of them is not necessary when the other one is added.

just add an empty

Host name name name

with no further rules to your .ssh/config.

This is just a hack though it works.

Well... anything might happen. But was there ever such a request or well described use case?

No, so far. But I feel that is the same as your suggestion; we have not received such a request so far, and this would be the first one. There is still a possibility that It is not something that many users want as you might think. If we need to accept everything with the first request, we need to accept every future request.

I don't think it makes sense to keep out features which make sense now,

Ville doesn't necessarily request us to reject it but tells us to think about a better solution if it is really useful.

because of possible similar things which may never come up in reality.

I just don't know how many other possible use cases that we don't come up with for now.

And if they do, one could still think about some more generalised framework and deprecate the one specifically made for host.

I think the policy of bash-completion is to think about the possibility from the beginning.

Any concrete proposals?

I think that's something we want to carefully consider. One thing that I had in my mind is to provide a general version of Bash's FIGNORE. FIGNORE only operates on the filenames in Bash, but we might think of preparing a similar configuration variable for the general words generated by the programmable completion. But I wouldn't say this idea is the best one before discussing more for even better solutions.

I mean what one could do, is something like a general exclusion helper function that may be called by other helper functions with an array of current completions and a namespace (e.g. the name of the calling function). The general exclusion helper function could then select it's pattern lists based on the namespace.

I think if we would provide such detailed customization we can add the general helper function to help implement it, but to begin with, my question is do we really need such structured namespacing for the pattern filtering from the perspective of average users? Here is a dilemma: on one hand we want to keep the user interface simple, and on the other hand we want more flexible customization. I think we need to compromise something, and what would be sacrificed can depend on the policy of the project that Ville determines.

But that's really ugly... and not really maintainable for normal users, I'd say. On every upgrade of bash-completion, people would need to update their own version.

I completely agree that it's ugly, but I don't think we need to update it so frequently as the interface doesn't basically change. Just to make it clear, what I have suggested in the previous reply is not to directly edit bash_completion but to prepare another file that contains the definition of only the functions that the user wants to replace. But anyway, this solution is the extreme one for those who want the maximal customizability.

calestyo commented 2 years ago

At least, we need to think hard whether there are any other solutions before adding extra customizability.

Well, the more generalised you want such system to have, the more need for customizability you'll get. I guess there are only two ways, either use variables to tell where to look for the patterns or look directly for the files. The latter has IMO the disadvantages of:

I don't think the customizability in that discussion is not just the config vars but also includes the config files such as the pattern files that you are suggesting.

I don't quite see how these should be a problem. Either a user wants to use such feature, then he wants the files - or he doesn't want to, then he doesn't need to have/care about them at all. It cannot get much less invasive, can it?

I mean we should consider first the solution without something like pattern files.

One cannot really drop the files. Just read the description of my use case before... the idea is that some host names are rarely used - not never. So one doesn't want to see them in day-to-day completions, but still wants a way to enable them if needed.

So the main use case is actually the one with the files,... I've provided the var-based exclusion only for those who want to filter out some names without ever using them and who don't want to read the file on every completion.

Anyway, we are not discussing the number of config variables, but minimizing the resulting freedom of customizability.

Well, it's not my project so I don't decide... but shouldn't it be the opposite what's desired? I.e. if there is some reasonable use case, giving uses the full freedom to customize it without having to effectively patch the code on every release?

And I'm not talking about making just everything configurable, e.g. it might make sense for some people to get a knob to reverse the sort order of results... that'd I'd say is fine... but giving them even more options to do that depending on the current phase of the moons would probably be too much.

For example, we do not need to provide the pattern files if we provide a pattern variable and vice versa, so at least one of them is not necessary when the other one is added.

See above... the files are necessary. If at all, then I could drop the pattern variable. But since it comes at basically zero cost, I don't see much reason why.

This is just a hack though it works.

Yes it is... but I guess we want to keep bash-completion fast. The SSH config files are anyway parsed, while any means of doing this via some extra files/vars would require reading something extra.

And the hack in ssh_config can be made a bit cleaner, by just doing a Include ~/.confg/bash-completion/add-on-hostnames and have these Host name lines there. That keeps at least ssh_config a bit more tidy.

Ville doesn't necessarily request us to reject it but tells us to think about a better solution if it is really useful.

Well... as said it might very well happen that something like this will requested by other users for further parts.
For example user names. Though in my experience, one completes far more often different hostnames, than user names... and typically there aren't so many completable hostnames on the local system.
Another case could be git... one might want to filter out certain tags or branch names. But that would then probably be per repository, and no general solution would really work.

So as said previously,... I think the only more generalised would be a helper function that selects the pattern-lists based on some namespace, which is then called by _known_hosts_real. But whether that's really needed now or ever... or whether it just complicates things, is another question.

I think that's something we want to carefully consider. One thing that I had in my mind is to provide a general version of Bash's FIGNORE. FIGNORE only operates on the filenames in Bash, but we might think of preparing a similar configuration variable for the general words generated by the programmable completion. But I wouldn't say this idea is the best one before discussing more for even better solutions.

Variables do not really work for my use case, as they're not shared amongst shell instances. I.e. I cannot call some switch command in one shell, and all other (already open) shells will get it, too. A major point of the whole idea is to not filter out these hostnames forever, but to have an easy way to get them back if one wants (without closing all shells and starting them again).

The only other thing that I can think of would be some hook functions, that are blindly called by e.g. _known_hosts_real and where users could to their stuff (like filtering). But this idea seems to be extremely stupid and far more invasive.

I think if we would provide such detailed customization we can add the general helper function to help implement it, but to begin with, my question is do we really need such structured namespacing for the pattern filtering from the perspective of average users? Here is a dilemma: on one hand we want to keep the user interface simple, and on the other hand we want more flexible customization. I think we need to compromise something, and what would be sacrificed can depend on the policy of the project that Ville determines.

Well, as said, I don't see the necessity right now for such functionality beyond hostnames. Though I cannot promise that this won't change in ten years.
But even then, it's IMO unclear whether such needs could be suited with any generalised framework. It may not always be patterns that one wants to filter out.

I'd also say that any customisation knobs don't make the interface more complex for the average user, when they don't change anything when unused (and when well-documented). bash-completion should work out of the box for everyone, yet be configurable (within reasonable means) for those who need more customised solutions.

And from the developers PoV:

I think we need to compromise something, and what would be sacrificed can depend on the policy of the project that Ville determines.

Well sure... in the end it's his project and he must decide whether he wants it or not... or whether he has some other proposals.

I found it simply quite useful and pretty non-invasive to both the code and the user interface - so I'm actually quite surprised that this opened so many questions.

I completely agree that it's ugly, but I don't think we need to update it so frequently as the interface doesn't basically change.

Not sure whether I understand you correctly. I though the idea was, that the user overwrites e.g. _known_hosts_real, right? Sure it's interface may change rarely, but the actual code? Any change in the way SSH files are parsed? Any added sources for hostnames?

The user would always need to check and merge.

akinomyoga commented 2 years ago

After all, @scop will decide what should enter in this project and what not. This is just based on my guess about the policy of this project from my experience, but I feel the direction of this PR seems to be slightly different from the direction of bash-completion.

Well, the more generalised you want such system to have, the more need for customizability you'll get.

Hmm, maybe I miss your point, but I feel I need to clarify that I'm not promoting some generalized complicated mechanism to add much customizability. It's actually opposite. I think the policy of bash-completion is to keep the mechanism simple and the customizability minimum. In general, when we receive some suggestions, we first ask ourselves whether we should add the feature or not, and then think about a customization interface that is as simple as possible from the user's perspective.

I guess there are only two ways, either use variables to tell where to look for the patterns or look directly for the files. The latter has IMO the disadvantages of:

  • this needs to be done on every completion, even if the user doesn't even want to use the feature at all
  • requires hard-coded location of where these pattern files should be

I don't either think the latter is a good solution. When we would finally support pattern files, that should be specified by some config variables. When I say "minimizing the customizability", I don't just mean reducing the number of configuration variables, but the effective freedom of the customizability. Here, reducing the number of configuration variables by fixing the path of the pattern files does not actually change the range of what can be customized, so that's not what I'm suggesting.

I don't think the customizability in that discussion is not just the config vars but also includes the config files such as the pattern files that you are suggesting.

I don't quite see how these should be a problem. Either a user wants to use such feature, then he wants the files - or he doesn't want to, then he doesn't need to have/care about them at all. It cannot get much less invasive, can it?

I think there are two separate discussions here. One is whether we provide the freedom of configuring the excluded hostnames, and the other is how we provide such customizability if we decided to provide the customizability. In that part, I was not discussing other solutions to provide the customizability but the solutions without the customizability (though it turned out that it doesn't seem to apply to the present case).

One cannot really drop the files. Just read the description of my use case before... the idea is that some host names are rarely used - not never. So one doesn't want to see them in day-to-day completions, but still wants a way to enable them if needed.

  • Which hostnames these are is obviously completely dependant on the respective user, therefore it makes no sense to implement anything that drops e.g. all with www. for every user in a hardcoded way.
  • Storing the exclusion patterns just in a variable, without any files, has the major drawback that one could enable the hosts only per shell (e.g. by unsetting that var), but what's actually desired is something like a switch (e.g. renaming the respective .config/bash-completion/host-exclusion-patterns.d/group.conf to .config/bash-completion/host-exclusion-patterns.d/group.conf.disabled. Can be done via a simple shell script that's added to ~/bin or so, that may even be extended with completion support if one has several groups.).

Thank you for explaining the details on your use case. Hmm, unfortunately, after reading it, I now feel that this use case is too specific. I don't think there many users that want to change the list of hostnames dynamically and globally in the system at the same time so frequently. For example, we can easily think about slightly different use case where the user wants to manage the list of excluded hostnames in each Bash session separately where the pattern-file approach is not so useful. To me, the above arguments appears to only exactly matches with your specific use case.

I still think just providing a pattern variable would be simpler and covers many cases even though I understand that that doesn't satisfy your needs. The upstream bash-completion is just a common configuration for general users, but that doesn't stop you using the locally patched version of bash-completion that suits your needs.

Well, it's not my project so I don't decide... but shouldn't it be the opposite what's desired? I.e. if there is some reasonable use case, giving uses the full freedom to customize it without having to effectively patch the code on every release?

And I'm not talking about making just everything configurable, e.g. it might make sense for some people to get a knob to reverse the sort order of results... that'd I'd say is fine... but giving them even more options to do that depending on the current phase of the moons would probably be too much.

From microscopic point of view, adding as many configuration options as possible would be better than not providing them in theory because the user can just keep it default if the user is not interested in them. But the microscopic optimization doesn't necessarily mean the macroscopic improvements. I have seen many arguments about the harm of adding arbitrary customizability that makes sense (excluding the moon-phase option, etc.) even though the customizability is usually useful and powerful for the experience of heavy users. For example, minimizing the interface is the key of encapsulation or making loosely-coupled and maintainable system. Providing too much customizability puts stronger constraints in the future changes of internal implementations (unless we happily drop the backward compatibilities). Also the customizability could lead to dependency hell, [e.g. consider the case external modules B and C assume different configurations X and Y of customizable module A, respectively, etc.]. The larger freedom of the customizability makes it harder to test the code for all the possible cases of the combination of the configuration. The larger freedom of the customizability will confuse average users even if it is fully documented; the users rarely read and understand the details of the entire document of the project from top to the bottom before they start to use it. If the customizability is larger, the user need to take more time to understand what would be the proper way of customizing it to satisfy their needs. My basic stance on that kind of arguments is "that depends"; whether it applies or not is case-by-case. I don't think these are always perfectly correct, but likewise, I don't think accepting every customizability that would make sense is a good idea. Even if we pick up some of these changes, we need to be careful.

And from the developers PoV:

  • A generalised framework would likely be more complex.

As I have written above, I'm not promoting a generalized framework. It's actually opposite.

  • bash-completion already ships numerous completions for which one could ask whether they're of any use for the general user (freeciv, hddtemp (which I thought was kinda dead) ... so you can always bring up the question for how many people something is actually useful or not.

These are not related to customizability.

  • The current proposal is IMO quite isolated and simple. And even if it should ever be replaced with something more general, that would also be based on pattern list files (I mean that's the core of the idea), so the current interface should then be easy to deprecate and replace by the new more general one.

Backward compatibility is considered important in this project.

I found it simply quite useful and pretty non-invasive to both the code and the user interface - so I'm actually quite surprised that this opened so many questions.

I think my above arguments on minimizing customizability answers a part of the reasons.

Not sure whether I understand you correctly. I though the idea was, that the user overwrites e.g. _known_hosts_real, right?

Right.

Sure it's interface may change rarely, but the actual code? Any change in the way SSH files are parsed? Any added sources for hostnames?

The user would always need to check and merge.

The user only needs to check the updates if they want to keep track of the latest features and fixes of _known_hosts/_known_hosts_real. Otherwise, the user can just continue to use the user's version without problems as far as the interface of the function doesn't change. Also, I don't think _known_hosts and _known_hosts_real are so frequently changed.

calestyo commented 2 years ago

After all, @scop will decide what should enter in this project and what not. This is just based on my guess about the policy of this project from my experience, but I feel the direction of this PR seems to be slightly different from the direction of bash-completion.

Sure,... I'd find it a pity if such nice functionality wasn't accepted but life goes on.

I think the policy of bash-completion is to keep the mechanism simple and the customizability minimum. In general, when we receive some suggestions, we first ask ourselves whether we should add the feature or not, and then think about a customization interface that is as simple as possible from the user's perspective.

I guess it's always a trade off. As I've tried to explain before, I agree that it can be overkill to make basically everything configurable. If I'd have suggest to make the (selected) patterns dependant on e.g. a previously completed username... or maybe by some auto-discovered currently connected network (think about: only showing intranet hostnames, when one has VPN running or so)... than I'd have agreed that this may be overkill.

For my personal taste at least, what I propose is simply non-invasive and yet still quite powerful (e.g. one could implement the connected-network dependent completing outside of bash, by some hooks that add/remove patterns based on e.g. an event by NetworkManager.. or whatever).

But surely it's a matter of personal taste.

I don't just mean reducing the number of configuration variables, but the effective freedom of the customizability.

Well that "policy" I couldn't really follow. I can understand if one doesn't give freedom of customizability when the feature is too complex to maintain... or when one cannot identify any reasonable use case... or when the implementation of it would have too negative consequences even for those who do not even want to use it (e.g. see my first implementation try in the beginning of this ticket, which had really terrible performance).
But just removing customizability for the sake of it... how would that make sense in any way?
That would sound like what many people felt became GNOME philosophy (removing features and selling it as an advantage)... and why things like Cinnamon and friends came into existence.

I now feel that this use case is too specific. I don't think there many users that want to change the list of hostnames dynamically and globally in the system at the same time so frequently.

Uhm, I guess that's difficult to judge without any proper survey or so. I work in research, we're participating in a number of different project... each of them having completely different hosts I need to connect. Typically I don't do this wildly intermixed but at one time do stuff for one project. Similarly, people may have different hosts depending on whether they're do some private stuff... or work (for the job). All our students, for example, get access to a large pool of workstations (physically and remotely),... so in principle, they'd need a SSH key for each of those nodes, whilst typically not connecting to all of them. Other may simply want to filter out any IPv4 or v6 addresses.
Seem all like not so uncommon cases of where one wants to exclude certain hosts from completions.

For example, we can easily think about slightly different use case where the user wants to manage the list of excluded hostnames in each Bash session separately where the pattern-file approach is not so useful.

Why? It would still work. Just that one would need to set BASH_COMPLETION_HOST_EXCLUSION_PATTERN_FILES to a different value per shell, which can be quite easily controlled via .profile or .bash* configs. Or the same via BASH_COMPLETION_HOST_EXCLUSION_PATTERN. If one wants to use the former one could simply set up another level in the .config/bash-completion/host-exclusion-patterns.d/<shell-id>, where shell-id could be any group of terminals or even single terminals running a shell.

I still think just providing a pattern variable would be simpler and covers many cases even though I understand that that doesn't satisfy your needs.

Still don't see much advantage of removing the file functionality. If it would be super complex code or so, than sure... but it seems rather simple, so other than saving a line of code and a variable that gets never even set when people don't use the feature... what for?

Of course one could also hack the same functionality in to the system with just having a single pattern variable. E.g. via bash’s PROMPT_COMMAND, which every time reads the same pattern files that would be now in BASH_COMPLETION_HOST_EXCLUSION_PATTERN_FILES and feeds the result into the single pattern variable.
Sure, works, but makes usage for the end-user just more nuisance, that could would need to be written by the user, it would possibly need to go into both, .profile and .basjrc... and worst of course, it would read the configuration really on every new readline... not just on every completion, quite a performance issue.

From microscopic point of view...

I guess it probably doesn't make much sense to go down the philosophical rabbit hole ;-) One can surely find arguments for both extremes... some may e.g. the simple POSIX interface is good enough... others might say that Linux kernel has shown it's simply too limited for the real world.

Two things however:

unless we happily drop the backward compatibilities

These are not related to customizability.

Okay than... other example. The current data sources of _known_hosts_real() ... okay SSH obviously makes sense... I can also understand Avahi (though I don't personally like the whole idea behind it)... but ruptime ... didn't even know what that was before I stumbled over it ;-)

I just wanted to say... it's always quite some personal PoV, what gives a big use case and what not.

The user only needs to check the updates if they want to keep track of the latest features and fixes of _known_hosts/_known_hosts_real.

Uhm... sure, yes... but...isn't that what more or less everyone would want unless for some very strong/special reasons?

akinomyoga commented 2 years ago

Sure,... I'd find it a pity if such nice functionality wasn't accepted but life goes on.

Ah, wait. Maybe I have confused you, but the suggested functionality has not yet been rejected by @scop. He's just busy. Next, I have written that I feel the direction is slightly different, but I'm not trying to reject the entire idea, but I thought the PR can be adjusted to better suit the direction of bash-completion to make it accepted (although the resulting changes might not exactly match with your specific case).

For my personal taste at least, what I propose is simply non-invasive and yet still quite powerful

I agree with this point, but this is a microscopic fact (in the sense that I wrote in the previous reply), and that might not necessarily be the sufficient reason that it would be merged. Even if each of the possible microscopical improvements makes sense for some use cases, adding every such feature into one might not be finally so useful. That depends on the project and the maintainer.

From microscopic point of view...

I guess it probably doesn't make much sense to go down the philosophical rabbit hole ;-)

I think this is related to the central point of the policy explained in https://github.com/scop/bash-completion/discussions/539#discussioncomment-830724, so if you avoid discussing this point, actually I don't have something important to discuss here.

unless we happily drop the backward compatibilities

  • This is bash-completion... it's not the kernel were gazillions things break when your remove some interface.

The user configurations depend on the interface. External completion scripts depend on the interface (though external completion scripts are irrelevant for the present case). We might write a disclaimer for the breaking changes in release notes to avoid complaints as you say, but that doesn't mean there is zero stress on the users. I wouldn't say making breaking changes is 100% wrong, but also I don't think not making breaking changes is 100% wrong. The maintainer can choose.

Either it's anyway used by only few people - or it's used by many (then bash-completion will likely want to keep it anyway ;-) ).

The latter case is the problem.

  • No new dependencies are added by my code.

Although that part about the dependency is rather a general argument, the customizability introducer (i.e., your PR) corresponds to A in that explanation but not B and C at the side that adds the dependencies.

  • And just to boringly repeat myself... the code is pretty simply and isolated. I doubt it makes testing or future development any harder.

Right, but that's microscopic. The suggested change as alone wouldn't make any problem, but it's unpredictable when we continue to accept every similar PRs just because they microscopically make sense. It's also related to the amount of resources that the maintainer can provide.


I think the current version is already good enough in a microscopic sense. The suggestion of only providing the pattern variable (for general completed words not only for hostnames) is just a naive idea on how to focus on the necessary part of the suggestion while compromising some part of the usefulness, but that might not be something that @scop requires us by the words "if everything else fails, then make it customizable/configurable".

scop commented 2 years ago

There's way too much to read through and completely digest here that I have time for right now, but from the bits I was able to read, I think @akinomyoga is representing my general mindset well. The moments I do find for bash-completion I'll direct towards completing https://github.com/scop/bash-completion/projects/2 (and I haven't found those moments for some time either), pretty much everything else that requires more than a couple of minutes of my time is on hold until that's done as far as I'm concerned, sorry about that.

Take this with a kilogram of salt as it's quite possible I've missed something, but what I was able to gather from here leaves me with a feeling that this is too much complexity for such a specific use case. The ability to switch between BRE/ERE/PCRE most definitely is, but I'm not making any promises I'd accept this even with it removed. Nor unfortunately when I'll have time to revisit here if you still wish to continue to pursue inclusion of this.

scop commented 2 years ago

Just another couple of quick notes:

About the specifity, I have come across cases where I would have liked to filter out some known hosts completions. Those revolved around not getting completions that I know will not work with some commands, but are still useful with others. This implementation would not have helped, because the exclusion in my case would require more context than just a global filter list.

I think a generic and more powerful exclusion mechanism would have a better chance to be accepted than a specific one like this. Yes, it would almost certainly make it harder to use than for example this. But still. (And no promises about that either, except willing to have a look sometime if someone comes up with it.)

calestyo commented 2 years ago

The ability to switch between BRE/ERE/PCRE most definitely is, but I'm not making any promises I'd accept this even with it removed.

I had already put it in a separate commit to be easily skipped. ^^

Anyway... it's a bit strange that one one side you consider the code too complex (whereas it's really just a few lines, that absolutely don't touch anything else or have any side effects, than removing some names).... while on the other hand you wish for some far more generic/powerful exclusion mechanism that takes (arbitrary?) context into account.
Seems a bit contractionary.

Well... I guess it's rather clear that I cannot really invest any time into such more generic method... apart from it being to unspecific it feels that if even this little contribution cannot be accepted, anything bigger is even less likely to ever be and so such contribution would be just another waste of time.

Not sure whether you want to keep the PR here open for anyone who wants it and would then manually patch, cause I've already removed my forked repo, no sure if the code goes away once this is closed here.

Thanks however for looking into this, cheers!

scop commented 2 years ago

It's not "really just a few lines" -- it's 25 lines of code, 3 configuration variables, 60 lines of related documentation. That's just too much for a narrow scoped thing like this in my opinion.

Whereas a generic exclusion mechanism wouldn't necessarily be that hard to implement or use, just some food for thought (which I suggest we discuss in another issue or PR if we want to pursue this further), quite likely needs more work and doesn't account for all things we'd need it to yet:

$ git diff -U2
diff --git a/bash_completion b/bash_completion
index 1d57de3a..7c685f42 100644
--- a/bash_completion
+++ b/bash_completion
@@ -820,4 +820,12 @@ _comp_variable_assignments()
 }

+_comp_return_hook()
+{
+    ((${#FUNCNAME[*]} != 2)) && return  # this _will_ need some refinement and thought
+    echo "Hello from return hook for ${FUNCNAME[1]}"
+    echo "words: ${words[@]}"
+    echo "COMPREPLY: ${COMPREPLY[@]}"
+}
+
 # Initialize completion and deal with various general things: do file
 # and variable completion where appropriate, and adjust prev, words,
@@ -839,4 +847,6 @@ _init_completion()
     local exclude="" flag outx errx inx OPTIND=1

+    trap _comp_return_hook RETURN
+
     while getopts "n:e:o:i:s" flag "$@"; do
         case $flag in
$ cd helpers
$ foo bar --quux baz <TAB>Hello from return hook for _minimal
words: foo bar --quux baz 
COMPREPLY: Makefile.am perl python Makefile.in Makefile

Makefile     Makefile.am  Makefile.in  perl         python       

Now, we could have an associative array from where users could place their hook functions, keys being for example function or command names, and _comp_return_hook would look them up and run if they exist. That function would have access to COMPREPLY as well as all the context there is (some things such as the words array would need to be standardized probably for this), and could do things, filter from COMPREPLY, add to it, or do something completely different, whatever it likes. The use case at hand could be solved by something like:

_comp_return_hooks[ssh]=_my_ssh_return_hook

_my_ssh_return_hook() {
    for i in "${!COMPREPLY[@]}"; do
          [[ ${COMPREPLY[i]} =~ ^exclusion_regex$ ]] && unset -v 'COMPREPLY[i]'
    done
}
calestyo commented 2 years ago

It's not "really just a few lines" -- it's 25 lines of code, 3 configuration variables, 60 lines of related documentation.

Well documentation barely counts as complexity... and if you just take the main commit, remove comment and whitespace lines, than it's 6 lines, two of them closing fis. The changes to bash_completion.sh can also barely counted as adding any complexity at all.

Not that I'd had anything against your proposal with a hook, but that does sound more complex and invasive. If you then add documentation on how it's used... wonder how many lines that would give.
And then you only have the general framework... which would make my life a bit easier, as I wouldn't have to patch the file on any upgrade... but the average user would then still miss the actual functionality and invent it on his own. Unless of course you include an example filter, too, but then you're at even more lines.

Nevertheless.. would be nice if you could integrate such hook framework... I could use it then with my filtering code and avoid the need to patch the file on every upgrade.

Cheers.

calestyo commented 2 years ago

What you could perhaps do, beyond the general framework, was adding a examples/ dir to the documentation, were things like the "actual" code, e.g. such a filtering function based on patterns, could be collected (including the respective documentation). Such directory and its files could be organised in a way that users could just source the "examples" in e.g. their /etc/profile.d/bash_completion.sh... e.g. along with the code that sets up the function-specific config files (i.e. what I did now directly in /etc/profile.d/bash_completion.sh).

Then users would get rather easy access to that (and other use-cases) without having to re-invent the wheel, and you could still keep your main code base "clean". And all would be happy.
Then it should be even easier to accept things like selecting the RE language,... as it wouldn't affect anything unless a user specifically wants to use that "example".

So if your proposal makes it into the code, and you tell me how you'd want such "examples" to be part of the package, then I'd be happy to adapt mine to fit to that.

calestyo commented 2 years ago

Just one more thing that comes to my mind.

I remembered @akinomyoga’s idea about the opposite to my use-case (adding hostnames).

It would would be nice if it was possible to specify more than one callback function per name (like ssh). Perhaps the associative array’s elements could themselves be indexed arrays, with the functions listed there called in that particular order?

calestyo commented 2 years ago

And if you reeeeaally want it generalised:

Why not changing _comp_return_hook to _comp_hook and make even the point in which the hook is called a value?

So _known_hosts_real() could get e.g. two points in which the hook is called start and return (not necessarily saying, that start would make sense for _known_hosts_real()).
That would give full flexibility in the future, and the respective points could be made specific to the respective function in which the hook is called.

Perhaps the hook should then get the point at which it was executed (like (on) return) and the function for which it was executed as (positional) parameters?

calestyo commented 2 years ago

It just came to my mind, that if such framework would make the hook dependent on completed command for which it was invoked by a function (like _known_hosts_real()), as indicated by the ssh in :

_comp_return_hooks[ssh]=_my_ssh_return_hook

then there should be some way to specify a wildcard... because wouldn't want to set the above for any possible command that does hostname completions.

Not sure what should happen, when there's both, a specific entry in _comp_return_hooks for a command and the wildcard. Or one would have two things:

calestyo commented 2 years ago

Just noted one further thing, that would be specific to an example exclusion hook function when that is to be used with scp .. there it would need to exclude ":", because scp completes to ":".

So that also shows, that it would be nice for the hook function to get some information (like who's calling, and in which stage) as positional parameter.

akinomyoga commented 2 years ago

I think the proposed _comp_return_hook is an interesting and powerful idea that can be used for general customization although I tend to avoid traps as traps have some non-trivial behavior (or bugs) in Bash. There are already several things that we need to care about when we use traps. I think I can take a shot at implementing the proposed idea.

akinomyoga commented 2 years ago

Well documentation barely counts as complexity... and if you just take the main commit, remove comment and whitespace lines, than it's 6 lines, two of them closing fis. The changes to bash_completion.sh can also barely counted as adding any complexity at all.

I'm not sure how you define the complexity, but if we would accept, say, one hundred similar suggestions with 25 lines of codes, 3 config variables, and 60 lines of well-documented descriptions in the manual, it would finally lead to 2500 lines of additional codes, 300 configuration variables, and 6000 lines of documentation.

Not that I'd had anything against your proposal with a hook, but that does sound more complex and invasive. If you then add documentation on how it's used... wonder how many lines that would give.

The proposed mechanism doesn't have such a small scope as the exclusion list for _known_hosts but is capable of dealing with many similar cases.

And then you only have the general framework... which would make my life a bit easier, as I wouldn't have to patch the file on any upgrade... but the average user would then still miss the actual functionality and invent it on his own. Unless of course you include an example filter, too, but then you're at even more lines.

In that sense, I think adding your specific feature could be also a valid option, but that's just not the option bash-completion takes. By the way, I'd say more than 90% of users will not change the default configuration at all., and 99% users are not interested in adjusting the detailed list of the hostnames in the completion even if there is a way to configure it. So, actually you are an outlier in a good way, i.e., far better than the average users. Taking the current decision of bash-completion, I'd guess or suggest that such capable users write their own codes to filter the generated candidates using the proposed hook.

Perhaps the associative array’s elements could themselves be indexed arrays,

That's not possible in Bash (although I think ksh has certain support for such nested object structures). I don't think we should support it from the beginning as the hook is supposed to be primarily set by the user but not by arbitrary frameworks, and the users can manage the several processing by themselves. But maybe we can later consider adding the support for multiple hook functions.

And if you reeeeaally want it generalised: [...]

As you know, there aren't any needs for that currently at all.

then there should be some way to specify a wildcard... because wouldn't want to set the above for any possible command that does hostname completions.

Not sure what should happen, when there's both, a specific entry in _comp_return_hooks for a command and the wildcard. Or one would have two things:

  • a wildcard entry, which would be used in addition to the specific one
  • a default entry, which would only be used if no specific and/or wildcard entry is set

We explicitly specify every command using complete because Bash complete doesn't support the wildcard. In that sense, supporting wildcards only for _comp_return_hook seems like an unbalanced solution. I'd suggest preparing _comp_return_hook_default as an array so that multiple handlers for different command patterns can be registered incrementally. Something that can be used as

_my_return_hook_for_git_subcommands() {
  [[ ${COMP_WORDS[0]} == git-* ]] || return 0 # selection by wildcard
  # ...
}
_comp_return_hook_default+=(_my_return_hook_for_git_subcommands)

Just noted one further thing, that would be specific to an example exclusion hook function when that is to be used with scp .. there it would need to exclude ":", because scp completes to ":".

Confirmed. I haven't thoroughly checked why : is generated, but isn't that just a bug of scp completion? Or do you think it should be configurable by the users to enable/disable the generation of :?

So that also shows, that it would be nice for the hook function to get some information (like who's calling, and in which stage) as positional parameter.

Is that information something different from the one carried by FUNCNAME or COMP_WORDS?

calestyo commented 2 years ago

I think the proposed _comp_return_hook is an interesting and powerful idea that can be used for general customization although I tend to avoid traps as traps have some non-trivial behavior (or bugs) in Bash. There are already several things that we need to care about when we use traps. I think I can take a shot at implementing the proposed idea.

I you'd go for the even more generalised idea that I've described above (where the point in which a hook is called in a given function, could also be made configurable (when that makes sense for a function)), then trap wouldn't work anyway, cause you can specify only RETURN and some others.

One could simply call the hook-handler like:

_comp_user_hook ${FUNCNAME[0]} "return"

e.g. in _known_hosts_real() just before the return

And that could look at _comp_return_hooks, and call the functions in there (with setting position parameters)... and depending on complex/generalised you want it to be, with the wildcard/default mechanism I proposed above.

akinomyoga commented 2 years ago

One could simply call the hook-handler like:

_comp_user_hook ${FUNCNAME[0]} "return"

e.g. in _known_hosts_real() just before the return

Are you suggesting adding the above line at the end and the beginning (with return being replaced by start) of all the ~70 functions of bash_completion?

Actually, if you want to insert some codes at the beginning and the ending of an existing function, you can do it in the following way without any support from bash-completion:

eval "__my_original_func__$(set +o posix; declare -f funcname)"
funcname() {
  echo "do whatever you like for 'start' hook"
  __my_original_func__funcname "$@"; local status=$?
  echo "do whatever you like for 'return' hook"
  return "$status"
}
calestyo commented 2 years ago

I'm not sure how you define the complexity, but if we would accept, say, one hundred similar suggestions with 60 lines of well-documented descriptions in the manual, it would finally lead to 2500 lines of additional codes, 300 configuration variables, and 6000 lines of documentation.

Sure, question is though, how much such suggestions will pop up in reality. Anyway... I think we can just end that philosophical discussion... I kinda like your proposed mechanism with hooks much more... at least if you'd also accept example scripts for them to be part of bash-completion's distribution, otherwise every user would have to re-invent the final part of the wheel.... I think by handling them as example you could keep the main code base clean and still provide working solutions to end-users.

That's not possible in Bash (although I think ksh has certain support for such nested object structures). I don't think we should support it from the beginning as the hook is supposed to be primarily set by the user but not by arbitrary frameworks.

Hmm.. couldn't one workaround that by some naming convention? Like on key for the associate array, of the form function-phase. - cannot be part of function names, so that shouldn't cause any confusions... and it would still be extensible, should that ever be needed.

The value could be , separated user-hook functions, which we split via IFS and call each one by one (if there are more than one).

As you know, there aren't any needs for that currently at all.

I guess so (at least I cannot think of any case right now, where e.g. a hook at the start of some function would be necessary), but it's better to keep it in mind in the design phase right now, and not to worry later, should it ever become necessary.

At least if you call it _comp_return_hooks you defined that forever to be only for just the return point.

We explicitly specify every command using complete because Bash complete doesn't support the wildcard. In that sense, supporting wildcards only for _comp_return_hook seems like an unbalanced solution. I'd suggest preparing _comp_return_hook_default as an array so that multiple handlers for different command patterns can be registered incrementally. Something that can be used as

Not sure whether I understand you correctly... maybe we think about the whole system a bit different. Why are you using complete? I thought you'd have a genera hook handler _comp_return_hook, which you call e.g. just before the end of _known_hosts_real() (and in other functions for which that makes sense) ... and that calls the user supplied functions configured in the array _comp_return_hooks?

The following design question just came to my mind:
When I write a hook, and configure it in the array _comp_return_hooks... which things do I want to use to select when my hook should run?

  1. I've used the function so far (e.g. _known_hosts_real())
  2. you used the command (e.g. ssh, scp)
  3. and I've further proposed the "phase" (like on return) (but okay, one could skip that generalisation if you don't like it)

But I guess one really wants both (1) and (2). E.g. I wouldn't want to maintain a list of all possible commands that use _known_hosts_real() and add an entry for them in _comp_return_hooks... yet it would be nice if could configure e.g. different exclusions for ssh/scp vs. e.g. ping (I actually do have a use case for that ;-) ).

I think one could allow this via two ways:

  1. function-command-phase (or without phase) and the wildcard (e.g. with *`` and default (e.g. with;` or some other symbol that cannot be a command) entries
  2. from within the user hook function (simply by giving these fields as positional parameters to the user hook function (and it can do whatever it wants).

So with (1) one could have (without a "phase"):

_comp_return_hooks[_known_hosts_real-ssh]=_my_ssh_return_hook

which is called whenever the function is _known_hosts_real and the command ssh or

_comp_return_hooks[_known_hosts_real-*]=_my_otherssh_return_hook

whenever the function is _known_hosts_real for any command... which would run even if there's also a _comp_return_hooks[_known_hosts_real-ssh], whereas a _comp_return_hooks[_known_hosts_real-;] (default) would not run then.

Mhh well maybe a bit complex... I guess I'd need to think that through a bit more thoroughly.

At least I wouldn't want to have to explicitly set my user hook for any command, that uses host completions (which implies selection by the function)... and it would still be nice to choose something different based on the command (which would be selection by the command).

Confirmed. I haven't thoroughly checked why : is generated, but isn't that just a bug of scp completion? Or do you think it should be configurable by the users to enable/disable the generation of :?

Joking? :-P Well possibly there could be some users which would want it configurable. If not, that I think I would want to have the :,... the reason I guess is, that if a hostname completes, than likely I don't mean a file of the very same name,... however, without the trailing : SSH would use that as local file (source, destination).

So I guess in the majority of all cases, the host (thus with :) is wanted... but some users may have many files matching hostnames and might want to complete these.

Is that information something different from the one carried by FUNCNAME or COMP_WORDS?

Hmm.. you're right... had forgotten that FUCNAME is an array with all the the others in the call stack. At least we'd then need to promise that e.g. FUCNAME[1] is always the "actual" function.

calestyo commented 2 years ago

Are you suggesting adding the above line at the end and the beginning (with return being replaced by start) of all the ~70 functions of bash_completion?

No, I'd only add such calls there, where there's some reasonable demand/use-case for it.

As long as I'm the only one asking for such thing with the hostnames... add it only there and only in the end (cause that's enough for me). If someone comes up with a nice use-case for e.g. git... add it there (though git completion is anyway part of upstream git, isn't it?).

The idea with the "phase" would have just been for the case, that it should be absolutely future-proof, e.g. if someone comes up with an additional valid use case, that needs a callback in e.g. the beginning of some function or somewhere else.

Actually, if you want to insert some codes at the beginning and the ending of an existing function, you can do it in the following way without any support from bash-completion:

Interesting... looks scary (the eval ^^) .. but still a pretty cool solution. Though I'd prefer @scop’s approach with callbacks... seems IMO "cleaner".

akinomyoga commented 2 years ago

Hmm.. couldn't one workaround that by some naming convention? Like on key for the associate array, of the form function-phase. - cannot be part of function names, so that shouldn't cause any confusions... and it would still be extensible, should that ever be needed.

Yeah, there are always many ways of workarounds. If I would add the support in the future, I'd do it in the following way:

declare -gA _comp_return_hook_index=()
_comp_return_hook_count=0
_comp_add_return_hook() {
  [[ ${_comp_return_hook_index[$1]+set} ]] ||
    _comp_return_hook_index[$1]=$((_comp_return_hook_count++))
  eval "_comp_return_hook_${_comp_return_hook_index[$1]}+=(\"\$2\")"
}

The value could be , separated user-hook functions, which we split via IFS and call each one by one (if there are more than one).

, is a valid character that can be contained in function names. Rather, a space should be used to separate the function names. But I actually want to allow arbitrary shell commands to put in the hooks like PROMPT_COMMAND where any character cannot be used as a separator.

Not sure whether I understand you correctly... maybe we think about the whole system a bit different. Why are you using complete?

I'm not going to use complete for the hook system. I'm just referencing the design of complete. Currently, when we set up the programmable completion, Bash requires us to set up complete for each command one by one. Of course, we can support wildcards independent of the Bash's programmable completion system, but that feels like unbalanced improvements.

I thought you'd have a genera hook handler _comp_return_hook, which you call e.g. just before the end of _known_hosts_real()

No, see @scop's suggestion. I'm just going to implement the idea with additions of careful handling of RETURN traps and other special care for trap handlers.

When I write a hook, and configure it in the array _comp_return_hooks... which things do I want to use to select when my hook should run?

  1. I've used the function so far (e.g. _known_hosts_real())
  2. you used the command (e.g. ssh, scp)
  3. and I've further proposed the "phase" (like on return) (but okay, one could skip that generalisation if you don't like it)

I am going to support only 2 even though I know that that doesn't exactly satisfy your specific needs. As I have suggested in my first reply, I guess you are finally going to overwrite _known_hosts without relying on a new bash_completion support. After you feedback and discussion, I think we are now on the track of adding a general mechanism that can be used for many cases but not every case, which wouldn't finally include your use case.

I think one could allow this via two ways:

What percentage of the users would need that mechanism? The interface is overcomplicated even for your specific use case. This seems like a typical counterexample of the KISS principle. Even with that supported, I'm not sure if that would match with future demands that require the detailed control of the same level. Also, if we need that level of control, it is more clear to write the code than inventing special variables and special syntaxes/sublanguages "phase-command", "utility-command", etc. just for the configuration.

Joking? :-P [...] I think I would want to have the :,...

What? ... then I guess I misunderstood your original comment (unless you actually want a single character word : without any hostname). I thought you were talking about the following candidate ":":

image

With the latest master of bash-completion, scp completion generates a single character word : without any hostname, which doesn't have any meaning in scp (but just interpreted as a local file name), which turned out to be just a regression introduced by 08dd2cdb3. I will submit a fix soon. (Edit: opened PR #737).

No, I'd only add such calls there, where there's some reasonable demand/use-case for it.

We cannot predict whether there would appear some use cases for each function. Here, I guess you will suggest adding the call one by one every time someone came up with a use case, but we wouldn't like symptomatic solutions but a definitive solution.

Interesting... looks scary (the eval ^^)

That's just apparent.

Though I'd prefer @scop’s approach with callbacks... seems IMO "cleaner".

@scop isn't suggesting the callbacks for arbitrary functions that you imagine. You can again carefully look at the code in the original @scop's comment.