Closed 44e76c4e-f397-45e7-8e63-ae88bc35aef5 closed 11 years ago
I would like to propose a small addition to the standard Python distribution. I had in mind two additional functions for the os module, but where they fit is subject to debate. They are:
which (command[, path]): Finds executable files with the given name in the standard operating system path (os.environ['PATH']) or the user-specified path (which can be a string using the os.pathsep separator, or a Python list of directories in which to search).
Availability: UNIX and Windows, at least. I don't know enough about the relevant details of Mac to know whether this is relevant. The attached implementation uses os.access, which is only available on UNIX and Windows.
Rationale: It is often useful to check for the existence of a particular program. os.system and os.execp use or mimic the search-in-path functionality, but are often unsuitable for this purpose, because they require running the program right now. Suppose you want to see what's appropriate -- fsh (fast version of ssh), ssh, or rsh (ick) -- and use that decision later on? In the particular case I have in mind, I want to be able to configure my script to say that the sound_player is either esd if there is an executable program with that name in the path, or else none. With which, I can say
if which ('esd'):
sound_player = 'esd' # or which ('esd') [0]
else:
sound_player = None
Another common use (for me at least) would be for scripts that replace other programs in path, but still need to run the programs they replace. (E.g., I have a script called 'ssh' that resets an xterm's title once the real 'ssh' completes.) This could be done as follows:
def find_alternate (program, not_this):
for alt in which (program):
if not os.path.samefile (alt, not_this):
return alt
return None / raise ...
Counterargument: which is a one-liner (see the attached implementation)
Response: In my opinion, it is a longish one-liner, and is only obvious to people who know functional programming (map, reduce, and filter). And in my opinion it is a sufficiently common scripting operation that it warrents a helper function. It is of course a minor feature, but os is filled with many, very helpful, helper routines, and I think this is a similarly useful one. (But this is of course subject to argument.)
I made a small inquiry on irc.openprojects.net:#debian about whether this sounded useful, and two people voted yes, and no one voted no.
Since the functionality of which is closely matched by os.access, I further propose that which can be specified a different mode than "executable file" by an optional argument. That is included in the attached implementation.
find_in_path (command[, path]): Finds all files/directories with the given name in the standard operating system path or the user-specified path (specified as in which).
Availability: Everywhere (using os.path.exists instead of os.access)
Rationale: This is a natural generalization of which; indeed, which is seen most naturally as a filter of the result of this function. In fact, that is how I implemented which in the attached implementation.
More relevantly, this function allows you to find the font with a given name in your font directory, to find the appropriate TeX file in os.environ['TEXINPUTS'], etc., etc.
Similar counterpoint/response for this function. Again I think this is a sufficiently common operation to warrent a help function.
You are also very welcome to propose a different name than find_in_path.
Feel free to post comments for or against this proposal.
Logged In: YES user_id=265183
Two things: 1. My apologies for the poor formatting--this was my first sourceforge post. 2. A small correction to the implementation. An instance of '' should be os.defpath (the default value of os.environ['PATH']).
Logged In: YES user_id=357491
I don't love the idea of adding either proposed functions. They strike me as rather specific for shell usage and not as something you are necessarily going to want constantly.
Commenting on Brett's reply... We use this sort of facility in SpamBayes to search for the OCR program. I crafted something by hand which didn't work on Windows. Mark Hammond had to clean up the mess I made. Having this as a canned function in os would be handy.
Have a look at find_program in this file:
I'm +1 with adding a which function. IMO the appropriate place is shutil and not os.
+1 adding which to shutil
Here is a patch against r76432 which implements a "which" generator function in shutil that yields full file paths where the searched file exists on the PATH. Includes doc change and a test. It is pretty similar to what edemaine had suggested.
This could just as easily return a list, so if that would be more preferable I'll change the patch.
Adapted Brian Curtin's http://bugs.python.org/file15381/ shutil_which.patch and made another reference implementation as a standalone module including the following fixes:
PATHEXT
on WindowsPATH
on Windows, but not before an explicitly passed pathPATH
for files that have a path specified in their name alreadyUse any of these changes in the final shutil patch if you like to. The shutil module has availability 'Unix, Windows' after all.
Updated reference implementation as a standalone module:
Updated version of reference implementation as a standalone module
Made this to more closely resemble the 'which' command behavior, and to make the typical use case simpler. The generator interface is still a good idea imho, so it's kept as which_files(). I'm already using the reference implementation module flawlessly, so I don't see any other changes needed on my side. Your ideas are welcome of course.
SOunds like a good feature to add in shutil, I'll check the patch and also compare it to distutils.spawn.find_executable()
@iki: could you refactor your code so it's in shutil (and the tests in test_shutil)
few remarks: only which/which_files should be public API,
Next, I don't think extensions like VBS should be hardcoded if PATHEXT is not found. A pseudo-dos shell just runs .exe and .com IIRC, if not, there's probably somewhere in the win32 environment a list of extensions and their associated programs that get called automatically from the shell (in addition to PATHEXT). We need to find this list programatically rather that harcoding a list that will change from one box to the other.
Is a “which” function that useful, since we have os.exec functions that search the PATH for us? It seems to me that wanting to know the exact path of an executable is a specialized need, and that adding this function would make beginners use it instead of the os.exec family.
Personally I think it's a very useful feature: the purpose for running which may not be to get the full path to the executable and then run it, but rather that that path prefix is important for something else. I'm sure when I joined this issue I had some need like that -- after all, as you say, if you just want to run it then there are better ways.
In general, once users have a need (for whatever reason) to start firing off subprocesses from their Python scripts, things can get ugly. Providing a portable path search function provides a way to keep one kind of that ugliness inside Python for the cases where it really is needed. As for abuse... well, people will always find a way to abuse bits of the library: I think that's an issue for them, rather than a reason to block this functionality ;)
My $0.02, Andy
As mentioned in the original request, there are at least two motivations for "which" functionality that are distinct from running the program (these days, with the subprocess module). #1 was detecting existence of a program. #2 was finding the *second* instance of a program on the path.
After 9 years, I'm certainly not holding my breath...
I apologize for not reading the first post more carefully, thank you for restating the use cases. I’m +1 now and I’ll review the patches to make it up :)
Bugs may take years to get fixed. Now that Tarek has expressed interest, be sure that this won’t get lost again.
@tarek:
Sorry for not reacting, it completely vaporized out of my head. I'll do the patch this weekend.
Agree, only which/which_files belong to public API.
Regarding PATHEXT:
When a new process is created, the value is taken from registry variable PATHEXT in the 'HKCU\Environment' key, or the 'HKLM\System\CurrentControlSet\Control\Session Manager\Environment' key (in this order). The first key is for custom user values, and PATHEXT is not set by default there. The second key is a system-wide setting and defaults to: A. ".COM;.EXE;.BAT;.CMD" in Windows NT, and possibly also W2K (although it is already 5.0 version) B. ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH" in Wine [01], XP and WS 2003 [02] C. ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC" in Vista and W7 and possibly also WS 2008.
When the PATHEXT is missing, or is set to be empty in registry, or is set to be empty in the shell via "set PATHEXT=", then: A. CMD.EXE, START.EXE and standard exec do use the default value, which is probably hardcoded somewhere (not taken from registry) ... tested on XP only B. Wine [11] uses a hardcoded ".BAT;.COM;.CMD;.EXE" (I can't say I do see the reasons for this particular order) C. GnuWin32 which utility [12] uses a hardcoded ".COM;.EXE;.BAT;.CMD"
So, in the corner case when the PATHEXT is set empty for whatever reason, we have basically the following options:
Find some magical way how to get the default value from windows. Any brave soul to fight this?
Stick with basic NT setting ".COM;.EXE;.BAT;.CMD", and document that it doesn't always match the execution behaviour in this case, eg. that .JS file would get executed on XP, but won't be found by which()
Resemble CMD.EXE/START.EXE and standard windows exec behavior, and hardcode the values for different windows versions from NT to W7, and for Wine. This is quite simple to do, as windows versions are well documented in platform.release()(we don't actually have to call this function, just check into which of the 3 intervals the current windows version fits). To do so, I only need someone to verify the correct default PATHEXT for W2K and WS 2008, as I do not have access to these.
My .02$ for 3, as this is what user expects.
What do you think?
[01] http://source.winehq.org/source/tools/wine.inf.in#L556 http://archives.free.net.ph/message/20091231.134245.fce4d24a.en.html
[02] http://technet.microsoft.com/en-us/library/bb490998.aspx http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/ntcmds.mspx?mfr=true http://technet.microsoft.com/en-us/library/cc772691(WS.10).aspx (the manual is same for XP and WS 2003 so maybe they just used copy/paste without checking.
[11] http://source.winehq.org/source/programs/cmd/wcmdmain.c#L1019
[12] http://gnuwin32.sourceforge.net/packages/which.htm see which-2.20-src.zip / ... / which-2.20-src.diff line 388
Just one question: is there some std proc how to run the test suite on the updated stdlib in a working repo? For now, I just copied the Lib/test/test_shutil.py to a parent Lib/ dir, renamed Lib/test temporarily to not collide with a test module, and run python test_shutil.py in Lib, but I guess you have some simpler way to do it.
which is useful for discovering *if* a program is available (I sorely miss it on Windows when I don't have cygwin installed). +1 for adding it to shutil.
Jan, have you tried this: $ ./python -m test.test_shutil
This uninstalled Python will import the shutil module from the checkout.
@ Éric: Thanks, that works nicely :)
@ Michael: You don't need cygwin to use which and many gnu tools on windows. See http://gnuwin32.sourceforge.net/packages.html. After installation you can take the ,exe and .dll files in GnuWin32/bin and use them on any pc/network - just put them to the path.
@ Tarek: In the docs I used ".. versionadded:: 2.7.1", but this guess needs a verification :)
Jan, new features don’t go in stable releases. Your patch targets 3.2 (see the versions field at the top of the page).
Some comments on the patch:
windows
?@ Éric and Antoine: Thanks for the useful hints!
PATH and PATHEXT are now evaluated when needed, not only on module init. The rationale is, that the lib user may change them, eg. to include a directory with additional commands.
the only helper module-level variables left are _windows and _getwinpathext()
updated doctest
updated versionadded in docs to 3.2
looks good, minor changes before I commit it
can you:
one or two usage examples in the Doc would be a nice plus
thx
Hi Tarek
Will do that on Tusdas. Examples make sense too
Jan
On Sun, Aug 8, 2010 at 11:15 AM, Tarek Ziadé \report@bugs.python.org\ wrote:
Tarek Ziadé \ziade.tarek@gmail.com\ added the comment:
looks good, minor changes before I commit it
can you:
- remove all old patches in this issue
- make your code PEP-8
- rename the 'file' argument to 'filename'
- add yourself in ACKS
one or two usage examples in the Doc would be a nice plus
thx
----------
Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue444582\
There is a subtle problem in the reference implementation: it will break if one of the paths in PATH contains quoted path separator. On windows that would be quted with ":
"c:\path;with;sep"
and on *nix something like
/path\:with\:sep
The problem is in the call path.split(os.path.sep) To do this properly we would need another helper function, e.g. shutil.split_path_list(path) that would split paths considering quoting. I should also strip quotes from every path in the list.
I would write reference implementation, but I'm not sure if I know all the quoting rules of various os-es.
As far as I can tell from a little bit of testing, if it is even possible to quote ':' in a posix path it isn't obvious how you do it.
Hello Jan. Can you give us a status update on this one?
Iztok: There is a lot of code out there, written in anything from Python to awk to shell, that splits on “:”. Perhaps it’s okay to just not support “:” in directory names.
Adding people from the duplicate bug to nosy.
Alternate implementation, which looks more thorough: http://code.google.com/p/which/ (from Trent Mick, added to nosy, who’s already agreed to contribute his code).
Hi Jan, are you still going to work on this feature?
Hi Éric, what are we going to do: include Jan's patch when ready or Trent's which
tool on google code?
Cheers, Sandro
Sandro: I merely did some bug triage here. I will let interested parties come to an agreement, and Tarek will make the decisions on this request.
Hello All,
sorry for lack of communication recently, I'd alos like to see it in 3.2 instead of 3.3, but my time is not as scalable as I wish and I can't run on clouds yet .)
I like Trent's module and especially its usefullness via the commandline. I'm also happy about learning on "SOFTWARE\\Microsoft\\Windows\\CurrentVersion\\App Paths\\" key, and did reimplement it in my module (with optional static=True parameter to cache the key and parsed path instead of querying and parsing them on each run).
For inclusion in shutil, I'd imho prefer the interface chosen here, ie. which_files() returns generator, which() returns first match, or raises IOError(errno.ENOENT), but that's my opinion only. There's also adapted the default extension list to match the given Windows version, which helps resembling the real command execution behavior.
The escape+quote chars in path are an interesting problem. I wrote a simple test for them to find out the behavior on Windows XP/7 and Linux, and will do the correct implementation and tests later this week.
Can someone remove obsolete files from the file list and post an up-to-date patch?
I'm not sure what rules are used by Windows to process the PATH string, but I think they are similar to the rules used to parse the command-line into argv in a C/C++ program: http://msdn.microsoft.com/en-us/library/17w5ykft.aspx
I have tested various arrangements of double-quotes in path elements. It appears the quotes can appear anywhere, not just at the start and end of entries. As far as I can tell, *all* they do is toggle the interpretation of the semicolon character between a separator and a character in the directory path. Note in particular: quotes may surround an entire path, segments of a path, fragments of segments of a path, or even may be completely empty. Any number of quotes can appear in a single path entry. There doesn't even need to be an even number of quotes - it seems that an odd number of quotes are treated the same as if a final quote was appended to the very end of the PATH string.
Running my attached test batch file, I see these results:
c:\Users\weeble>pathtest PATH= FAIL
PATH=c:\Users\weeble\foo;bar
FAIL
PATH=c:\Users\weeble\"foo;bar"
SUCCESS
PATH="c:\Users\weeble\foo;bar"
SUCCESS
PATH=c:\Users\weeble\"foo;"bar
SUCCESS
PATH=c:\Users\weeble\foo";"bar
SUCCESS
PATH=c:\Users\weeble\""foo";"bar""
SUCCESS
PATH=
FAIL
PATH=c:\Users\weeble\foo";bar
SUCCESS
Before we miss yet another beta freeze, how does something like this look?
It moves which
into one function which always yields paths. I don't think anyone will approve of adding a dual-function API to solve this problem. I originally tried an approach where the function returned one value and had an option to return an iterator, but nearly everyone I asked found it unacceptable and potentially confusing.
This patch removes the custom pathext stuff seen in other patches. It doesn't seem like it would be that useful to have it, and I'm not sure why you would want to.
Unless anyone is violently opposed, I think we should just go on with it. If we want to add to it, we can add to it in 3.4, but I think something should go in for 3.3. The issue has been open with various patches for over 10 years...
I'm not sure why there isn't a review link for your patch.
"which(file..."
I don't think file is a good name. 'fn' or 'filename' or 'string' would all be better choices, I think. 'file' implies a Python file object (to me). And "the give *file command is called" just looks wrong. "Yield the full path to the executables which could be run if the given *filename were looked up...."
Wait, why are we even returning more than one result? I don't see any use cases for that in the issue (though I admit I just skimmed it). The unix which command doesn't. Wanting only the first match is going to be far more common than wanting a list, so it should be the case supported most easily by the API. If getting the complete list is useful, the API can be extended or a new function added later.
Then the motivation statement becomes clearer: "Yield the full path to the executables which would be run if the given *filename were typed at the shell prompt or passed to the os.exec*p functions."
Given that the function works on Windows, its behavior (looking in the current directory and in PATHEXT) should be documented. And if that's not what exec*p* does on windows, that should be noted, I think (or fixed?)
I'm also not sure what the 'path' argument is for. Does it have a use case? If not, drop it.
You are doing the PATHEXT extraction regardless of platform, which I don't think is a good idea.
This line appears to be useless:
base = os.path.join(dir, file)
Per my suggestion of dropping the list form (at least for now), the yield should become a return, with the default return value of None indicating not found.
I don't think file is a good name.
Changed to "cmd" for command, and that's what the Unix which
calls it as well.
Wait, why are we even returning more than one result? I don't see any use cases for that in the issue (though I admit I just skimmed it). The unix which command doesn't.
I guess I just stuck with what was always there, or maybe I introduced it back in an earlier patch - can't remember. In any case, thanks for mentioning this because the common case really is just wanting one path. which -a
prints all instances on the path, but that'd be an additional and less-used case for sure.
Given that the function works on Windows, its behavior (looking in the current directory and in PATHEXT) should be documented. And if that's not what exec*p* does on windows, that should be noted, I think (or fixed?)
Documented.
I'm also not sure what the 'path' argument is for. Does it have a use case? If not, drop it.
I've needed this myself when maintaining systems that need to start executables via subprocess.Popen using a Path from a configuration file or other inputs. When I'd need to setup a Path which included a lot of dependencies, like for a mini-continous integration system we had, it would be nice to know where the "foo" command was coming from in my temporary build environment.
You are doing the PATHEXT extraction regardless of platform, which I don't think is a good idea.
Changed, it's Windows specific now. I will have access to a linux box to test this out tomorrow morning, but I think the patch should work there.
This line appears to be useless:
Removed, that must have been from an old iteration.
Per my suggestion of dropping the list form (at least for now), the yield should become a return, with the default return value of None indicating not found.
Done. It's all returns now including a default of None.
Here's a patch that also works on linux. A pathext specific test is now skipped since that only matters on Windows, and I forgot a chmod that was making two tests fail on linux.
Added some review comments on latest patch.
Attached is a patch which fixes your review comments in Lib/shutil.py, and it makes an adjustment in the wording of the documentation.
The documentation is a bit more strong in wording that the current directory is always prepended to the path whether its a provided path or the default value. I don't know that the PATHEXT mentions should go much further than saying that it's checked, and then showing what it does by example.
We always need to check the PATHEXT variable because it tells us what the accepted executable extensions are that the system will recognize. On my system, it's ".COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC;.RB;.RBW", so the command prompt allows me to type "python" and it'll search that list, search the path, and find "python.exe" at C:\Python33. It's something we need to check no matter what the path
situation is, default or passed in.
I'm fine with this version going in, but I don't understand what you are saying about PATHEXT. You are getting it from the environment, but you say the shell always looks at the extension. So if someone has changed PATHEXT in their environment, Windows ignores it and does the associations it knows about anyway? That is, PATHEXT is purely informational and changing it has no effect (other than to confuse things)?
I don't know. It makes sense to me. I'll try to find someone else to look at it and adjust it.
I don't really understand why you call abspath(). If the caller wants an absolute path, they can call abspath() themselves.
New changeset 0fe7439e470c by Brian Curtin in branch 'default': Fix bpo-444582. Add shutil.which function for finding programs on the system path. http://hg.python.org/cpython/rev/0fe7439e470c
I don't really understand why you call abspath(). If the caller wants an absolute path, they can call abspath() themselves.
Because that is what which
does.
I just pushed the change before I saw this message, so it went in as the patch shows. Do you want this changed?
I think it would be more surprising if by default it did something different than what the 'which' command does. It also seems like the most useful result to return by default. If there's demand for a non-abspath version we could add that as a feature later.
I just pushed the change before I saw this message, so it went in as the patch shows. Do you want this changed?
Well, I think it would be better indeed.
I think it would be more surprising if by default it did something different than what the 'which' command does.
You know, I've never noticed that Unix which
automatically
abspathified the results (does it always? is it system-dependent? how
about Windows?).
It also seems like the If there's demand for a non-abspath version we could add that as a feature later.
That sounds overkill. If which() calls abspath, then there's no way to get a non-absolute result. While if which() doesn't call abspath, the caller is free to call abspath() if they want to ensure the result is absolute.
Sounds like a no-brainer to me :-)
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/briancurtin' closed_at =
created_at =
labels = ['type-feature', 'library']
title = 'Finding programs in PATH, adding shutil.which'
updated_at =
user = 'https://bugs.python.org/edemaine'
```
bugs.python.org fields:
```python
activity =
actor = 'eric.araujo'
assignee = 'brian.curtin'
closed = True
closed_date =
closer = 'eric.araujo'
components = ['Library (Lib)']
creation =
creator = 'edemaine'
dependencies = []
files = ['8185', '15381', '16441', '16447', '16459', '17957', '17958', '17962', '17963', '18000', '18001', '23797', '26054', '26059', '26078', '26090']
hgrepos = []
issue_num = 444582
keywords = ['patch']
message_count = 61.0
messages = ['53174', '53175', '53176', '55402', '57774', '83484', '95614', '100444', '100460', '100469', '106046', '106055', '109234', '109554', '109583', '109587', '109764', '110033', '110034', '110049', '110068', '110073', '110084', '110085', '110291', '113241', '113285', '113944', '115016', '121398', '121424', '124996', '125222', '125233', '134306', '148475', '163201', '163242', '163253', '163376', '163454', '163466', '163479', '163481', '163482', '163483', '163484', '163485', '163487', '163488', '163490', '163492', '163493', '163764', '163800', '163802', '179889', '179891', '185218', '185220', '185221']
nosy_count = 31.0
nosy_names = ['loewis', 'tmick', 'georg.brandl', 'edemaine', 'belopolsky', 'pitrou', 'vstinner', 'wrstlprmpft', 'giampaolo.rodola', 'christian.heimes', 'ajaksu2', 'sfllaw', 'schmir', 'tarek', 'eric.araujo', 'Christophe Simonis', 'andybuckley', 'weeble', 'r.david.murray', 'tleeuwenburg@gmail.com', 'michael.foord', 'brian.curtin', 'meatballhat', 'petere', 'sandro.tosi', 'iki', 'BreamoreBoy', 'Iztok.Kavkler', 'python-dev', 'takluyver', 'Omega_Weapon']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue444582'
versions = ['Python 3.3']
```