python / cpython

The Python programming language
https://www.python.org
Other
63.16k stars 30.24k forks source link

subprocess.check_output() docs misrepresent what shell=True does #64543

Open 9bdefe90-7406-4971-ae89-ea7ba7dc505b opened 10 years ago

9bdefe90-7406-4971-ae89-ea7ba7dc505b commented 10 years ago
BPO 20344
Nosy @bitdancer, @vadmium
Files
  • issue20344.patch: Patch to issue 20344
  • documentation20344.patch
  • 20344_3.patch
  • 20344_4.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'library', 'docs'] title = 'subprocess.check_output() docs misrepresent what shell=True does' updated_at = user = 'https://bugs.python.org/klausman' ``` bugs.python.org fields: ```python activity = actor = 'martin.panter' assignee = 'docs@python' closed = False closed_date = None closer = None components = ['Documentation', 'Library (Lib)'] creation = creator = 'klausman' dependencies = [] files = ['34543', '34619', '34635', '39747'] hgrepos = [] issue_num = 20344 keywords = ['patch'] message_count = 12.0 messages = ['208811', '208814', '214332', '214364', '214845', '214943', '214949', '214950', '243632', '243638', '243642', '245542'] nosy_count = 5.0 nosy_names = ['r.david.murray', 'docs@python', 'klausman', 'martin.panter', 'Tuomas.Savolainen'] pr_nums = [] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue20344' versions = ['Python 2.7', 'Python 3.4', 'Python 3.5', 'Python 3.6'] ```

    9bdefe90-7406-4971-ae89-ea7ba7dc505b commented 10 years ago

    The subprocess docs state that the first argument can be either a string or an iterable that contains the program and arguments to run. It also points out that using shell=True allows for shell constructs. It does not mention a subtlety that is introduced by Python's use of sh -c (on Unix):

    Using a string that contains the full command line with args and shell=False breaks as expected:

    >>> subprocess.check_output("ls foo", shell=False)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.3/subprocess.py", line 576, in check_output
        with Popen(*popenargs, stdout=PIPE, **kwargs) as process:
      File "/usr/lib64/python3.3/subprocess.py", line 824, in __init__
        restore_signals, start_new_session)
      File "/usr/lib64/python3.3/subprocess.py", line 1448, in _execute_child
        raise child_exception_type(errno_num, err_msg)
    FileNotFoundError: [Errno 2] No such file or directory: 'ls foo'

    Using shell=True instead works as expected (since foo does not exist, ls exits with non-0 and a different Exception is raised. This, too is to spec and exactly what the docs describe.

    >>> subprocess.check_output("ls foo", shell=True)
    ls: cannot access foo: No such file or directory
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
        raise CalledProcessError(retcode, process.args, output=output)
    subprocess.CalledProcessError: Command 'ls foo' returned non-zero exit status 2

    Using shell=False and making the command a list of command and args does the same thing:

    >>> subprocess.check_output(["ls", "foo"], shell=False)
    ls: cannot access foo: No such file or directory
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.3/subprocess.py", line 589, in check_output
        raise CalledProcessError(retcode, process.args, output=output)
    subprocess.CalledProcessError: Command '['ls', 'foo']' returned non-zero exit status 2

    But using an iterable _and_ requesting a shell results in something very broken:

    >>> subprocess.check_output(["ls", "foo"], shell=True)
    [contents of my homedir are returned]
    >>>

    Note that the argument "foo" completely disappears, apparently. strace() reveals that "foo" is never added to the call to "ls". Instead, it becomes $0 in the shell that ls runs in. This is exactly to spec (i.e. bash is not broken). "bash -c foo bar baz" will start a shell that sets $0 to "bar", $1 to "baz" and then executes "foo". Whereas "bash -c 'foo bar baz'" will run 'foo bar baz'.

    I think there are three possible fixes here: 1- Make check_output(list, shell=True) run something like "bash -c '%s'" % " ".join(list) 2- Change the docs to warn of the unintended consequences of combining lists with shell=True 3- Make check_output() throw an Exception if the first argument is a list and shell=True

    The first option would make it easiest for people to "just use it", but I fear the string manipulation may become the source of bugs with security implications in the future.

    The second option keeps the status quo, but people tend to not read docs, so it may not be as effective as one would like, especially since shell-True already has warnings and people tend to go "yeah I know about unverified input, but it's not a problem for me" and then ignore both warnings.

    Option 3 has the disadvantage that existing scripts that _expect_ the behavior may break.

    bitdancer commented 10 years ago

    This was discussed in bpo-6760. There I advocated (and still advocate) option (3). Unfortunately it looks like our doc update in that issue actually *lost* the documentation for your option (2) that used to be there, though it wasn't exactly clear what it meant when it was there.

    So IMO we should do a doc update to add that info back in a clearer form, and also do a deprecation cycle to disallow shell=True with a sequence. I haven't ever gotten consensus from the other devs on the latter, though.

    109a3680-ffa5-47e7-a857-7d695e9e1545 commented 10 years ago

    Made a patch that throws exception as suggested: "3- Make check_output() throw an Exception if the first argument is a list and shell=True"

    bitdancer commented 10 years ago

    Thanks Tuomas, but we don't have any consensus that that kind of change will be accepted. It's just my opinion that it should be...and if it was, it would have to start with a deprecation, not raising an exception.

    What we need as a patch for this issue is a documentation patch explaining why using shell=True with a list is a bad idea in the 'frequently used arguments' section. The paragraph about 'args' in the frequently used arguments section should say something like 'using shell=True with a list containing more than one element does not currently raise an error, but it will not do what you expect; only the first element of the list will be executed on unix, and on windows the quoting is likely to be incorrect. See the Popen constructor documentation below for a more precise description.'

    (You can see why I think this usage should be deprecated, I hope ;)

    109a3680-ffa5-47e7-a857-7d695e9e1545 commented 10 years ago

    Created a patch that adds notice of using shell=True and iterable to the documentation. Please do comment if the formatting is wrong (this my first documentation patch).

    9bdefe90-7406-4971-ae89-ea7ba7dc505b commented 10 years ago

    Hi!

    On Tue, 25 Mar 2014, Tuomas Savolainen wrote:

    Created a patch that adds notice of using shell=True and iterable to the documentation. Please do comment if the formatting is wrong (this my first documentation patch).

    I'd use articles, i.e. "and a list" and "does not raise an error"

    Also, s/except/expect/

    Regards, Tobias

    bitdancer commented 10 years ago

    Also, the "see below" sentence is missing.

    109a3680-ffa5-47e7-a857-7d695e9e1545 commented 10 years ago

    Corrected the spelling of the patch.

    vadmium commented 9 years ago

    “Only the first element . . . will be executed on Unix” is a bit simplistic. The behaviour is already described more fully in the “shell” documentation under \https://docs.python.org/dev/library/subprocess.html#subprocess.Popen\.

    Also, “args” is allowed to be a _sequence, but an _iterable is a more general concept.

    bitdancer commented 9 years ago

    Can you suggest a better phrasing that is still succinct? (Maybe just 'appear to be executed' would make it accurate enough?) But, that's why there needs to be a reference of *some* sort to the full explanation.

    I'm not sure what the issue is with iterable/sequence. check_output/Popen accepts an iterable.

    vadmium commented 9 years ago

    Actually David I didn’t notice your suggested wording \https://bugs.python.org/issue20344#msg214364\ before. Adding that last sentence, pointing to the full description of the Popen constructor, would be fine.

    My complaint about mentioning “iterable” is that iterables are not mentioned anywhere else in the documentation. I would rather leave it out or change it to “sequence”, unless the rest of the documentation is made consistent, e.g. “args . . . should be a string, or a sequence of program arguments” would also need fixing.

    vadmium commented 9 years ago

    I think it might be better to leave the platform-specific details in the full Popen description, not under Frequently Used Arguments. I suggest to use 20344_4.patch: