python / cpython

The Python programming language
https://www.python.org/

shlex.quote doesn't work on bytestrings #69753

Open 32835a6c-75a4-4231-9231-3d8eb0d9d3a0 opened 8 years ago

32835a6c-75a4-4231-9231-3d8eb0d9d3a0 commented 8 years ago
BPO 25567
Nosy @bitdancer, @vadmium, @The-Compiler, @willingc, @csabella, @tirkarthi, @aldwinaldwin, @HassanAbouelela, @hrik2001
PRs
  • python/cpython#10871
  • python/cpython#22657
Files
  • shlex_quote_bytes_support.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:
    ```python
    assignee = None
    closed_at = None
    created_at =
    labels = ['easy', 'type-feature', 'library', '3.10']
    title = "shlex.quote doesn't work on bytestrings"
    updated_at =
    user = 'https://bugs.python.org/JonasThiem'
    ```
    bugs.python.org fields:
    ```python
    activity =
    actor = 'hrik2001'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation =
    creator = 'Jonas Thiem'
    dependencies = []
    files = ['40992']
    hgrepos = []
    issue_num = 25567
    keywords = ['patch', 'easy']
    message_count = 9.0
    messages = ['254186', '254196', '254429', '256413', '326163', '345569', '370274', '385605', '391986']
    nosy_count = 12.0
    nosy_names = ['r.david.murray', 'martin.panter', 'The Compiler', 'willingc', 'Nan Wu', 'Jonas Thiem', 'cheryl.sabella', 'xtreak', 'aldwinaldwin', 'HassanAbouelela', 'techfixya', 'hrik2001']
    pr_nums = ['10871', '22657']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue25567'
    versions = ['Python 3.10']
    ```

    32835a6c-75a4-4231-9231-3d8eb0d9d3a0 commented 8 years ago

    Demonstration:

    >>> import shlex
    >>> shlex.quote(b"abc")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python3.4/shlex.py", line 285, in quote
        if _find_unsafe(s) is None:
    TypeError: can't use a string pattern on a bytes-like object
    >>>

    You are probably wondering why anyone would not want to use Unicode strings here.

    The reason is that for some operations (e.g. file access to some known paths), decoding from and encoding to any Unicode interpretation can be lossy, specifically when the file path on the filesystem contains characters in a broken or mixed encoding. In such a case, the shell command may need to be supplied as a bytestring so that it is sent exactly as-is and such broken files can still be handled, without the Unicode interpretation altering some bytes of the path.
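The lossy round trip described above can be illustrated with a minimal sketch. The filename is a made-up example, and UTF-8 is assumed as the codec an application would try first.

```python
# Hypothetical filename stored in Latin-1; the byte 0xE9 is not valid UTF-8.
raw = b"caf\xe9.txt"

# Strict decoding refuses the name outright.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Decoding with errors="replace" succeeds, but the original byte is gone:
# it becomes U+FFFD, which encodes back to three different bytes.
lossy = raw.decode("utf-8", errors="replace")
round_tripped = lossy.encode("utf-8")
```

After this runs, `round_tripped` no longer names the same file as `raw`, which is the situation where passing bytes straight through to the shell becomes attractive.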

    Since shlex.quote seems targeted at shell usage, it should therefore support this.

    bitdancer commented 8 years ago

    I think that this is a reasonable request, and probably applies to the whole shlex module, although less strongly.

    You could use the surrogateescape hack to work around the problem:

    shlex.quote(mydata.decode('ascii', 'surrogateescape')).encode('ascii', 'surrogateescape')

    That might be the only practical way to handle bytes input to the shlex parser, if we do also want to tackle that.

    Note that os module functions that return filenames, as well as stdin/stdout, already use surrogateescape, so a naive program may actually work with binary filenames (which is why the handler is used in those contexts).
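A minimal sketch of the surrogateescape workaround, assuming the input is a bytes object: decode it to str so that non-ASCII bytes become lone surrogates, quote it, then encode back to recover the original bytes. The helper name `quote_bytes_via_surrogateescape` is hypothetical and not part of shlex.

```python
import shlex


def quote_bytes_via_surrogateescape(data: bytes) -> bytes:
    """Quote a bytestring for the shell by round-tripping through str."""
    # Bytes >= 0x80 are mapped to lone surrogates (U+DC80..U+DCFF).
    text = data.decode("ascii", "surrogateescape")
    # shlex.quote treats the surrogates as unsafe and wraps the string in quotes.
    quoted = shlex.quote(text)
    # Encoding with the same handler restores the original bytes exactly.
    return quoted.encode("ascii", "surrogateescape")
```

For example, `quote_bytes_via_surrogateescape(b"caf\xe9.txt")` keeps the 0xE9 byte intact inside the single quotes.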

    3d3643ef-0805-40b0-8106-5600a91a57f5 commented 8 years ago

    Added a patch to support this in the quote method. What would be a good example, or group of examples, to demonstrate the usage in the documentation?

    vadmium commented 8 years ago

    I think the documentation needs a “Changed in version 3.6” notice.

    tirkarthi commented 5 years ago

    Thanks for the patch. Since the current workflow uses GitHub PRs, the patch can be turned into a PR to move it forward. There seem to be some conflicts: I tried to apply the attached patch against the latest master.

    Thanks

    171dd431-1405-4dc4-9fad-f60cc722fcc8 commented 5 years ago
    Python 3.9.0a0
    [GCC 7.3.0] on linux
    >>> import re
    >>> find_unsafe_bytes = re.compile(b'[^\w@%+=:,./-]').search
    <stdin>:1: SyntaxWarning: invalid escape sequence \w

    when removing \w, all the tests pass

    (my regex knowledge is close to None.)

    "\w stands for "word character". It always matches the ASCII characters [A-Za-z0-9_]"

    replace \w with A-Za-z0-9_ ?? (all the tests pass)
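A note on the warning above: it comes from the Python bytes literal rather than from the regex, because `\w` is not a recognized escape sequence in a non-raw literal. Writing the pattern as a raw bytes literal (`rb'...'`) avoids it without changing what the pattern matches. The sketch below is an illustration modeled on the str-based logic of shlex.quote, not the code from the attached patch or the PRs; `quote_bytes` is a hypothetical name.

```python
import re

# Raw bytes literal: the backslash reaches the regex engine untouched.
# For bytes patterns, \w matches only ASCII [A-Za-z0-9_].
find_unsafe_bytes = re.compile(rb'[^\w@%+=:,./-]').search


def quote_bytes(s: bytes) -> bytes:
    """Return a shell-escaped version of the bytestring s (sketch)."""
    if not s:
        return b"''"
    if find_unsafe_bytes(s) is None:
        return s
    # Wrap in single quotes and replace each embedded single quote
    # with the sequence '"'"' (close quote, quoted quote, reopen quote).
    return b"'" + s.replace(b"'", b"'\"'\"'") + b"'"
```

Dropping `\w` from the character class altogether would make every letter and digit count as unsafe, so plain words would be quoted unnecessarily even though the result would still be valid shell.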

    csabella commented 4 years ago

    The first pull request has been closed, so this issue is available to be worked on. If the original patch or PR is used, please credit the original authors. Thanks!

    327a6d2b-ef7c-4ef8-99ae-0d0dafaa9199 commented 3 years ago

    Looks like this issue has been solved? What is there to be worked on?

    calestyo commented 11 months ago

    For the record, a while ago I asked POSIX for clarification on what (POSIX-compatible) shells are expected to be able to hold in shell variables (and thus in any string that might appear in a shell).
    The answer is: any binary string, except that it must not contain 0x0.

    Furthermore, the upcoming POSIX revision is going to include dollar-single-quote quoting (i.e. strings like $'a\nb'), which is already supported by some (but not all) shells.
    This will also include \xXX and \ddd, which allow bytes to be specified and would make it easy to escape any bytes from Python.
    Note that \uXXXX and \UXXXXXXXX are not going to be specified in POSIX and thus cannot be used portably.
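As a rough sketch of what dollar-single-quote escaping from Python could look like: the helper below is hypothetical (`dollar_quote` is not part of shlex), it assumes the target shell supports `$'...'` with three-digit octal `\ddd` escapes, and it rejects NUL bytes in line with the answer above. Octal is used here rather than `\xXX` only so that every escape has a fixed width.

```python
import string

# Bytes emitted literally; everything else becomes a \ddd octal escape.
_SAFE = frozenset(
    (string.ascii_letters + string.digits + "@%+=:,./-_").encode("ascii")
)


def dollar_quote(data: bytes) -> bytes:
    """Encode data as a $'...' string using octal escapes (sketch)."""
    if 0 in data:
        raise ValueError("NUL bytes cannot appear in shell strings")
    parts = [
        bytes([b]) if b in _SAFE else b"\\%03o" % b
        for b in data
    ]
    return b"$'" + b"".join(parts) + b"'"
```

For example, `dollar_quote(b"caf\xe9 x")` yields `$'caf\351\040x'`. Since not all shells accept this syntax yet, the result is only usable where `$'...'` support is known.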