python / cpython


make bytes/bytearray translate's delete a keyword argument #71693

Closed zhangyangyu closed 8 years ago

zhangyangyu commented 8 years ago
BPO 27506
Nosy @bitdancer, @vadmium, @serhiy-storchaka, @zhangyangyu
Files
  • bytes_translate_delete_as_keyword_arguments.patch
  • bytes_translate_delete_as_keyword_arguments_v2.patch
  • table-optional-delete-empty.patch
  • bytes_translate_delete_as_keyword_arguments_v3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = 'https://github.com/vadmium'
    closed_at =
    created_at =
    labels = ['interpreter-core', 'type-feature']
    title = "make bytes/bytearray translate's delete a keyword argument"
    updated_at =
    user = 'https://github.com/zhangyangyu'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'xiang.zhang'
    assignee = 'martin.panter'
    closed = True
    closed_date =
    closer = 'martin.panter'
    components = ['Interpreter Core']
    creation =
    creator = 'xiang.zhang'
    dependencies = []
    files = ['43703', '43705', '43841', '44138']
    hgrepos = []
    issue_num = 27506
    keywords = ['patch']
    message_count = 18.0
    messages = ['270303', '270310', '270317', '270318', '270349', '270354', '271074', '271090', '272434', '272482', '272486', '272497', '272771', '273008', '273013', '273337', '273768', '273785']
    nosy_count = 5.0
    nosy_names = ['r.david.murray', 'python-dev', 'martin.panter', 'serhiy.storchaka', 'xiang.zhang']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue27506'
    versions = ['Python 3.6']
    ```

    zhangyangyu commented 8 years ago

    I wrote a patch that lets the delete argument of bytes.translate() and bytearray.translate() be passed as a keyword argument. This doesn't break backwards compatibility and makes the method more flexible to use. In addition, at the C level, it stops using Argument Clinic's legacy optional group feature and removes the unnecessary group_right_1 parameter.

    zhangyangyu commented 8 years ago

    Hmm, David, that may not be quite right. Users who only read the documentation would never know the name is deletechars rather than delete. The documentation has always used delete, even though that conflicts with __doc__.

    >>> print(bytes.translate.__doc__)
    B.translate(table[, deletechars]) -> bytes
    ...

    I deliberately changed deletechars to delete to stay consistent with the documentation. But actually I think using deletechars wouldn't break backwards compatibility either.

    bitdancer commented 8 years ago

    Ah, I was looking at the 2.7 docs.

    zhangyangyu commented 8 years ago

    Please review the new version. It makes two changes compared with the last one.

    1. It exposes the Python parameter as "delete" (which the documentation has always used, so I think it's the API) while still using "deletechars" (which I prefer as a C variable name) in the C code.

    2. It allows *delete* to be None. Previously this was not allowed, but I don't think this change breaks backwards compatibility. The reason for the change is that I don't want users to be surprised when they pass the default value to translate and then get an exception.

    vadmium commented 8 years ago

    Instead of allowing delete=None (which is not in the RST documentation), perhaps it is possible to change the doc string. I can’t remember the details, but I think Argument Clinic allows a virtual Python-level default, something like “object(py_default=b"") = NULL”.

    Also, I think I like the change. What do you think about making the first argument optional (default to None), allowing calls like x.translate(delete=b'aeiou')?

    zhangyangyu commented 8 years ago

    Thanks for your comments, Martin. I'll apply them later, once we reach agreement on the functionality.

    I have already used object = NULL; the C default is not necessary here, and I think it works the way you want. In patch version 1, b'abc'.translate(None, None) raises an exception as before. I changed that in patch version 2 because Argument Clinic generates the function signature as "($self, table, /, delete=None)", and I don't want users to be surprised when they pass None, as the signature suggests, and get an exception. Using None as a placeholder for a keyword argument is also normal in Python. But I'm OK with keeping the previous behaviour, and actually I prefer that.

    As for making the first argument optional, I don't much like that, since the documentation seems to encourage users to pass None explicitly.

    vadmium commented 8 years ago

    This patch is what I had in mind for setting the documented default as delete=b'', but using NULL internally.

    I also changed it to allow the table argument to be omitted. We can change the documentation accordingly. These are just suggestions; use either or both aspects as you please :)

    zhangyangyu commented 8 years ago

    LGTM. Using b'' instead of None as the default value of *delete* looks better since it doesn't break backwards compatibility. As for whether the first argument should be optional, either is okay with me. You have changed the doc accordingly.

    vadmium commented 8 years ago

    Serhiy, you assigned this to yourself. What do you think of my patch?

    serhiy-storchaka commented 8 years ago

    PyArg_ParseTupleAndKeywords can be slower than PyArg_ParseTuple even for positional arguments. We need benchmarking results (especially after committing a patch for bpo-27574).

    What is the purpose of supporting the delete argument as a keyword argument? It looks to me as if the only purpose is to allow specifying the delete argument without specifying the table argument. There are two alternative ways to achieve this: make translate() accept some special value (e.g. None) as the default value for the first argument:

    b'hello'.translate(None, b'l')

    or make translate() accept the delete argument as a keyword argument:

    b'hello'.translate(delete=b'l')

    The patch does both things, but only one is needed. If we add support for delete as a keyword argument, I would prefer not to add support for None as the first argument, but instead to specify its default value as bytes(range(256)):

    table: object(c_default="NULL") = bytes(range(256))
    /
    delete as deletechars: object(c_default="NULL") = b''

    I don't know why an optional group was used here; the function could be implemented without it.
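The suggested default of bytes(range(256)) is the identity table: byte i maps to byte i, so it behaves the same as passing None for the table. A minimal sketch of that equivalence, using positional arguments only (the variable names are illustrative and do not come from any of the patches):

```python
# Identity table: entry i holds byte value i, so translation changes nothing.
identity_table = bytes(range(256))

# Both calls delete every b"l" and leave the remaining bytes untouched.
with_identity = b"hello".translate(identity_table, b"l")
with_none = b"hello".translate(None, b"l")

print(with_identity)  # b'heo'
print(with_none)      # b'heo'
```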

    vadmium commented 8 years ago

    I agree it would be worth checking for a slowdown.

    As well as giving the option of omitting the table argument, it would make call sites easier to read. It would avoid suggesting that the first argument is translated to the second, like maketrans().

    data = data.translate(YENC_TABLE, delete=b"\r\n")

    translate() already accepts None as the first argument; this is not new:

    >>> b"hello".translate(None, b"l")
    b'heo'

    I guess the optional group was used as a way of making the second argument optional without a specific default value.
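The readability point can be illustrated by contrasting maketrans(), whose two arguments are a "from" and a "to" alphabet, with translate(), whose second argument is a set of bytes to delete. This is a sketch only: the sample data and table are made up for illustration, and the keyword form assumes Python 3.6 or later, the version this issue targets.

```python
# maketrans(frm, to) builds a 256-byte table mapping frm[i] -> to[i].
table = bytes.maketrans(b"abc", b"xyz")

data = b"a-b-c\r\n"

# Positional form: nothing at the call site says the second argument is a
# delete set rather than a target alphabet.
positional = data.translate(table, b"\r\n")

# Keyword form (Python 3.6+): the intent is spelled out.
keyword = data.translate(table, delete=b"\r\n")

print(positional)  # b'x-y-z'
print(keyword)     # b'x-y-z'
```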

    zhangyangyu commented 8 years ago

    So let's do a simple benchmark.

    # without patch

    ./python -m timeit -s 'string=bytes(range(256));table=bytes(range(255, -1, -1));delete=b"abcdefghijklmn"' 'string.translate(table, delete)'
    1000000 loops, best of 3: 0.55 usec per loop

    # with patch

    ./python -m timeit -s 'string=bytes(range(256));table=bytes(range(255, -1, -1));delete=b"abcdefghijklmn"' 'string.translate(table, delete)'
    1000000 loops, best of 3: 0.557 usec per loop

    # keyword specified

    ./python -m timeit -s 'string=bytes(range(256));table=bytes(range(255, -1, -1));delete=b"abcdefghijklmn"' 'string.translate(table, delete=delete)'
    1000000 loops, best of 3: 0.771 usec per loop

    From my observation, the difference between PyArg_ParseTupleAndKeywords and PyArg_ParseTuple when parsing positional arguments is very small (0.55 vs 0.557 usec, about 1%). This means the patch won't slow down old code by a large percentage. When the keyword argument is specified, the call is slower (0.771 usec, about 40% more than the positional call). But I think this happens everywhere PyArg_ParseTupleAndKeywords is used.
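The positional versus keyword comparison can also be reproduced from Python with the timeit module. This is a rough sketch, assuming Python 3.6 or later so that the keyword form is accepted; the helper best_usec and the loop counts are illustrative choices, and the absolute numbers will differ by machine and interpreter build.

```python
import timeit

# Same setup as the command-line benchmark in this thread.
SETUP = "\n".join([
    "string = bytes(range(256))",
    "table = bytes(range(255, -1, -1))",
    'delete = b"abcdefghijklmn"',
])


def best_usec(stmt, number=100_000, repeat=3):
    """Return the best per-call time in microseconds over `repeat` runs."""
    times = timeit.repeat(stmt, setup=SETUP, number=number, repeat=repeat)
    return min(times) / number * 1e6


positional = best_usec("string.translate(table, delete)")
keyword = best_usec("string.translate(table, delete=delete)")

print(f"positional: {positional:.3f} usec per call")
print(f"keyword:    {keyword:.3f} usec per call")
```

Taking the minimum over the repeats mirrors the "best of 3" figure that the timeit command line reports.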

    serhiy-storchaka commented 8 years ago

    Technically the patch looks correct to me. I added just a few minor comments on Rietveld. I don't think there is a great need for keyword argument support, but since the overhead is small and somebody needs this, adding it does no harm. I'll leave it to you, Martin.

    vadmium commented 8 years ago

    I can look at enhancing the tests at some stage, but it isn’t a high priority for me.

    Regarding translate() with no arguments, it makes sense if you see it as a kind of degenerate case of neither using a translation table, nor any set of bytes to delete:

    x.translate() == x.translate(None, b"")

    I admit it reads strangely and probably isn’t useful. If people dislike it, it might be easiest to just add the keyword support and keep the first parameter as mandatory:

    without_nulls = bytes_with_nulls.translate(None, delete=b"\x00")

    zhangyangyu commented 8 years ago

    Martin, I wrote the v3 patch to apply the comments. It keeps *table* mandatory and moves test_translate to BaseBytesTest to remove duplicates.

    vadmium commented 8 years ago

    Looks pretty good, thanks Xiang. There’s one English grammar problem in a comment (see review), but I can fix that when I commit.

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 8 years ago

    New changeset 6ab1b54245d5 by Martin Panter in branch 'default': Issue bpo-27506: Support bytes/bytearray.translate() delete as keyword argument https://hg.python.org/cpython/rev/6ab1b54245d5
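The behaviour agreed on in this thread (delete accepted as a keyword, table still mandatory) can be checked with a short sketch. It assumes Python 3.6 or later; the sample data and variable names are illustrative.

```python
bytes_with_nulls = b"a\x00b\x00c"

# delete can be passed by keyword; None means "no translation table".
without_nulls = bytes_with_nulls.translate(None, delete=b"\x00")

# bytearray.translate() accepts the same keyword and returns a bytearray.
array_result = bytearray(bytes_with_nulls).translate(None, delete=b"\x00")

# table stays mandatory, so omitting it raises TypeError.
try:
    bytes_with_nulls.translate(delete=b"\x00")
except TypeError:
    table_is_mandatory = True
else:
    table_is_mandatory = False

print(without_nulls)       # b'abc'
print(array_result)        # bytearray(b'abc')
print(table_is_mandatory)  # True
```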

    zhangyangyu commented 8 years ago

    Yay, thanks for your work, Martin.