python / cpython

The Python programming language
https://www.python.org

memoryview + bytes fails #60149

Open 8726d1eb-a365-45b6-b81d-c75988975e5a opened 12 years ago

8726d1eb-a365-45b6-b81d-c75988975e5a commented 12 years ago
BPO 15945
Nosy @pitrou, @glyph, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug']
title = 'memoryview + bytes fails'
updated_at =
user = 'https://bugs.python.org/exarkun'
```

bugs.python.org fields:

```python
activity =
actor = 'Jean-Paul Calderone'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = []
creation =
creator = 'exarkun'
dependencies = []
files = []
hgrepos = []
issue_num = 15945
keywords = []
message_count = 12.0
messages = ['170511', '170512', '170513', '170579', '170580', '170619', '172676', '172678', '172687', '172688', '172689', '172690']
nosy_count = 4.0
nosy_names = ['pitrou', 'glyph', 'Arfrever', 'martin.panter']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue15945'
versions = ['Python 3.3']
```

8726d1eb-a365-45b6-b81d-c75988975e5a commented 12 years ago
Python 3.3.0rc2+ (default:9def2209a839, Sep 10 2012, 08:44:51) 
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> memoryview(b'foo') + b'bar'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'memoryview' and 'bytes'
>>> b'bar' + memoryview(b'foo')
b'barfoo'
>>>

5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 12 years ago

What is the expected outcome? memoryviews can't be resized, so this scenario isn't possible:

>>> bytearray([1,2,3]) + b'123'
bytearray(b'\x01\x02\x03123')

pitrou commented 12 years ago

Just prepend the empty bytestring if you want to make sure the result is a bytes object:

>>> b'' + memoryview(b'foo') + b'bar'
b'foobar'

I think the following limitation may be more annoying, though:

>>> b''.join([memoryview(b'foo'), b'bar'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected bytes, memoryview found

8726d1eb-a365-45b6-b81d-c75988975e5a commented 12 years ago

> What is the expected outcome? memoryviews can't be resized, so this scenario isn't possible:

The same as view.tobytes() + bytes, but without the extra copy implied by view.tobytes().

> Just prepend the empty bytestring if you want to make sure the result is a bytes object:

Or I could explicitly convert the memoryview to a bytes object. That strikes me as rather preferable. However, this defeats one use of memoryview, which is to avoid unnecessary copying. So it might be a suitable workaround for some cases, but not all.
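
For reference, the workarounds mentioned so far can be compared side by side. This is a minimal sketch using only built-ins; the variable names are illustrative and not taken from the report. Every variant yields the same bytes object, and each first materializes the view's contents as a bytes object before concatenating, which is the intermediate copy that a direct `memoryview + bytes` would be expected to avoid.

```python
view = memoryview(b'foo')
tail = b'bar'

# Workaround 1: explicit conversion, which copies the view into a new bytes object.
via_tobytes = view.tobytes() + tail

# Workaround 2: prepend an empty bytes object so that bytes concatenation
# handles the view as its right-hand operand.
via_empty_prefix = b'' + view + tail

# Workaround 3: the bytes constructor, which also copies the view's contents.
via_constructor = bytes(view) + tail

# The expression from the report, wrapped so the script runs to completion.
try:
    view + tail
    direct_supported = True
except TypeError:
    direct_supported = False
```

On the interpreter from the report, `direct_supported` ends up `False`, matching the traceback above.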

pitrou commented 12 years ago

> Or I could explicitly convert the memoryview to a bytes object. That strikes me as rather preferable. However, this defeats one use of memoryview, which is to avoid unnecessary copying. So it might be a suitable workaround for some cases, but not all.

Indeed, that's why I think it would be good to fix the bytes.join() method (which is precisely meant to minimize copying and resizing).
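
A sketch of what the proposed `bytes.join()` behavior looks like in use. It assumes an interpreter whose `bytes.join()` accepts arbitrary buffer objects, which is the enhancement being requested here; on the 3.3 build shown earlier, the same call raises the `TypeError` quoted above.

```python
view = memoryview(b'foo')

# Assumption: bytes.join() accepts buffer-protocol objects on this interpreter.
joined = b''.join([view, b'bar'])

# Slices of a memoryview are themselves views, so part of a buffer can be
# joined without first copying that part into a separate bytes object.
joined_slice = b''.join([view[1:], b'bar'])
```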

pitrou commented 12 years ago

Opened bpo-15958 for the bytes.join enhancement.

b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 12 years ago

It's worth noting that the "buffer()" built-in in Python2 had this behavior, and it enabled a copy-reduction optimization within Twisted's outgoing transport buffer.

There are of course other ways to do this, but it seems like it would be nice to restore this handy optimization; it seems like a bug, or at least an oversight, that the convenience 'bytes+memoryview' (which cannot provide a useful optimization) works, but 'memoryview+bytes' (which would be equally helpful from a convenience perspective and _could_ provide a reduction in copying) doesn't.

Despite the bytes.join optimization (which, don't get me wrong, is also very helpful, almost necessary) this remains very useful.

pitrou commented 12 years ago

I'm not sure what you're talking about since:

>>> b = buffer("abc")
>>> b + "xyz"
'abcxyz'
>>> (b + "xyz") is b
False

... doesn't look like it avoids copies to me.
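
The `buffer` built-in no longer exists in Python 3, so the session above cannot be replayed there. As a rough Python 3 analogue, the sketch below separates the two operations under discussion: slicing a memoryview shares memory with the underlying object, while concatenation always produces a new object. The names are illustrative only.

```python
backing = bytearray(b'abc')
view = memoryview(backing)

# Slicing a memoryview yields another view of the same memory, not a copy.
tail = view[1:]
backing[1] = ord('X')
shared = bytes(tail)        # reflects the change made through `backing`

# Concatenation allocates a new bytes object holding both operands' data.
combined = b'' + view + b'xyz'
backing[0] = ord('Y')       # later changes do not affect `combined`
```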

b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 12 years ago

On Oct 11, 2012, at 12:13 PM, Antoine Pitrou <report@bugs.python.org> wrote:

> Antoine Pitrou added the comment:
>
> I'm not sure what you're talking about since:
>
> >>> b = buffer("abc")
> >>> b + "xyz"
> 'abcxyz'
> >>> (b + "xyz") is b
> False
>
> ... doesn't look like it avoids copies to me.

The case where copies are avoided is documented here:

http://twistedmatrix.com/trac/browser/trunk/twisted/internet/abstract.py?rev=35733#L20

pitrou commented 12 years ago

> The case where copies are avoided is documented here

... which would be handled nicely by bpo-15958.
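
To illustrate how a join-based approach could cover a "skip the already-sent prefix, then append queued chunks" case, here is a hedged sketch. The helper `concatenate` and its parameters are hypothetical and are not the Twisted code linked above; it also assumes a `bytes.join()` that accepts buffer objects, as proposed in bpo-15958.

```python
def concatenate(pending, offset, chunks):
    """Return pending[offset:] followed by every chunk, as one bytes object.

    Hypothetical helper for illustration only. The unsent tail is taken as a
    memoryview slice, so it is not copied into a temporary bytes object
    before the final result is built.
    """
    unsent = memoryview(pending)[offset:]   # a view of the tail, not a copy
    # Assumption: bytes.join() accepts buffer-protocol objects here.
    return b''.join([unsent, *chunks])


result = concatenate(b'hello world', 6, [b'!', b'?'])
```

With plain slicing (`pending[offset:]`), the tail would be copied once for the slice and again into the joined result; the view removes the first of those copies.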

b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 12 years ago

Yes, it would be *possible* to fix it with that alone, but that still makes it a pointless 'gotcha' in differing behavior between memoryview and buffer, especially given that bytes+memoryview does something semantically different than memoryview+bytes for no reason.

pitrou commented 12 years ago

Well, the fact that memoryview + bytes wouldn't return you a memoryview object might be a good reason to disallow it. Compare with:

>>> bytearray(b"x") + b"y"
bytearray(b'xy')
>>> b"x" + bytearray(b"y")
b'xy'
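
The pattern in that comparison, where the result takes the type of the left operand, can be checked directly. The sketch below only restates the two expressions above and adds the `bytes + memoryview` case from the original report; the variable names are illustrative.

```python
left_bytearray = bytearray(b"x") + b"y"    # left operand is a bytearray
left_bytes = b"x" + bytearray(b"y")        # left operand is bytes
left_bytes_view = b"x" + memoryview(b"y")  # left operand is bytes

# Collect the result type names for comparison.
result_types = (
    type(left_bytearray).__name__,
    type(left_bytes).__name__,
    type(left_bytes_view).__name__,
)
```

By that pattern, `memoryview + bytes` would be expected to return a memoryview, which a fixed-size view cannot do; that is the tension the comment above points at.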