Open ronaldoussoren opened 12 years ago
I'm sometimes using an array.array with format character "u" as a writable backing store for buffers shared with platform APIs that access buffers of UCS2 values. This works fine in python 3.2 and earlier with a ucs2 build of python, but no longer works with python 3.3 because the "u" character explicitly selects a UCS4 representation in that version.
An example of how I use this with PyObjC on Mac OS X:
b = array.array('u', "hello world")
s = CFStringCreateMutableWithExternalCharactersNoCopy(
None, b, len(b), len(b), kCFAllocatorNull)
"s" now refers to a mutable Objective-C string that uses "b" as its backing store.
It would be nice if there were a format code that would allow me to do this with Python 3.3, for example b = array.array("U", ...)
(BTW. I'm sorry if this is a duplicate, searching for "array.array" on the tracker results in a lot of hits, most of which have nothing to do with the array module)
See also bpo-13072 and the discussion starting at:
http://mail.python.org/pipermail/python-dev/2012-March/117390.html
I think the priority should be "high", since the current behavior doesn't preserve the status quo. Also, PEP-3118 suggests 'u' for UCS2 and 'w' for UCS4.
Hmm, obviously the discussion starts here:
http://mail.python.org/pipermail/python-dev/2012-March/117376.html
This one should be fixed by bpo-13072. Could you check again?
As Stefan noted, so long as Py_UNICODE is 16 bits in the Mac OS X builds, then this should now be back to the 3.2 behaviour.
It's not back to the 3.2 behavior. In 3.3, Py_UNICODE is always equal to wchar_t, which is a 4-byte type on Darwin. However, CFString is based on UniChar, which is a 2-byte type.
That this worked in 3.2 was by accident - it would work only in "narrow" builds. Python's configure in 3.2 and before wouldn't default to using wchar_t on Darwin since it didn't consider wchar_t "usable", which in turn happened because wchar_t is signed on Darwin, but Py_UNICODE was understood to be unsigned.
Since it's too late to add a 'U' code to 3.3, as a workaround you would have to use an 'H' array and initialize it with map(ord, the_string).
Chances are good that a proper UCS-2 array code gets added to 3.4.
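A minimal sketch of the 'H' workaround mentioned above, assuming the string contains only BMP characters and that 'H' (unsigned short) is 2 bytes wide on the platform. The helper name `ucs2_array` is hypothetical and not part of any library:

```python
import array
import sys

def ucs2_array(text):
    """Return a writable array of 16-bit code units for a BMP-only string.

    Hypothetical helper illustrating the 'H' workaround; not a stdlib API.
    """
    codes = [ord(ch) for ch in text]
    # Code points above U+FFFF do not fit in a single UCS-2 code unit.
    if any(code > 0xFFFF for code in codes):
        raise ValueError("string contains non-BMP characters")
    buf = array.array('H', codes)
    # 'H' is only guaranteed to be at least 2 bytes, so check explicitly.
    if buf.itemsize != 2:
        raise RuntimeError("'H' is not 2 bytes wide on this platform")
    return buf

b = ucs2_array("hello world")
print(b.itemsize, len(b))
```

The resulting array holds native-endian 16-bit values, so it can be handed to an API that expects a UCS2 buffer in the same way the 'u' array was on narrow builds.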
Py_UNICODE is a typedef for wchar_t, and that type is 4 bytes long:
>>> a = array.array('u', 'hello world')
>>> a.tobytes()
b'h\x00\x00\x00e\x00\x00\x00l\x00\x00\x00l\x00\x00\x00o\x00\x00\x00 \x00\x00\x00w\x00\x00\x00o\x00\x00\x00r\x00\x00\x00l\x00\x00\x00d\x00\x00\x00'
>>> a = array.array('u', 'bar')
>>> a.tobytes()
b'b\x00\x00\x00a\x00\x00\x00r\x00\x00\x00'
>>> len(a.tobytes())
12
>>>
This is with a checkout that was created yesterday.
The issue is not resolved: there is now no easy way to create a UCS2 buffer, whereas there was in earlier releases of Python (with the default narrow build).
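For anyone wanting to confirm the widths on their own platform, here is a rough sketch that compares the size of wchar_t with the item size of an 'H' array. It uses ctypes instead of the 'u' type code, on the assumption that `ctypes.c_wchar` reflects the same wchar_t the interpreter was built with:

```python
import array
import ctypes

# Size of the platform's wchar_t, which Py_UNICODE follows in 3.3.
wchar_size = ctypes.sizeof(ctypes.c_wchar)

# Size of one element of an unsigned short array, the suggested stand-in.
ushort_size = array.array('H').itemsize

print("wchar_t:", wchar_size, "bytes; 'H' item:", ushort_size, "bytes")
```

When the first number is 4 and the second is 2, a 'u' array cannot serve as a UCS2 buffer but an 'H' array can.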
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', 'type-bug']
title = 'array.array of UCS2 values'
updated_at =
user = 'https://github.com/ronaldoussoren'
```
bugs.python.org fields:
```python
activity =
actor = 'methane'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'ronaldoussoren'
dependencies = []
files = []
hgrepos = []
issue_num = 15035
keywords = []
message_count = 7.0
messages = ['162520', '162521', '162522', '168374', '168376', '168378', '168379']
nosy_count = 7.0
nosy_names = ['loewis', 'ronaldoussoren', 'ncoghlan', 'christian.heimes', 'Arfrever', 'methane', 'skrah']
pr_nums = []
priority = 'high'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue15035'
versions = ['Python 3.4']
```