Open stevendaprano opened 11 years ago
As per the discussion here:
http://mail.python.org/pipermail/python-ideas/2013-July/022419.html
\N{} escapes should support the Unicode code point notation U+xxxx (where there are four, five or six hex digits after the U+).
E.g. '\N{U+03BB}' => 'λ'
unicodedata.lookup should also support such numeric names, e.g.:
unicodedata.lookup('U+03BB') => 'λ'
As '+' is otherwise prohibited in Unicode character names, there should never be ambiguity between 'U+xxxx' as a code point and an actual name, and a single lookup function can handle both.
(See http://www.unicode.org/versions/Unicode6.2.0/ch04.pdf#G39 for details on characters allowed in names.)
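For illustration, the proposed lookup behaviour can be approximated in pure Python. This is only a sketch, not the attached patch: `lookup_extended` is a hypothetical name, and accepting lowercase hex digits and raising `KeyError` for out-of-range values are assumptions made here to match how `unicodedata.lookup` reports unknown names. The `\N{}` escape itself is resolved when the string literal is compiled, so it cannot be emulated this way.

```python
import re
import unicodedata

# 'U+' followed by four, five or six hex digits (lowercase digits accepted
# here as an assumption; the issue does not say either way).
_U_NOTATION = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def lookup_extended(name):
    """Hypothetical wrapper: resolve 'U+xxxx' or an ordinary character name."""
    match = _U_NOTATION.fullmatch(name)
    if match is None:
        # Not code point notation, so defer to the existing name lookup.
        return unicodedata.lookup(name)
    try:
        return chr(int(match.group(1), 16))
    except ValueError:
        # chr() rejects values above 0x10FFFF; report it like a failed lookup.
        raise KeyError(f"code point out of range: {name}") from None


print(lookup_extended("U+03BB"))                    # λ
print(lookup_extended("GREEK SMALL LETTER LAMDA"))  # λ
```

Because '+' cannot appear in a real character name, the regular expression check cannot shadow any name that `unicodedata.lookup` would otherwise accept.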
Also add a function for the reverse:
unicodedata.codepoint('λ') => 'U+03BB'
def codepoint(c):
    return 'U+{:04X}'.format(ord(c))
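As a quick check of the format string, the following sketch runs the same function over characters needing four, five and six hex digits, and confirms that the output can be parsed back. The round trip via `int(label[2:], 16)` is just for demonstration and is not part of the proposal.

```python
def codepoint(c):
    """Format a single character as U+XXXX, zero-padded to at least 4 digits."""
    return 'U+{:04X}'.format(ord(c))


for ch in ('A', 'λ', '😀', chr(0x10FFFF)):
    label = codepoint(ch)
    # Strip the 'U+' prefix and parse the hex digits to recover the character.
    assert chr(int(label[2:], 16)) == ch
    print(label)
# prints U+0041, U+03BB, U+1F600 and U+10FFFF, one per line
```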
I've attached a patch for this.
I agree with the proposal.
Some of the code seems redundant with code we already have. In Python, I would write
def codepoint_from_U_notation(name, namelen):
    if not (4 <= namelen <= 6): raise <wrong length>
    return chr(int(name, 16))
maybe with try-except to rewrite error messages like
ValueError: invalid literal for int() with base 16: '99x3'
ValueError: chr() arg not in range(0x110000)
My point is that we already have code to convert hex strings to int; I presume PyUnicode_FromOrdinal(code) is the C version of 'chr' that already checks the max value.
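A runnable version of that idea might look like the sketch below. The function name, the single `digits` parameter (the text after 'U+') and the error wording are all assumptions for illustration. One caveat worth noting: `int(s, 16)` on its own also accepts a sign, a `0x` prefix and underscores, so the sketch checks for plain hex digits first rather than relying on `int()` to reject them.

```python
import string


def codepoint_from_u_notation(digits):
    """Hypothetical helper: turn the hex digits after 'U+' into a character."""
    if not 4 <= len(digits) <= 6:
        raise ValueError(f"expected 4 to 6 hex digits, got {len(digits)}")
    # int(..., 16) would accept '0x3B', '+3BB' or '3_BB', so be strict here.
    if any(ch not in string.hexdigits for ch in digits):
        raise ValueError(f"invalid hex digits in code point: {digits!r}")
    try:
        return chr(int(digits, 16))
    except ValueError:
        # Replace chr()'s "arg not in range(0x110000)" with a clearer message.
        raise ValueError(f"code point out of range: U+{digits}") from None


print(codepoint_from_u_notation("03BB"))  # λ
```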
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
```python
assignee = None
closed_at = None
created_at =
labels = ['type-feature', 'expert-unicode']
title = 'Enhanced \\N{} escapes for Unicode strings'
updated_at =
user = 'https://github.com/stevendaprano'
```
bugs.python.org fields:
```python
activity =
actor = 'terry.reedy'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Unicode']
creation =
creator = 'steven.daprano'
dependencies = []
files = ['31112']
hgrepos = []
issue_num = 18614
keywords = ['patch']
message_count = 3.0
messages = ['194075', '194087', '194123']
nosy_count = 4.0
nosy_names = ['terry.reedy', 'ezio.melotti', 'mrabarnett', 'steven.daprano']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue18614'
versions = ['Python 3.4']
```