python / cpython

The Python programming language
https://www.python.org

json.dumps with ensure_ascii=False doesn't escape control characters #65393

Closed 635abe7e-16c3-43be-adf1-c703ae129470 closed 10 years ago

635abe7e-16c3-43be-adf1-c703ae129470 commented 10 years ago
BPO 21194
Nosy @rhettinger, @pitrou, @ezio-melotti, @4kir4

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = None
closed_at =
created_at =
labels = ['invalid', 'type-bug', 'library']
title = "json.dumps with ensure_ascii=False doesn't escape control characters"
updated_at =
user = 'https://bugs.python.org/weeble'
```

bugs.python.org fields:
```python
activity =
actor = 'ned.deily'
assignee = 'none'
closed = True
closed_date =
closer = 'ned.deily'
components = ['Library (Lib)']
creation =
creator = 'weeble'
dependencies = []
files = []
hgrepos = []
issue_num = 21194
keywords = []
message_count = 3.0
messages = ['215868', '215898', '215923']
nosy_count = 5.0
nosy_names = ['rhettinger', 'pitrou', 'ezio.melotti', 'weeble', 'akira']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue21194'
versions = ['Python 3.4']
```

635abe7e-16c3-43be-adf1-c703ae129470 commented 10 years ago

The JSON spec (http://www.json.org/) does not allow unescaped control characters. (See the railroad diagram for strings and the grammar on the right.) If json.dumps is called with ensure_ascii=False, it fails to escape control codes in the range U+007F to U+009F. Here's an example:

>>> import json
>>> import unicodedata
>>> for i in range(256):
...     jsonstring = json.dumps(chr(i), ensure_ascii=False)
...     if any(unicodedata.category(ch) == 'Cc' for ch in jsonstring):
...         print("Fail:", repr(chr(i)))
...
Fail: '\x7f'
Fail: '\x80'
Fail: '\x81'
Fail: '\x82'
Fail: '\x83'
Fail: '\x84'
Fail: '\x85'
Fail: '\x86'
Fail: '\x87'
Fail: '\x88'
Fail: '\x89'
Fail: '\x8a'
Fail: '\x8b'
Fail: '\x8c'
Fail: '\x8d'
Fail: '\x8e'
Fail: '\x8f'
Fail: '\x90'
Fail: '\x91'
Fail: '\x92'
Fail: '\x93'
Fail: '\x94'
Fail: '\x95'
Fail: '\x96'
Fail: '\x97'
Fail: '\x98'
Fail: '\x99'
Fail: '\x9a'
Fail: '\x9b'
Fail: '\x9c'
Fail: '\x9d'
Fail: '\x9e'
Fail: '\x9f'

7fe5d93b-2a2c-46a0-b5cd-5602c591856a commented 10 years ago

json.dumps works correctly in this case.

Both the application/json RFC [1] and the ECMA JSON standard [2] say:

> All characters may be placed within the quotation marks, except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters (U+0000 through U+001F).

I.e., only a subset of the control characters (U+0000 through U+001F) must be escaped in a JSON string.
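A quick way to check this against the quoted rule is sketched below. It is only an illustration: the helper name `raw_c0_controls` and the two result lists are made up for this example and are not part of the `json` module. It encodes each of the first 256 code points with `ensure_ascii=False`, looks for raw characters in the U+0000 through U+001F range in the output, and confirms that the output still round-trips through `json.loads`.

```python
import json


def raw_c0_controls(text):
    """Return the characters in text that fall in U+0000 through U+001F."""
    return [ch for ch in text if ord(ch) <= 0x1F]


violations = []      # inputs whose encoding contains a raw U+0000..U+001F character
passed_through = []  # inputs emitted verbatim between the surrounding quotes

for i in range(256):
    ch = chr(i)
    encoded = json.dumps(ch, ensure_ascii=False)

    # The quoted rule only forbids raw U+0000..U+001F (plus '"' and '\\').
    if raw_c0_controls(encoded):
        violations.append(ch)

    # Everything else may appear unescaped, including U+007F..U+009F.
    if i > 0x1F and ch not in '"\\' and encoded == '"' + ch + '"':
        passed_through.append(ch)

    # The encoder's output should still decode back to the original character.
    assert json.loads(encoded) == ch

print("raw C0 controls in output:", violations)
print("characters passed through unescaped:", len(passed_through))
```

Under these assumptions the `violations` list stays empty, while U+007F through U+009F end up in `passed_through`, which matches the behaviour reported above and the wording of the quoted rule.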

635abe7e-16c3-43be-adf1-c703ae129470 commented 10 years ago

Ah, sorry for the confusion.