Closed: latha19jan closed this 4 years ago
ijson does support Unicode. Could your issue have something to do with the encoding of what you're trying to parse?
UTF-8 bytes work:
>>> import ijson
>>> from io import BytesIO
>>> data = BytesIO('{"key": "vålue™"}'.encode('utf-8'))
>>> list(ijson.parse(data))
[('', 'start_map', None),
('', 'map_key', 'key'),
('key', 'string', 'vålue™'),
('', 'end_map', None)]
UTF-16LE bytes don't:
>>> data = BytesIO('{"key": "vålue™"}'.encode('utf-16le'))
>>> list(ijson.parse(data))
UnicodeDecodeError
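If the input really is in another encoding such as UTF-16LE, one option is to transcode it to UTF-8 on the fly before it reaches the parser. The following is only a sketch using the standard library: `Utf8Recoder` is a hypothetical helper name, not part of ijson, and it assumes the parser only needs a file-like object with a `read()` method that returns bytes.

```python
import io


class Utf8Recoder:
    """Hypothetical helper (not part of ijson): expose a binary stream in
    some other encoding as a stream of UTF-8 bytes."""

    def __init__(self, raw, source_encoding):
        # newline='' turns off newline translation so the text passes through unchanged.
        self._text = io.TextIOWrapper(raw, encoding=source_encoding, newline='')

    def read(self, size=-1):
        # `size` counts characters here, so the returned UTF-8 chunk
        # may be longer than `size` bytes. An empty result means EOF.
        return self._text.read(size).encode('utf-8')


utf16_data = io.BytesIO('{"key": "vålue™"}'.encode('utf-16le'))
recoded = Utf8Recoder(utf16_data, 'utf-16le').read()
```

Under those assumptions, a `Utf8Recoder` wrapped around the UTF-16LE stream could be handed to `ijson.parse` in place of `data`, and it should then behave like the UTF-8 case above. Because decoding happens chunk by chunk, the input is not loaded into memory all at once.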
Strings and files (encoded as UTF-8) opened in text or binary mode work too:
>>> list(ijson.parse(open('/tmp/test.json')))
[('', 'start_map', None),
('', 'map_key', 'key'),
('key', 'string', 'vålue™'),
('', 'end_map', None)]
>>> list(ijson.parse(open('/tmp/bla.json', 'rt')))
[('', 'start_map', None),
('', 'map_key', 'key'),
('key', 'string', 'vålue™'),
('', 'end_map', None)]
>>> list(ijson.parse(open('/tmp/bla.json', 'rb')))
[('', 'start_map', None),
('', 'map_key', 'key'),
('key', 'string', 'vålue™'),
('', 'end_map', None)]
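To check whether encoding is the culprit, it can help to test whether the raw bytes decode as UTF-8 at all before handing them to the parser. This is a minimal sketch; `is_valid_utf8` is a hypothetical helper name, and the sample byte strings are made up for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# Illustrative inputs: the same text in three encodings.
utf8_ok = is_valid_utf8('{"key": "vålue™"}'.encode('utf-8'))
cp1252_ok = is_valid_utf8('{"key": "vålue™"}'.encode('cp1252'))
utf16_ok = is_valid_utf8('{"key": "vålue™"}'.encode('utf-16le'))
```

If the check fails for your file, the data is probably in some other encoding and would need to be converted to UTF-8 first.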
Parsing characters such as ö and ™ throws UnexpectedSymbol. Can somebody help with this issue? Will ijson support Unicode characters?