:Author: John Millikin :Copyright: This document has been placed in the public domain.
This is yet another library for reading/writing json. The goal is to
be API compatible with simplejson <http://pypi.python.org/pypi/simplejson>
_, except that it is written
purely in C and is thus 5x-20x faster for both encoding and decoding,
depending on the data.
This is a fork of the jsonlib <https://launchpad.net/jsonlib>
project by the folks over at Freebase <https://www.freebase.com>
who
have data coming from a variety of internal sources in different
encodings and formats, not all of which could simply be represented by
strict JSON.
Thus, the intent of this fork is to be more lenient in its encoding and decoding (as simplejson is) allowing things like NaN/-NaN/Infinity, automatic handling of unicode, and more. The first release of jsonlib2 was version 1.4, which was forked from jsonlib 1.3.10.
.. contents::
jsonlib2 has two functions of interest, read
and write
. It also
defines some exception: ReadError
, WriteError
, and
UnknownSerializerError
.
For compatibility with the standard library, read
is aliased to
loads
and write
is aliased to dumps
. They do not have the
same set of advanced parameters, but may be used interchangeably for
simple invocations.
To deserialize a JSON expression, call the jsonlib2.read
function with
an instance of str
or unicode
. ::
>>> import jsonlib2
>>> jsonlib2.read ('["Hello world!"]')
[u'Hello world!']
Floating-point values
By default, ``jsonlib2`` will parse values such as "1.1" into an instance of
``decimal.Decimal``. To use the built-in value type ``float`` instead, set
the ``use_float`` parameter to ``True``. Please note that this may cause a
loss of precision when parsing some values. ::
>>> jsonlib2.read ('[1.5]', use_float = True)
[1.5]
>>> jsonlib2.read ('[1.1]', use_float = True)
[1.1000000000000001]
>>> jsonlib2.read ('[3.14159265358979323846]', use_float = True)
[3.1415926535897931]
Serialization
-------------
Serialization has more options, but they are set to reasonable defaults.
The simplest use is to call ``jsonlib2.write`` with a Python value. ::
>>> import jsonlib2
>>> jsonlib2.write (['Hello world!'])
'["Hello world!"]'
Pretty-Printing
To "pretty-print" the output, pass a value for the indent
parameter. ::
>>> print jsonlib2.write (['Hello world!'], indent = ' ')
[
"Hello world!"
]
>>>
Mapping Key Sorting
By default, mapping keys are serialized in whatever order they are
stored by Python. To force a consistent ordering (for example, in doctests)
use the ``sort_keys`` parameter. ::
>>> jsonlib2.write ({'e': 'Hello', 'm': 'World!'})
'{"m":"World!","e":"Hello"}'
>>> jsonlib2.write ({'e': 'Hello', 'm': 'World!'}, sort_keys = True)
'{"e":"Hello","m":"World!"}'
Encoding and Unicode
By default, the output is encoded in UTF-8. If you require a different
encoding, pass the name of a Python codec as the encoding
parameter. ::
>>> jsonlib2.write (['Hello world!'], encoding = 'utf-16-be')
'\x00[\x00"\x00H\x00e\x00l\x00l\x00o\x00 \x00w\x00o\x00r\x00l\x00d\x00!\x00"\x00]'
To retrieve an unencoded unicode
instance, pass None
for the
encoding. ::
>>> jsonlib2.write (['Hello world!'], encoding = None)
u'["Hello world!"]'
By default, non-ASCII codepoints are forbidden in the output. To include
higher codepoints in the output, set ensure_ascii
to False
. ::
>>> jsonlib2.write ([u'Hello \u266a'], encoding = None)
u'["Hello \\u266a"]'
>>> jsonlib2.write ([u'Hello \u266a'], encoding = None, ensure_ascii = False)
u'["Hello \u266a"]'
Mapping Key Coercion
Because JSON objects must have string keys, strings will be 'coerced'
to strings. You can override this with the ``coerce_keys`` parameter,
whcih will raise exception when non-string keys are encountered in a
mapping. ::
>>> jsonlib2.write ({True: 1})
'{"true":1}'
>>> jsonlib2.write ({True: 1}, coerce_keys = False)
Traceback (most recent call last):
WriteError: Only strings may be used as object keys.
Serializing Other Types
If the object implements the iterator or mapping protocol, it will be handled automatically. If the object is intended for use as a basic value, it should subclass one of the supported basic values.
String-like objects that do not inherit from str
, unicode
, or
UserString.UserString
will likely be serialized as a list. This will
not be changed. If iterating them returns an instance of the same type, the
serializer might crash. This (hopefully) will be changed.
To serialize a type not known to jsonlib2, use the default
parameter
to write
::
>>> from datetime import date
>>> def default_handler (value):
... if isinstance (value, date): return str (value)
... raise jsonlib2.UnknownSerializerError
>>> jsonlib2.write ([date (2000, 1, 1)], default = default_handler)
'["2000-01-01"]'
Streaming Serializer
When serializing large objects, the use of an in-memory buffer may cause
too much memory to be used. For these situations, use the ``dump`` function
to write objects to a file-like object::
>>> import sys
>>> jsonlib2.dump (["Written to stdout"], sys.stdout)
["Written to stdout"]
>>>
Exceptions
-----------
ReadError
Raised by read
if an error was encountered parsing the expression. Will
contain the line, column, and character position of the error.
Note that this will report the character, not the byte, of the character that caused the error.
WriteError
Raised by ``write`` or ``dump`` if an error was encountered serializing
the passed value.
UnknownSerializerError
A subclass of WriteError
that is raised when a value cannot be
serialized. See the default
parameter to write
.
use_float
parameter
in load
. In dump
, use default
.load
, added parsing support for Infinity, -Infinity, NaNload
, added support for parse_float
, parse_int
,
parse_constant
dump
, allow_non_numbers
became allow_nan
dump
, allow_nan
(formerly allow_non_numbers
) is now
True by defaultdump
, on_unknown
became default
dump
, ascii_only
became ensure_ascii
dump
, coerce_keys
is now True by defaultload
and dump
to deal with objects that are not lists or dictionaries at the root
level.escape_slash
parameter to write()
to stop escaping the
'/' characteruse_float
parameter to read()
.on_unknown
handlers that
return invalid objects.loads()
and dumps()
as aliases to read()
and write()
,
respectively, for compatibility with the new json
standard library
module.repr()
on instances of
decimal.Decimal
and UnicodeDecodeError
.UserString
to be used as object keys.dump()
function.dict
when PyDict_Next()
skips indexes.read
.on_unknown
parameter to write
.__version__
attribute.jsonlib2
and _jsonlib
modules, instead of
a package.utf-8-sig
codec.codecs
module for detecting BOMs in input data.python setup.py test
to work.encoding
parameter to write
, which controls the output
encoding. The default encoding is utf-8
. If the encoding is None
,
a unicode
string will be returned.write
using a C extension module.Thanks to Deron Meranda (author of demjson
) for his excellent JSON library comparison <http://deron.meranda.us/python/comparing_json_modules/>
_,
which revealed many areas for improvement:
Use repr
instead of unicode
for serializing floating-point values,
to avoid unnecessary rounding.
Fixed bug that prevented plus signs in an exponent from being parsed correctly.
Added support for serializing the following types:
generator
set
frozenset
complex
, for values with no imaginary component.array.array
collections.deque
collections.defaultdict
UserList.UserList
UserDict.UserDict
UserString.UserString
Raise an exception if a control character is encountered in a string.
Added support for detecting Unicode byte order marks in the auto decoder.
Allow only arrays and objects to be serialized directly. All other types must be contained within an array or object.
Stricter detection of whitespace.
Also includes some other miscellaneous fixes:
Infinity
and NaN
on Windows.write
, so the user can control the final
encoding.Infinity
, -Infinity
, and NaN
from being serialized
because JSON does not support these values.coerce_keys
parameter to write
. If True
, mapping keys
will be coerced to strings. Defaults to False
.ascii_only
parameter to write
. If True
, non-ASCII
codepoints will always be escaped to a \u sequence. Defaults to True
.security issues <http://t3.dotgnu.info/blog/insecurity/quotes-dont-help.html>
_.jsonlib2.tests
subpackage.sort_keys
parameter to write
. This allows mapping types to
be serialized to a predictable value, regardless of key ordering.indent
to write
. Any string passed as this value will be
used for indentation. If the value is not None
, pretty-printing will
be activated.