Closed by GoogleCodeExporter 8 years ago
Hey Marc! Thanks for your interest.
Actually, this is expected behaviour for a pure-Python implementation: cPickle
and the stdlib json module use C extensions, and only under PyPy may a pure-Python
version try to beat them by a noticeable margin. For reference, here are my Linux
results for decoding/encoding a 4 KB CouchDB document (attached):
sys.version : '2.7.3 (default, Jul 5 2012, 08:55:40) \n[GCC 4.5.3]'
sys.platform : 'linux2'
* [test_1] Handle 4KB sized CouchDB document with various data
* [simpleubjson] Decoded in 67.837171 (0.001357 / call)
* [json_stdlib] Decoded in 4.190959 (0.000084 / call)
* [ujson] Decoded in 2.343383 (0.000047 / call)
* [simplejson_c] Decoded in 3.463531 (0.000069 / call)
* [simplejson_py] Decoded in 76.436388 (0.001529 / call)
* [simpleubjson] Encoded in 58.248465 (0.001165 / call)
* [json_stdlib] Encoded in 11.809877 (0.000236 / call)
* [ujson] Encoded in 6.779758 (0.000136 / call)
* [simplejson_c] Encoded in 13.605745 (0.000272 / call)
* [simplejson_py] Encoded in 51.807545 (0.001036 / call)
So simpleubjson is actually on par with simplejson without its C speedups.
Compiling simpleubjson with Cython gives you a 50% boost for free; I'll add
that feature soon. But to solve this problem once and for all, we would need
a libubj.so and a C extension.
Original comment by kxepal
on 11 Dec 2012 at 3:13
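For context, the script behind the numbers above is attached to the issue and not reproduced here; a minimal sketch of that kind of round-trip benchmark (the document contents, function, and names below are illustrative, not the attached script) could look like:

```python
import json
import timeit

def bench(name, encode, decode, data, calls=1000):
    """Time `calls` encode and decode passes and report per-call cost."""
    payload = encode(data)
    enc_t = timeit.timeit(lambda: encode(data), number=calls)
    dec_t = timeit.timeit(lambda: decode(payload), number=calls)
    print('* [%s] Encoded in %f (%f / call)' % (name, enc_t, enc_t / calls))
    print('* [%s] Decoded in %f (%f / call)' % (name, dec_t, dec_t / calls))
    return enc_t, dec_t

# stand-in for the attached 4 KB CouchDB document
doc = {'_id': 'test', 'values': list(range(100)), 'ok': True}
bench('json_stdlib', json.dumps, json.loads, doc)
```

The same `bench()` call can then be repeated with each library's encode/decode pair to get comparable per-call figures.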
Actually, it isn't comparable at all once you use bigger data. There is clearly
something wrong with the algorithmic complexity: it took *seconds* for data
that is only about 1 MB.
Here is a benchmark with simplejson without C extension:
In [1]: import simplejson
In [2]: data = [1, 2, True, False, 'abcd']
In [3]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
10000 loops, best of 3: 54.1 us per loop
In [4]: data = dict((i, str(i) * 10) for i in xrange(2000))
In [5]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
10 loops, best of 3: 42.5 ms per loop
In [6]: data = dict((i, str(i) * 10) for i in xrange(20000))
In [7]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
1 loops, best of 3: 478 ms per loop
In [8]: import simpleubjson
In [9]: data = [1, 2, True, False, 'abcd']
In [10]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 69 us per loop
In [13]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
KeyboardInterrupt
In [13]: data = dict((str(i), str(i) * 10) for i in xrange(2000))
In [14]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 212 ms per loop
In [15]: data = dict((str(i), str(i) * 10) for i in xrange(20000))
In [16]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 23.2 s per loop
Original comment by marc.sch...@gmail.com
on 11 Dec 2012 at 6:03
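A classic cause of this kind of super-linear scaling in pure-Python serializers (a guess at this point, not a confirmed diagnosis of simpleubjson) is building the output by repeated string concatenation, which copies the whole buffer on every append, versus collecting chunks in a list and joining once at the end:

```python
def encode_slow(items):
    """O(n^2): each += copies the entire accumulated buffer."""
    out = ''
    for item in items:
        out += item
    return out

def encode_fast(items):
    """O(n): appends are amortized O(1), with a single final copy."""
    chunks = []
    for item in items:
        chunks.append(item)
    return ''.join(chunks)
```

Both produce identical output, but the slow variant's runtime grows quadratically with document size, which would match the jump from milliseconds at 2000 keys to tens of seconds at 20000.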
FYI, this was on my MacBook:
In [37]: platform.system()
Out[37]: 'Darwin'
In [38]: platform.mac_ver()
Out[38]: ('10.7.5', ('', '', ''), 'x86_64')
In [39]: sys.version
Out[39]: '2.7.3 (default, Aug 28 2012, 06:21:54) \n[GCC 4.2.1 Compatible Apple
Clang 4.0 ((tags/Apple/clang-421.0.60))]'
Original comment by marc.sch...@gmail.com
on 11 Dec 2012 at 6:07
Interesting that encoding is the bottleneck. I would have guessed that the
parsing was badly implemented :)
In [62]: %timeit simpleubjson.encode(data)
1 loops, best of 3: 19.2 s per loop
In [64]: s = simpleubjson.encode(data)
In [66]: %timeit simpleubjson.decode(s)
1 loops, best of 3: 268 ms per loop
Original comment by marc.sch...@gmail.com
on 11 Dec 2012 at 6:37
The current tip is a bit faster:
In [1]: import simpleubjson
In [2]: data = dict((str(i), str(i) * 10) for i in xrange(20000))
In [3]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 9.64 s per loop
Original comment by marc.sch...@gmail.com
on 11 Dec 2012 at 6:53
Huh, interesting... and bad news. My guess is that all these problems come
from a) unwise use of StringIO in the streamify function [1] and b) a
non-optimal encoder module [2] that produces a lot of function calls and lookups.
Ironically, I had already optimized both the decoder and the encoder quite a
bit and thought the limit was reached. Thanks for the kick; I'll review the
algorithms and logic with fresh eyes.
1:
http://code.google.com/p/simpleubjson/source/browse/simpleubjson/decoder.py#22
2: http://code.google.com/p/simpleubjson/source/browse/simpleubjson/encoder.py
Original comment by kxepal
on 11 Dec 2012 at 7:14
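Point (a) can be sketched like this (illustrative names, not simpleubjson's actual code): calling `.read(1)` on a StringIO/BytesIO wrapper for every marker byte pays a method-dispatch and object cost per byte, whereas indexing the raw buffer with an integer offset does not:

```python
from io import BytesIO

def read_markers_stringio(data):
    """Per-byte .read(1) calls on a file-like wrapper (slow path)."""
    stream = BytesIO(data)
    markers = []
    byte = stream.read(1)
    while byte:
        markers.append(byte)
        byte = stream.read(1)
    return markers

def read_markers_offset(data):
    """Same result from direct slicing, with no per-byte method dispatch."""
    return [data[i:i + 1] for i in range(len(data))]
```

The two return identical marker lists; the difference is purely in the per-byte overhead, which dominates when documents reach megabyte sizes.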
OK, it looks fairly easy to improve decoding speed by 2-4x (depending on the
Python version; 3.x is faster) without losing the current behaviour and the
introspection feature.
Encoding is not so trivial: so far I only gain a tiny boost at the cost of
breaking things. Still digging, but I'm pessimistic about this part of the library.
Original comment by kxepal
on 5 Jan 2013 at 3:21
Pushed a proof of concept of possible optimizations that really rocks:
was: rf038508d8b9b
now: rb1c8c4c1d806
Thanks for pointing me at the pickle module. I tried to invent something as
simple yet better, but finally settled on its design: there are several
options to make it even faster, but at the cost of code readability and a lot
of pylint complaints (:
Original comment by kxepal
on 7 Apr 2013 at 7:00
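The pickle-style design mentioned above can be sketched as follows (a hedged illustration, not simpleubjson's actual encoder; the type markers shown follow the Draft-8 UBJSON spec current at the time): pickle's Pickler keeps a dispatch dict mapping each Python type to a handler function, turning a chain of isinstance() checks into a single dict lookup per value:

```python
import struct

DISPATCH = {}  # maps Python type -> encoder function

def _enc_const(value, chunks):
    # Draft-8 markers: 'Z' = null, 'T' = true, 'F' = false
    chunks.append(b'Z' if value is None else (b'T' if value else b'F'))
DISPATCH[type(None)] = _enc_const
DISPATCH[bool] = _enc_const

def _enc_int(value, chunks):
    # Draft-8 'I' marker: big-endian int32
    chunks.append(b'I' + struct.pack('>i', value))
DISPATCH[int] = _enc_int

def encode(value):
    chunks = []
    DISPATCH[type(value)](value, chunks)  # one lookup, no isinstance chain
    return b''.join(chunks)
```

Container handlers would recurse into the same table; the dict lookup cost stays constant no matter how many types are registered, which is where the win over an if/elif ladder comes from.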
Fixed, with the following results: r50cf44ce252f
Running your benchmark:
In [1]: import simpleubjson
In [2]: simpleubjson.__version__
Out[2]: '0.6.0'
In [3]: data = [1, 2, True, False, 'abcd']
In [4]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 31.3 us per loop
In [5]: data = dict((str(i), str(i) * 10) for i in range(20000))
In [6]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10 loops, best of 3: 154 ms per loop
I think it's much better now. I believe it could be significantly faster
only with a C extension module, but that's a topic for another issue (; The
previous results were really horrible:
In [5]: data = [1, 2, True, False, 'abcd']
In [6]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 57.4 us per loop
In [9]: data = dict((str(i), str(i) * 10) for i in xrange(20000))
In [10]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 13.1 s per loop
That was very interesting issue, thanks!
Original comment by kxepal
on 10 Apr 2013 at 7:28
Sorry, wrong revision reference. The correct one: r524a8055e350
Original comment by kxepal
on 10 Apr 2013 at 7:29
Original issue reported on code.google.com by
marc.sch...@gmail.com
on 11 Dec 2012 at 2:43