jayendra13 / simpleubjson

Automatically exported from code.google.com/p/simpleubjson
BSD 2-Clause "Simplified" License

Performance is horrible #6

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
See benchmark: https://gist.github.com/3134391

What version of the product are you using? On what operating system?
0.5.0, Windows.

Original issue reported on code.google.com by marc.sch...@gmail.com on 11 Dec 2012 at 2:43

GoogleCodeExporter commented 8 years ago
Hey Marc! Thanks for your interest.

Actually, this is expected behaviour given the pure-Python implementation: 
cPickle and the stdlib json use C-powered extensions, and only on PyPy may a pure 
version try to beat them by a noticeable margin. As for Linux, here are 
my results for decoding/encoding a 4KB CouchDB document (attached):

sys.version : '2.7.3 (default, Jul  5 2012, 08:55:40) \n[GCC 4.5.3]'
sys.platform : 'linux2'
* [test_1] Handle 4KB sized CouchDB document with various data
    * [simpleubjson]  Decoded in 67.837171 (0.001357 / call)
    * [json_stdlib]   Decoded in 4.190959 (0.000084 / call)
    * [ujson]         Decoded in 2.343383 (0.000047 / call)
    * [simplejson_c]  Decoded in 3.463531 (0.000069 / call)
    * [simplejson_py] Decoded in 76.436388 (0.001529 / call)

    * [simpleubjson]  Encoded in 58.248465 (0.001165 / call)
    * [json_stdlib]   Encoded in 11.809877 (0.000236 / call)
    * [ujson]         Encoded in 6.779758 (0.000136 / call)
    * [simplejson_c]  Encoded in 13.605745 (0.000272 / call)
    * [simplejson_py] Encoded in 51.807545 (0.001036 / call)

So simpleubjson is actually on par with simplejson without its C speedups. If 
you compile simpleubjson with Cython, that gives you a 50% boost for free; I'll 
add this feature soon. But to solve this problem once and for all, we'd need 
a libubj.so and a C extension.
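The per-call timings above can be reproduced with a small harness built on the stdlib `timeit` module. This is a sketch using stdlib `json` only (the simpleubjson calls would be analogous); the sample document and iteration count are assumptions, not the original benchmark's:

```python
import json
import timeit

# Stand-in for the ~4KB CouchDB document used in the benchmark above.
data = {"rows": [{"id": i, "value": "x" * 10} for i in range(100)]}
s = json.dumps(data)

n = 1000
dec = timeit.timeit(lambda: json.loads(s), number=n)
enc = timeit.timeit(lambda: json.dumps(data), number=n)
print("Decoded in %f (%f / call)" % (dec, dec / n))
print("Encoded in %f (%f / call)" % (enc, enc / n))
```

Swapping in `simpleubjson.decode` / `simpleubjson.encode` for the json calls gives directly comparable per-call numbers.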

Original comment by kxepal on 11 Dec 2012 at 3:13

Attachments:

GoogleCodeExporter commented 8 years ago
Actually, it isn't comparable at all once you use bigger data. There is clearly 
something wrong with the algorithmic complexity: it took *seconds* for data 
that is only about 1 MB.

Here is a benchmark with simplejson without C extension:

In [1]: import simplejson

In [2]: data = [1, 2, True, False, 'abcd']

In [3]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
10000 loops, best of 3: 54.1 us per loop

In [4]: data = dict((i, str(i) * 10) for i in xrange(2000))

In [5]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
10 loops, best of 3: 42.5 ms per loop

In [6]: data = dict((i, str(i) * 10) for i in xrange(20000))

In [7]: %timeit s = simplejson.dumps(data); simplejson.loads(s)
1 loops, best of 3: 478 ms per loop

In [8]: import simpleubjson

In [9]: data = [1, 2, True, False, 'abcd']

In [10]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 69 us per loop

In [13]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
KeyboardInterrupt

In [13]: data = dict((str(i), str(i) * 10) for i in xrange(2000))

In [14]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 212 ms per loop

In [15]: data = dict((str(i), str(i) * 10) for i in xrange(20000))

In [16]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 23.2 s per loop

Original comment by marc.sch...@gmail.com on 11 Dec 2012 at 6:03

GoogleCodeExporter commented 8 years ago
FYI, this was on my MacBook:

In [37]: platform.system()
Out[37]: 'Darwin'

In [38]: platform.mac_ver()
Out[38]: ('10.7.5', ('', '', ''), 'x86_64')

In [39]: sys.version
Out[39]: '2.7.3 (default, Aug 28 2012, 06:21:54) \n[GCC 4.2.1 Compatible Apple 
Clang 4.0 ((tags/Apple/clang-421.0.60))]'

Original comment by marc.sch...@gmail.com on 11 Dec 2012 at 6:07

GoogleCodeExporter commented 8 years ago
Interesting that the encoding is the bottleneck. I would have guessed that the 
parsing is badly implemented :)

In [62]: %timeit simpleubjson.encode(data)
1 loops, best of 3: 19.2 s per loop

In [64]: s = simpleubjson.encode(data)

In [66]: %timeit simpleubjson.decode(s)
1 loops, best of 3: 268 ms per loop

Original comment by marc.sch...@gmail.com on 11 Dec 2012 at 6:37

GoogleCodeExporter commented 8 years ago
The current tip is a bit faster:

In [1]: import simpleubjson

In [2]: data = dict((str(i), str(i) * 10) for i in xrange(20000))

In [3]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 9.64 s per loop

Original comment by marc.sch...@gmail.com on 11 Dec 2012 at 6:53

GoogleCodeExporter commented 8 years ago
Huh, interesting... and bad news. My guess is that all these problems come 
from a) unwise use of StringIO in the streamify func[1] and b) a non-optimal 
encoder module[2] that produces a lot of function calls and lookups. 
Ironically, I'd already optimized both the decoder and encoder a lot and 
thought the limit had been reached. Thanks for the kick; I'll review the 
algorithms and logic with fresh eyes.

1: 
http://code.google.com/p/simpleubjson/source/browse/simpleubjson/decoder.py#22
2: http://code.google.com/p/simpleubjson/source/browse/simpleubjson/encoder.py
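A classic source of quadratic encoding time in pure-Python serializers, and a plausible culprit for the numbers above, is building the output with repeated concatenation instead of collecting chunks and joining once. This is an illustration of the general pitfall, not simpleubjson's actual code:

```python
import time

# 5000 small chunks, ~100KB of output total.
chunks = [b"x" * 20] * 5000

def encode_concat(chunks):
    # O(n^2): each += copies the entire buffer built so far.
    out = b""
    for c in chunks:
        out += c
    return out

def encode_join(chunks):
    # O(n): collect pieces, copy once at the end.
    return b"".join(chunks)

for fn in (encode_concat, encode_join):
    t0 = time.time()
    fn(chunks)
    print("%s: %.4fs" % (fn.__name__, time.time() - t0))
```

With 10x more chunks, the concatenating version slows down roughly 100x while the joining version stays linear, matching the scaling behaviour Marc measured.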

Original comment by kxepal on 11 Dec 2012 at 7:14

GoogleCodeExporter commented 8 years ago
OK, it looks pretty easy to improve decoding speed by 2-4x (depending on 
Python version; 3.x is faster) without losing the current behaviour and the 
introspection feature.

Encoding is not so trivial: I'd only gain a tiny boost at the cost of breaking 
everything. Still digging, but I'm pessimistic about this part of the library.

Original comment by kxepal on 5 Jan 2013 at 3:21

GoogleCodeExporter commented 8 years ago
Pushed a proof of concept of possible optimizations that really rocks:
was: rf038508d8b9b
now: rb1c8c4c1d806

Thanks for pointing me at the pickle module. I'd tried to invent something as 
simple but better, yet finally settled on its design: there are several options 
to make it even faster, but at the cost of code readability and a lot of 
pylint cries (:
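The pickle-style design mentioned here is a per-type dispatch table: one encoder method per Python type, looked up once per value, instead of an isinstance ladder re-checked on every call. A minimal sketch of the idea (the tag bytes and wire format here are hypothetical, not simpleubjson's actual format):

```python
import struct

class Encoder:
    """Sketch of a pickle-style encoder: dispatch on exact type."""

    def __init__(self):
        # Built once per encoder; one dict lookup replaces a chain
        # of isinstance checks for every value encoded.
        self.dispatch = {
            bool: self.encode_bool,   # must precede int conceptually:
            int: self.encode_int,     # type(True) is bool, so exact-type
            str: self.encode_str,     # dispatch handles it correctly
            list: self.encode_list,
        }

    def encode(self, value, out):
        self.dispatch[type(value)](value, out)

    def encode_bool(self, value, out):
        out.append(b"T" if value else b"F")

    def encode_int(self, value, out):
        out.append(b"I" + struct.pack(">i", value))

    def encode_str(self, value, out):
        data = value.encode("utf-8")
        out.append(b"s" + struct.pack(">B", len(data)) + data)

    def encode_list(self, value, out):
        out.append(b"[")
        for item in value:
            self.encode(item, out)
        out.append(b"]")

out = []
Encoder().encode([1, True, "abcd"], out)
print(b"".join(out))
```

Appending chunks to a list and joining once at the end also avoids the quadratic concatenation cost discussed earlier in this thread.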

Original comment by kxepal on 7 Apr 2013 at 7:00

GoogleCodeExporter commented 8 years ago
Fixed, with the following results: r50cf44ce252f

According to your benchmark:

In [1]: import simpleubjson

In [2]: simpleubjson.__version__
Out[2]: '0.6.0'

In [3]: data = [1, 2, True, False, 'abcd']

In [4]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 31.3 us per loop

In [5]: data = dict((str(i), str(i) * 10) for i in range(20000))

In [6]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10 loops, best of 3: 154 ms per loop

I think it's much better now. I believe it could only get significantly faster 
with a C-ext module, but that's a topic for another issue(; The previous 
results were really horrible:

In [5]: data = [1, 2, True, False, 'abcd']

In [6]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
10000 loops, best of 3: 57.4 us per loop

In [9]: data = dict((str(i), str(i) * 10) for i in xrange(20000))

In [10]: %timeit s = simpleubjson.encode(data); simpleubjson.decode(s)
1 loops, best of 3: 13.1 s per loop

That was a very interesting issue, thanks!

Original comment by kxepal on 10 Apr 2013 at 7:28

GoogleCodeExporter commented 8 years ago
Sorry, wrong results reference. Correct one: r524a8055e350

Original comment by kxepal on 10 Apr 2013 at 7:29