Closed by MSusik 9 years ago
One use case I see is defining a DOI schema, which is a JSONString with some format validation, or an international telephone number.
Well, there are more use cases: ORCID validation, email validation, URL validation, etc.
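As a rough sketch of the first use case (names and the pattern are illustrative only, not the jsonalchemy API), a DOI field could be expressed as a plain JSON Schema string with a pattern and checked by any JSON Schema validator, e.g. the jsonschema package:

import jsonschema  # any JSON Schema validator would do

doi_schema = {
    "type": "string",
    # Deliberately simplified DOI pattern, for illustration only.
    "pattern": r"^10\.\d{4,9}/\S+$",
}

jsonschema.validate("10.1000/182", doi_schema)   # passes silently
# jsonschema.validate("not-a-doi", doi_schema)   # would raise ValidationError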
A little preliminary testing on the performance of wrapping immutables, using this code:
In [1]: from jsonalchemy import performance
In [2]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.7 ns per loop
In [3]: %timeit performance.no_wrap_perf
The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 62.2 ns per loop
In [4]: %timeit performance.no_wrap_perf
10000000 loops, best of 3: 62 ns per loop
In [5]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.8 ns per loop
A little higher, but less than a 10% increase. Now I will test memory use.
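One caveat when reading the numbers above: %timeit times whatever expression it is given, so performance.wrap_perf without parentheses only measures the module attribute lookup (which would be consistent with results in the tens of nanoseconds), not the construction of the array itself. Timing the actual work would look something like this (purely illustrative, no results shown):

In [6]: %timeit -n 1 -r 3 performance.wrap_perf()
In [7]: %timeit -n 1 -r 3 performance.no_wrap_perf()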
When I measure memory usage I get some weird results:
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
45 13.0 MiB 0.0 MiB @profile
46 def wrap_perf():
47 13.0 MiB 0.0 MiB schema = load_schema_from_url(
48 13.0 MiB 0.0 MiB abs_path('schemas/performance.json'))
49
50 381.8 MiB 368.8 MiB data = JSONArray([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
52 17.6 MiB 0.0 MiB @profile
53 def no_wrap_perf():
54 17.6 MiB 0.0 MiB schema = load_schema_from_url(
55 17.6 MiB 0.0 MiB abs_path('schemas/performance.json'))
56
57 382.2 MiB 364.6 MiB data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
52 17.9 MiB 0.0 MiB @profile
53 def no_wrap_perf():
54 17.9 MiB 0.0 MiB schema = load_schema_from_url(
55 17.9 MiB 0.0 MiB abs_path('schemas/performance.json'))
56
57 389.8 MiB 371.9 MiB data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
45 18.2 MiB 0.0 MiB @profile
46 def wrap_perf():
47 18.2 MiB 0.0 MiB schema = load_schema_from_url(
48 18.2 MiB 0.0 MiB abs_path('schemas/performance.json'))
49
50 382.8 MiB 364.7 MiB data = JSONArray([str(el) for el in xrange(1000000)], schema)
In one case the wrapped version consumes more memory, in the other the unwrapped version consumes more. I think this means that the amount of memory consumed by immutable type wrappers is negligible.
cc @jirikuncar :arrow_up:
As yesterday there was an assumption, without any proof, that wrapping would cause a big overhead.
To me wrapping immutables is not the big issue; instead, the small mess inside __new__ and __init__ is something we should look at more closely and find a nicer/safer solution for (if we decide to continue with this approach), maybe using metaclasses instead of 'raping' __new__?
In [2]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.7 ns per loop
In [3]: %timeit performance.no_wrap_perf
The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 62.2 ns per loop
An interesting observation: wrap_perf is faster than no_wrap_perf? Shouldn't no wrapping always be faster than some wrapping? Are the tests OK?
The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached
19x longer seems excessive... Do you use some caching or memoising? If not, perhaps you are cons'ing too much (seeing the large xrange(...) and list additions) and may be bitten by this Python GC problem? You can try to store the results in a variable and disable or force GC...
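A small sketch of that suggestion (keep the collector out of the measurement and force a collection afterwards; the call itself is the same one used in this thread):

import gc
import time

from jsonalchemy import performance

gc.disable()                  # rule out collector pauses during the run
start = time.time()
performance.wrap_perf()       # note the parentheses: actually run the function
elapsed = time.time() - start
gc.enable()
gc.collect()                  # force a collection once the timing is done

print('%.3f s' % elapsed)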
To me wrapping immutables is not the big issue; instead, the small mess inside __new__ and __init__ is something we should look at more closely and find a nicer/safer solution for (if we decide to continue with this approach), maybe using metaclasses instead of 'raping' __new__?
I agree, it looks horrible. However, I can't think of another way of subclassing from __dict__ directly, extending it with a schema, and keeping a nice class structure. I'd love to hear some suggestions for improvements.
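One possible shape for that, sketched with purely illustrative names (not the jsonalchemy API): build one dict subclass per schema through type(), i.e. the metaclass machinery, so the schema lives on the class and __new__/__init__ stay untouched.

def json_object_class(name, schema):
    # type() is the default metaclass; calling it directly builds a new
    # dict subclass that carries the schema as a class attribute.
    return type(name, (dict,), {'schema': schema})


# One class per schema; instances behave like plain dicts.
Record = json_object_class('Record', {'type': 'object'})
rec = Record(title='A title')
assert rec.schema == {'type': 'object'}
assert rec['title'] == 'A title'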
An interesting observation: wrap_perf is faster than no_wrap_perf? Shouldn't no wrapping always be faster than some wrapping? Are the tests OK?
Uh! I didn't notice it was the other way around from what would be reasonable. Maybe it's because redefining append on JSONArrayNoWrapping requires a method lookup every time the method is needed?
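That hypothesis is easy to check in isolation; a sketch independent of jsonalchemy, comparing the built-in append with a Python-level override:

import timeit


class OverridingList(list):
    def append(self, item):
        # The Python-level override costs an extra lookup and call frame per call.
        super(OverridingList, self).append(item)


plain = timeit.timeit('l.append(1)', setup='l = []', number=1000000)
overridden = timeit.timeit(
    'l.append(1)',
    setup='from __main__ import OverridingList; l = OverridingList()',
    number=1000000)
print('plain: %.3f s, overridden: %.3f s' % (plain, overridden))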
19x longer seems excessive... Do you use some caching or memoising? If not, perhaps you are cons'ing too much (seeing the large xrange(...) and list additions) and may be bitten by this Python GC problem? You can try to store the results in a variable and disable or force GC...
I only got that message on the first run. I agree that the constructor is doing too much consing; in fact, it should probably be rewritten as a list comprehension and retested.
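For the rewrite, something along these lines (JSONString here is only a stub so the sketch is self-contained; the real wrapping details live in jsonalchemy and are not shown in this thread):

class JSONString(str):
    # Stub immutable wrapper, only here to make the sketch runnable.
    def __new__(cls, value, schema=None):
        obj = super(JSONString, cls).__new__(cls, value)
        obj.schema = schema
        return obj


class JSONArray(list):
    def __init__(self, iterable, schema=None):
        # Feed a generator straight to the built-in list constructor instead
        # of appending element by element in a Python-level loop.
        super(JSONArray, self).__init__(JSONString(el, schema) for el in iterable)
        self.schema = schema


data = JSONArray([str(el) for el in range(10)], {'type': 'array'})
assert data[0].schema == {'type': 'array'}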
Surprising as it is, I think this confirms the previous result as far as memory usage goes. With this code I get:
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
46 12.9 MiB 0.0 MiB @profile
47 def wrap_perf():
48 12.9 MiB 0.0 MiB schema = load_schema_from_url(
49 12.9 MiB 0.0 MiB abs_path('schemas/performance.json'))
50
51 381.8 MiB 368.9 MiB data = JSONArray([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
54 17.5 MiB 0.0 MiB @profile
55 def no_wrap_perf():
56 17.5 MiB 0.0 MiB schema = load_schema_from_url(
57 17.5 MiB 0.0 MiB abs_path('schemas/performance.json'))
58
59 382.2 MiB 364.6 MiB data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
54 17.9 MiB 0.0 MiB @profile
55 def no_wrap_perf():
56 17.9 MiB 0.0 MiB schema = load_schema_from_url(
57 17.9 MiB 0.0 MiB abs_path('schemas/performance.json'))
58
59 382.4 MiB 364.5 MiB data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)
Filename: jsonalchemy/performance.py
Line # Mem usage Increment Line Contents
================================================
46 18.2 MiB 0.0 MiB @profile
47 def wrap_perf():
48 18.2 MiB 0.0 MiB schema = load_schema_from_url(
49 18.2 MiB 0.0 MiB abs_path('schemas/performance.json'))
50
51 382.7 MiB 364.5 MiB data = JSONArray([str(el) for el in xrange(1000000)], schema)
(To replicate: $ python jsonalchemy/performance.py.)
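For anyone unfamiliar with that output format: tables like the ones above are what memory_profiler prints for functions decorated with @profile. A minimal, self-contained sketch of such a setup (not the actual contents of performance.py):

from memory_profiler import profile


@profile
def build_list():
    # The big allocation shows up in the "Increment" column of the report.
    data = [str(el) for el in range(1000000)]
    return data


if __name__ == '__main__':
    build_list()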
A new direction has been chosen. Thanks for the prototype.
I'm not really sure about wrapping immutable types.
cc @jalavik @jacquerie