inveniosoftware-attic / jsonalchemy

JSONAlchemy
GNU General Public License v2.0

WIP New approach #37

Closed MSusik closed 9 years ago

jirikuncar commented 9 years ago

I'm not really sure about wrapping immutable types.

cc @jalavik @jacquerie

jacquerie commented 9 years ago

One use case I see is defining a DOI schema, i.e. a JSONString with some format validation; another is an international telephone number.

MSusik commented 9 years ago

Well, there are more use cases: ORCID validation, email validation, URL validation, etc.
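For illustration, a validating wrapper along these lines might look as follows. This is a hypothetical sketch: `ValidatedString`, the class names, and the (deliberately simplified) patterns are made up for this example and are not the actual JSONAlchemy implementation.

```python
import re

class ValidatedString(str):
    """Hypothetical sketch of a JSONString-like type: an immutable str
    subclass that validates its value on creation."""
    pattern = None  # subclasses set a compiled regex

    def __new__(cls, value):
        # Immutable types must validate in __new__; by __init__ the
        # value is already fixed.
        if cls.pattern is not None and not cls.pattern.match(value):
            raise ValueError('%r is not a valid %s' % (value, cls.__name__))
        return super(ValidatedString, cls).__new__(cls, value)

class DOI(ValidatedString):
    # Simplified DOI shape, "10.<registrant>/<suffix>"; real DOIs are looser
    pattern = re.compile(r'^10\.\d{4,9}/\S+$')

class Email(ValidatedString):
    # Deliberately loose check, for illustration only
    pattern = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')
```

Because the result is still a `str`, such values stay drop-in compatible with JSON serialization.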

jacquerie commented 9 years ago

A little preliminary testing on the performance of wrapping immutables, using this code:

In [1]: from jsonalchemy import performance

In [2]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.7 ns per loop

In [3]: %timeit performance.no_wrap_perf
The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 62.2 ns per loop

In [4]: %timeit performance.no_wrap_perf
10000000 loops, best of 3: 62 ns per loop

In [5]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.8 ns per loop

A little higher, but less than a 10% increase. Now I will test memory use.
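One caveat worth noting: in IPython, `%timeit performance.wrap_perf` without parentheses times only the attribute lookup, not a call to the function, which could explain figures in the tens of nanoseconds. A standalone benchmark that actually invokes the functions might look like the sketch below; `Wrapped`, `wrap_perf`, and `no_wrap_perf` here are hypothetical stand-ins, not the real `JSONArray` classes.

```python
import timeit

# Hypothetical stand-ins for JSONArray / JSONArrayNoWrapping: one wraps
# every element in a str subclass, the other stores plain values.
class Wrapped(str):
    pass

def wrap_perf(n=10000):
    return [Wrapped(str(el)) for el in range(n)]

def no_wrap_perf(n=10000):
    return [str(el) for el in range(n)]

if __name__ == '__main__':
    # Unlike "%timeit performance.wrap_perf" (no parentheses, which
    # times only the attribute lookup), the functions are called here.
    print('wrap:   ', timeit.timeit('wrap_perf()', globals=globals(), number=100))
    print('no wrap:', timeit.timeit('no_wrap_perf()', globals=globals(), number=100))
```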

jacquerie commented 9 years ago

When I measure memory usage I get some weird results:

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    45     13.0 MiB      0.0 MiB   @profile
    46                             def wrap_perf():
    47     13.0 MiB      0.0 MiB       schema = load_schema_from_url(
    48     13.0 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    49
    50    381.8 MiB    368.8 MiB       data = JSONArray([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    52     17.6 MiB      0.0 MiB   @profile
    53                             def no_wrap_perf():
    54     17.6 MiB      0.0 MiB       schema = load_schema_from_url(
    55     17.6 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    56
    57    382.2 MiB    364.6 MiB       data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    52     17.9 MiB      0.0 MiB   @profile
    53                             def no_wrap_perf():
    54     17.9 MiB      0.0 MiB       schema = load_schema_from_url(
    55     17.9 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    56
    57    389.8 MiB    371.9 MiB       data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    45     18.2 MiB      0.0 MiB   @profile
    46                             def wrap_perf():
    47     18.2 MiB      0.0 MiB       schema = load_schema_from_url(
    48     18.2 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    49
    50    382.8 MiB    364.7 MiB       data = JSONArray([str(el) for el in xrange(1000000)], schema)

In one case the wrapped version consumes more memory, in the other the unwrapped version consumes more. I think this means that the amount of memory consumed by the immutable-type wrappers is negligible.
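memory_profiler reports process-level usage, which is fairly coarse. On Python 3, a finer-grained cross-check with tracemalloc could look like the sketch below; `Wrapped` and `peak_allocation` are hypothetical stand-ins, not the real `JSONArray`.

```python
import tracemalloc

class Wrapped(str):
    """Hypothetical stand-in for a schema-aware immutable wrapper."""
    __slots__ = ()  # no per-instance __dict__, so the overhead stays small

def peak_allocation(factory, n=100000):
    """Peak traced memory (bytes) while building a list of n elements."""
    tracemalloc.start()
    data = [factory(str(el)) for el in range(n)]
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data
    return peak

if __name__ == '__main__':
    print('wrapped: %d bytes' % peak_allocation(Wrapped))
    print('plain:   %d bytes' % peak_allocation(str))
```

Note that the plain-`str` case still allocates one string per element (via `str(el)`), so the two peaks isolate only the wrapper's own overhead.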

MSusik commented 9 years ago

cc @jirikuncar :arrow_up:

Yesterday there was an assumption, without any proof, that wrapping would cause a big overhead.

egabancho commented 9 years ago

To me, wrapping immutables is not the big issue; instead, the small mess inside __new__ and __init__ is something we should look at more closely and find a nicer/safer solution for (if we decide to continue with this approach), maybe using metaclasses instead of abusing __new__?
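A metaclass-based alternative could, for example, attach and merge the schema at class-creation time instead of doing the work per-instance in __new__. This is a hypothetical sketch in Python 3 syntax; `SchemaMeta`, `JSONBase`, and `__schema__` are made-up names, not part of the codebase.

```python
# Hypothetical sketch: a metaclass merges schema fragments declared on
# a class and its bases when the class is created, so instances need no
# __new__ tricks.
class SchemaMeta(type):
    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)
        merged = {}
        # Walk the MRO from most generic to most specific.
        for base in reversed(cls.__mro__):
            merged.update(getattr(base, '__schema__', None) or {})
        cls.__schema__ = merged
        return cls

class JSONBase(metaclass=SchemaMeta):
    __schema__ = {'type': 'object'}

class Record(JSONBase):
    __schema__ = {'properties': {'doi': {'type': 'string'}}}
```

With this shape, `Record.__schema__` carries both its own fragment and the inherited `{'type': 'object'}` without any instance-level machinery.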

tiborsimko commented 9 years ago

In [2]: %timeit performance.wrap_perf
10000000 loops, best of 3: 58.7 ns per loop

In [3]: %timeit performance.no_wrap_perf
The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached
10000000 loops, best of 3: 62.2 ns per loop

An interesting observation: wrap_perf is faster than no_wrap_perf? Shouldn't no wrapping always be faster than some wrapping? Are the tests OK?

tiborsimko commented 9 years ago

The slowest run took 19.18 times longer than the fastest. This could mean that an intermediate result is being cached

19x longer seems excessive... Do you use some caching or memoising? If not, you may perhaps be cons'ing too much (seeing the large xrange(...) and list additions) and may be bitten by this Python GC problem. You can try to store the results in a variable and disable or force GC...
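A minimal sketch of timing with the collector disabled, as suggested above; the `timed` helper is made up for this example. (Note that `timeit.Timer` already disables GC around the measured statement by default.)

```python
import gc
import time

def timed(fn):
    """Time fn() with the garbage collector disabled, so a collection
    triggered mid-run cannot skew a single measurement."""
    gc.disable()
    try:
        gc.collect()  # start from a clean slate
        start = time.perf_counter()
        result = fn()
        elapsed = time.perf_counter() - start
    finally:
        gc.enable()
    return result, elapsed

if __name__ == '__main__':
    data, seconds = timed(lambda: [str(el) for el in range(100000)])
    print('%d items built in %.4f s' % (len(data), seconds))
```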

MSusik commented 9 years ago

To me, wrapping immutables is not the big issue; instead, the small mess inside __new__ and __init__ is something we should look at more closely and find a nicer/safer solution for (if we decide to continue with this approach), maybe using metaclasses instead of abusing __new__?

I agree, it looks horrible. However, I can't think of another way of subclassing from dict directly, extending it with a schema, and keeping a nice class structure. I'd love to hear some suggestions for improvements.
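For mutable containers, the __new__ gymnastics can in fact be avoided: dict instances can be populated in __init__, so the schema can be attached there. The sketch below is hypothetical; `SchemaDict` and its validation rule are made up for illustration.

```python
class SchemaDict(dict):
    """Hypothetical sketch: subclass dict directly and attach the schema
    in __init__. Since dict is mutable, no __new__ override is needed;
    that trick is only forced on immutable types like str or int."""

    def __init__(self, data=None, schema=None):
        # Note: items passed here bypass the __setitem__ validation.
        super().__init__(data or {})
        self.schema = schema or {}

    def __setitem__(self, key, value):
        properties = self.schema.get('properties', {})
        if properties and key not in properties:
            raise KeyError('%r is not allowed by the schema' % key)
        super().__setitem__(key, value)
```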

jacquerie commented 9 years ago

An interesting observation: wrap_perf is faster than no_wrap_perf? Shouldn't no wrapping be always faster than some wrapping? Are the tests OK?

Uh! I didn't notice it was the other way around from what would be reasonable. Maybe it's because redefining append on JSONArrayNoWrapping requires a method lookup every time the method is needed?
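The generic cost of a Python-level append override can be probed with a small benchmark. The classes below are hypothetical stand-ins for this hypothesis, not the actual JSONArrayNoWrapping.

```python
import timeit

class PlainList(list):
    pass  # inherits list's C-implemented append

class WrappingList(list):
    # A Python-level override replaces the C implementation and adds a
    # method dispatch on every call.
    def append(self, item):
        super().append(item)

def fill(cls, n=1000):
    seq = cls()
    for el in range(n):
        seq.append(el)
    return seq

if __name__ == '__main__':
    for cls in (PlainList, WrappingList):
        print(cls.__name__, timeit.timeit(lambda: fill(cls), number=200))
```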

19x longer seems excessive... Do you use some caching or memoising? If not, you may be perhaps cons'ing too much (seeing large xrange(...) and list additions) and may be bitten by this Python GC problem? You can try to store results in a variable and disable or force GC...

I only got that message on the first run I made. I agree that the constructor is doing too much consing; in fact, it should probably be rewritten as a list comprehension and retested.

jacquerie commented 9 years ago

Surprising as it is, I think I can confirm the previous result as far as memory usage goes. With this code I get:

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    46     12.9 MiB      0.0 MiB   @profile
    47                             def wrap_perf():
    48     12.9 MiB      0.0 MiB       schema = load_schema_from_url(
    49     12.9 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    50
    51    381.8 MiB    368.9 MiB       data = JSONArray([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    54     17.5 MiB      0.0 MiB   @profile
    55                             def no_wrap_perf():
    56     17.5 MiB      0.0 MiB       schema = load_schema_from_url(
    57     17.5 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    58
    59    382.2 MiB    364.6 MiB       data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    54     17.9 MiB      0.0 MiB   @profile
    55                             def no_wrap_perf():
    56     17.9 MiB      0.0 MiB       schema = load_schema_from_url(
    57     17.9 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    58
    59    382.4 MiB    364.5 MiB       data = JSONArrayNoWrapping([str(el) for el in xrange(1000000)], schema)

Filename: jsonalchemy/performance.py

Line #    Mem usage    Increment   Line Contents
================================================
    46     18.2 MiB      0.0 MiB   @profile
    47                             def wrap_perf():
    48     18.2 MiB      0.0 MiB       schema = load_schema_from_url(
    49     18.2 MiB      0.0 MiB           abs_path('schemas/performance.json'))
    50
    51    382.7 MiB    364.5 MiB       data = JSONArray([str(el) for el in xrange(1000000)], schema)

(To replicate: $ python jsonalchemy/performance.py)

jirikuncar commented 9 years ago

New direction has been chosen. Thanks for the prototype.