Closed jalavik closed 9 years ago
I tried doing the same thing using @jirikuncar's approach: it turned out to be very easy (https://github.com/jirikuncar/jsonalchemy/commit/2480dcf5da4cdedd74217d0459119617059e9754). I think his approach of leaning heavily on jsonschema is preferable.
@jacquerie Agreed on that observation. I favor that option. @jirikuncar @dset0x
@jacquerie can somebody cherry-pick my dict
branch with your updates and create new PR to pu
? I'm quite busy now, but I'll try to merge it in the evening so we have good starting point for tomorrow grand finale. Thanks
@kaplun pointed out a performance pitfall of this approach, which I can confirm using this test: https://github.com/jacquerie/jsonalchemy/commit/171ff87ef33abb361928bbdc3bdc1ca5f2b49b64.
I get:
(jsonalchemy)~/Code/jsonalchemy$ time py.test tests/test_dictstyle.py
================================================ test session starts ================================================
platform linux2 -- Python 2.7.8 -- py-1.4.26 -- pytest-2.7.0
rootdir: /home/jnotarst/Code/jsonalchemy, inifile: pytest.ini
plugins: pep8, cache, cov
collected 9 items
tests/test_dictstyle.py .........
---------------------------------- coverage: platform linux2, python 2.7.8-final-0 ----------------------------------
Name Stmts Miss Cover Missing
-----------------------------------------------------
jsonalchemy/__init__ 2 0 100%
jsonalchemy/dictstyle 82 0 100%
jsonalchemy/factory 26 26 0% 24-65
jsonalchemy/types 0 0 100%
jsonalchemy/utils 8 0 100%
jsonalchemy/version 2 0 100%
jsonalchemy/wrappers 130 130 0% 24-247
-----------------------------------------------------
TOTAL 250 156 38%
============================================ 9 passed in 142.13 seconds =============================================
real 2m22.377s
user 2m16.862s
sys 0m0.137s
@jacquerie have you tried similar tests on @dset0x's and my version?
Argh, sorry for explaining myself badly: what I'm testing there is the "dictstyle" approach of @jirikuncar and @dset0x , using my own implementation of append
.
The issue at hand is that, regardless of the actual implementation, if you perform the validation automatically upon every modification this is quadratic to the number of fields (whatever the exact defintiion). If you are building a typical HEP record (with e.g. 3000 author) as you can see this can have a big performance impact.
In fact, the following patch fixes this performance issue: https://github.com/jacquerie/jsonalchemy/commit/d8383645646b137a5def3c2b5e85ed4ed52ab287. So, I think we will have to code carefully all the mutations and avoid, as @kaplun mentioned, having validation for every modification.
(jsonalchemy)~/Code/jsonalchemy$ time py.test tests/test_dictstyle.py
================================================ test session starts ================================================
platform linux2 -- Python 2.7.8 -- py-1.4.26 -- pytest-2.7.0
rootdir: /home/jnotarst/Code/jsonalchemy, inifile: pytest.ini
plugins: pep8, cache, cov
collected 9 items
tests/test_dictstyle.py .........
---------------------------------- coverage: platform linux2, python 2.7.8-final-0 ----------------------------------
Name Stmts Miss Cover Missing
-----------------------------------------------------
jsonalchemy/__init__ 2 0 100%
jsonalchemy/dictstyle 81 1 99% 126
jsonalchemy/factory 26 26 0% 24-65
jsonalchemy/types 0 0 100%
jsonalchemy/utils 8 0 100%
jsonalchemy/version 2 0 100%
jsonalchemy/wrappers 130 130 0% 24-247
-----------------------------------------------------
TOTAL 249 157 37%
============================================= 9 passed in 0.38 seconds ==============================================
real 0m0.597s
user 0m0.549s
sys 0m0.033s
Just to make d838364 clear (after a conversation with @jacquerie) : This line
validate(value, self.schema['items'])
validates only the new, inserted value. This is what speeds the test up. However, as you can see this means that we have to do the checks that are broader than this value (say maxItems
) manually.
This should be doable however, since jsonschema does not declare many such constraints.
OTOH, an additional benefit to separate the validation step, is the possibility to have temporary inconsistent statuses... IMHO it should be up to the client code to decide when to enforce the validation. (e.g. imagine a situation when there are two depending fields...)
data = JSONDict({'a': []})
with data.transaction() as d: # validation is turned off until __exit__
for i in range(10000):
d['a'].append(i)
Nice proposal! Concretely: is d
an instance of JSONDict
where a validation
flag is switched off? In this case, can the validation be switched off programmatically? E.g. to really let client code to disable it in general.
a validation flag is switched off?
Don't let people to use any internals if you provide API. Then we can change the implementation while keeping compatible API.
Proposal
data = JSONDict({'a': []}) with data.transaction() as d: # validation is turned off until __exit__ for i in range(10000): d['a'].append(i)
Uh, that's a very cool API. :+1:
On the other hand, since I was already working on it, here's an implementation of JSONList
without transactions that takes care about performance: https://github.com/jacquerie/jsonalchemy/commit/f63c27d5624d32e9d34bb53c43a14fffd779f46c
As discussed we are going for dict style implementation.
@jacquerie