pandastrike / jsck

JSON Schema Compiled checK
MIT License
160 stars 14 forks source link

check into themis #81

Closed gilesbowkett closed 9 years ago

gilesbowkett commented 9 years ago

@Muscula tried to do a PR using the validator library https://github.com/playlyfe/themis with our benchmarks, but ran into trouble.

## Benchmarks for Draft 4

Schema: 'Event - Valid document'.  A simple schema, exercising very few attributes
Sample size: 64
Validations per sample: 1024

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: ................................
  Iterations: ................................................................

  JSCK: validations/millisecond
  median: 116.848    max: 135.217    min: 68.582 

  tv4: validations/millisecond
  median: 59.104    max: 67.56    min: 43.675 

  jayschema: validations/millisecond
  median: 0.857    max: 1.068    min: 0.713 

  is-my-json-valid: validations/millisecond
  median: 1208.26    max: 1488.372    min: 594.312 

  themis: validations/millisecond
  median: 488.2    max: 607.715    min: 365.193 

Relative speeds:
is-my-json-valid : 1.000
themis : 2.475
JSCK : 10.340
tv4 : 20.443
jayschema : 1409.692

Schema: 'Configuration'.  A moderately complex schema with some nesting and value constraints
Sample size: 64
Validations per sample: 256

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: ................................
  Iterations: ................................................................

  JSCK: validations/millisecond
  median: 57.612    max: 71.369    min: 38.924 

  tv4: validations/millisecond
  median: 20.741    max: 24.056    min: 15.737 

  jayschema: validations/millisecond
  median: 0.46    max: 0.481    min: 0.423 

  is-my-json-valid: validations/millisecond
  median: 606.635    max: 812.698    min: 179.902 

  themis: validations/millisecond
  median: 72.48    max: 86.341    min: 44.329 

Relative speeds:
is-my-json-valid : 1.000
themis : 8.370
JSCK : 10.530
tv4 : 29.249
jayschema : 1317.964

Schema: 'Transaction'.  
Sample size: 64
Validations per sample: 64

  JSCK
  Warming up: ................................
  Iterations: ................................................................

  tv4
  Warming up: ................................
  Iterations: ................................................................

  jayschema
  Warming up: ................................
  Iterations: ................................................................

  is-my-json-valid
  Warming up: ................................
  Iterations: ................................................................

  themis
  Warming up: TypeError: Object #<Object> has no method '0#transaction'
  at Object.validators.0#/items (<anonymous>:72:43)
  at Object.validators.(anonymous function) [as 0] (<anonymous>:59:48)
  at eval (<anonymous>:3826:37)
  at /Users/allanebdrup/Dropbox/Muscula-jsck/node_modules/themis/src/themis.js:2631:22
  at module.exports.themis.validate (/Users/allanebdrup/Dropbox/Muscula-jsck/benchmarks/draft4/validators.coffee:57:8)
  at Benchmark._measure (/Users/allanebdrup/Dropbox/Muscula-jsck/benchmarks/runner.coffee:31:15)
  at Benchmark.module.exports.Benchmark.run (/Users/allanebdrup/Dropbox/Muscula-jsck
ebdrup commented 9 years ago

This is the code I used to add themis.

  "themis":
     setup: (schema) ->
       require("themis").validator(schema)
     validate: ({validator, schema, document}) ->
       validator(document, '0')
     error: (result) ->
       if result.valid == true
         false
       else
         result.errors
gilesbowkett commented 9 years ago

I used similar code, and encountered similar errors.

The errors seem to come from an expectation in themis that every schema will specify an id attribute at its top level. This is not required in the spec.

I also checked themis against the official JSON Schema Test Suite using a Node.js harness which Panda Strike created last week. themis had a hard time with these tests, because the vast majority of the schemas in these tests do not have id attributes.

In fact, of 88 tests, only one had an id attribute at the top level.

I'm not closing the ticket, but I am going to de-prioritize testing themis for now, because I have a busy week next week, and especially because of this id attribute situation. I don't know how to make themis pass the official JSON Schema tests, and it might not be possible.

If it doesn't pass the official tests, then the question is by how much. JSCK decided not to implement a few small pieces of the spec, and many JSON Schema validators have made similar decisions. If this id thing is the only quirk with themis, we should probably figure out a way to test it, presumably by adding an id attribute. But if there are a ton of other places where themis deviates from the spec, it might not be worth the effort.

So, temporary wontfix, at least not until Feb 9th or so.

automatthew commented 9 years ago

+1 wontfix.

ebdrup commented 9 years ago

I had no problem getting themis to pass the official test suite. If you use

require("themis").validator(schema)(document, '0')

It works without id's just fine. It's the '0' that does the trick. (also in the example code I shared above)

For reference here is the result I get when running themis against the official testsuite: https://github.com/Muscula/json-schema-benchmark/blob/master/reports/themis.md

But we can agree that is a pretty weird implementation themis has for this.

gilesbowkett commented 9 years ago

well, go ahead and post a gist.

ebdrup commented 9 years ago

Well there is one test failing for themis:

change resolution scope, changed scope ref invalid

`Expected result: false but validator returned: "Object # has no method '0'"``

This is probably what disqualifies themis from your benchmarks.

atrniv commented 9 years ago

The latest version of themis 1.1.5 should pass the jsck benchmarks. I've created a pull request for the same #82

However I believe the benchmarks require some improvement. I got widely varying results sometimes while running them. I would suggest using benchmarkjs. It provide much more accurate statistics.

To illustrate my point here's a sample of 5 runs of the benchmark with jsck, imjv and themis

Run1

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes Sample size: 64 Validations per sample: 1024

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 2.278 Themis : 3.750 JSCK : 11.732

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints Sample size: 64 Validations per sample: 256

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 6.494 Themis : 7.073 JSCK : 11.017

Schema: 'Transaction'.
Sample size: 64 Validations per sample: 64

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 2.652 Themis : 3.458 JSCK : 4.178

Run 2

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes Sample size: 64 Validations per sample: 1024

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 2.126 Themis : 3.162 JSCK : 10.432

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints Sample size: 64 Validations per sample: 256

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 7.976 Themis[minimal] : 9.040 JSCK : 22.875

Schema: 'Transaction'.
Sample size: 64 Validations per sample: 64

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 3.281 Themis[minimal] : 4.964 JSCK : 6.159

Run 3

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes Sample size: 64 Validations per sample: 1024

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 1.774 Themis[minimal] : 2.021 JSCK : 14.632

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints Sample size: 64 Validations per sample: 256

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 3.844 Themis : 4.944 JSCK : 7.367

Schema: 'Transaction'.
Sample size: 64 Validations per sample: 64

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 3.099 Themis[minimal] : 4.867 JSCK : 7.621

Run 4

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes Sample size: 64 Validations per sample: 1024

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 2.985 Themis[minimal] : 4.255 JSCK : 19.160

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints Sample size: 64 Validations per sample: 256

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 8.446 JSCK : 10.161 Themis : 13.009

Schema: 'Transaction'.
Sample size: 64 Validations per sample: 64

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 3.611 Themis[minimal] : 4.344 JSCK : 7.837

Run 5

Benchmarks for Draft 4

Schema: 'Event - Valid document'. A simple schema, exercising very few attributes Sample size: 64 Validations per sample: 1024

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 1.485 Themis : 2.029 JSCK : 8.936

Schema: 'Configuration'. A moderately complex schema with some nesting and value constraints Sample size: 64 Validations per sample: 256

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis[minimal] : 3.636 Themis : 4.441 JSCK : 6.794

Schema: 'Transaction'.
Sample size: 64 Validations per sample: 64

JSCK Warming up: ................................ Iterations: ................................................................

Themis[minimal] Warming up: ................................ Iterations: ................................................................

Themis Warming up: ................................ Iterations: ................................................................

is-my-json-valid Warming up: ................................ Iterations: ................................................................

Relative speeds: is-my-json-valid : 1.000 Themis : 2.367 Themis[minimal] : 2.370 JSCK : 2.848

gilesbowkett commented 9 years ago

we already have an issue to use benchmark, pull requests welcome: https://github.com/pandastrike/jsck/issues/29

anyway, congrats. this looks nice and fast. although this result happened for me just now:

Themis : 8.699
JSCK : 9.476
Themis[minimal] : 10.532

obviously this is another good reason to look at making https://github.com/pandastrike/jsck/issues/29 happen! it could just be a fluke. but if it happens consistently and isn't just a fluke, it might be worth looking into, since I'm pretty sure minimal Themis is supposed to be faster than regular Themis.

a few more comments on your PR, https://github.com/pandastrike/jsck/pull/82.

also, did you read this comment?

https://github.com/pandastrike/jsck/issues/81#issuecomment-72273973

I do want to be sure Themis actually validates everything. I asked @Muscula for a gist, because he said he got Themis running the JSON Schema official tests. We have a test harness which should be pretty easy to set up.

https://www.npmjs.com/package/json-schema-tests

ebdrup commented 9 years ago

The bug that was causing themis to not complete in your benchmarks: change resolution scope, changed scope ref invalid

`Expected result: false but validator returned: "Object # has no method '0'"``

Was fixed.

When I instantiate themis with a schema I use a try catch, and fail all the tests with that schema if it throws. So basically like in #82, but with a try-catch.

Be sure to npm update, to get the latest version with the fix.

atrniv commented 9 years ago

@Muscula the reason for the 0 weirdness is because the compiled validator that themis generates is not a validator for a single schema but possibly for multiple schemas which were all compiled together. Its sorta inspired from JSV validator. In this scenario we need some way to specify the id of the schema to validate against. In most actual use cases your schemas would carry an id to identify it some way and then this weirdness wouldn't happen, but in other scenarios we just basically use integers to identify each schema.

However besides this, themis is not supposed to be failing any tests. If you find any bug please do let me know and I'll have it fixed right away.

atrniv commented 9 years ago

@gilesbowkett yes I also got quite a few runs when themis minimal was actually slower than themis default. I even got a run where IMJV was 2 times slower than themis minimal lol. However the benchmarks I use for themis itself uses benchmarkjs and over there the performance is pretty consistent.

You can find it here

gilesbowkett commented 9 years ago

@atrniv those look nice and thorough! I might dig into that code to see how you're using benchmarkjs.

atrniv commented 9 years ago

@gilesbowkett I actually borrowed it from the benchmarks by ZSchema 3 by @zaggino. The credit goes to him. The only thing lacking in it is more practical benchmarks. Right now the only useful benchmarks are the basicObject and the AdvancedObject benchmarks. I'll probably be adding in the benchmarks from jsck into it as well, since they help portray a more rounded perspective of how the validators would perform with actual real world schemas.

gilesbowkett commented 9 years ago

@atrniv thanks! I didn't look at your implementation in detail yet, in fact it could be a week before I can dig into this with any real seriousness. but, fwiw, our benchmarks use five different schemas – two for draft 3 and three for draft 4. combine that with better statistics from benchmark and we could have something pretty comprehensive.

gilesbowkett commented 9 years ago

@Muscula @atrniv so I'm still seeing errors all over the place.

I've tried it a few different ways:

  • validator(data)
  • validator(data, '#');
  • validator(data, '#/');
  • validator(data, '0');
  • validator(data, 0);

obviously I'm missing something about how to work with Themis. help me out here.

(to be clear, this is when running the JSON Schema test suite, and/or validating against a schema with no top-level id.)

gilesbowkett commented 9 years ago

ok, got this working. will now handle https://github.com/pandastrike/jsck/pull/82. may have to create new branch for it, though, commits to master make me OCD.