pyeve / cerberus

Lightweight, extensible data validation library for Python
http://python-cerberus.org
ISC License

JSON-Schema comparison #254

Open nicolaiarocci opened 8 years ago

nicolaiarocci commented 8 years ago

It would be nice to add a section to the docs with the main differences between Cerberus and jsonschema, possibly including some use cases where one is a better tool than the other.

I get asked this a lot and unfortunately I do not have a good answer, as I don't know jsonschema well enough. When I started work on Cerberus a few years ago, jsonschema was not out yet or I had not come across it (I suspect the two projects are about the same age).

So, this is up to contributors who know better than me.

funkyfuture commented 8 years ago

phew, i remember when i was looking for a validation library, i had to choose between these two and jsonschema seemed too unpythonic.

just had a quick look at the docs. still the same. certainly a good choice when you have to deal with legacy schemas. how to extend these validators seems to be a well kept secret.

rredkovich commented 8 years ago

I had an experience just a few weeks ago with Connexion, which generates Flask code skeletons from Swagger specs and uses jsonschema as its validation library.

I faced two things that could be better:

  1. Partial error messages, e.g. with 5 invalid fields the report will include only one. After fixing that field it will include the second, and so on.
  2. In some cases the error messages are not good for debugging, like "NoneType is not of type str"

txomon commented 8 years ago

This is not a real comparison, just feedback from my work on this. I created a Cerberus to jsonschema converter for Swagger, so that some serializers we use internally are auto-documented with Swagger.

I found some differences that seem relevant to this thread.

Observations

Their object definition includes an interesting field called definitions, which makes the schema self-contained and serializable: all entities in jsonschema live inside the schema, which acts as a namespace, so they can point to each other with references (instead of circular dependencies). We don't have that need because we can simply include an object within itself.
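
To make the definitions/references point concrete, here is a minimal sketch. The field names (`address`, `billing`, `shipping`) and the helper `resolve_ref` are made up for illustration; the helper only follows local `#/...` pointers and is not how any particular library resolves references.

```python
import json

# JSON Schema: shared parts live under "definitions" and are referenced with
# "$ref", so the whole document stays plain, serializable data.
json_schema = {
    "definitions": {
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        }
    },
    "type": "object",
    "properties": {
        "billing": {"$ref": "#/definitions/address"},
        "shipping": {"$ref": "#/definitions/address"},
    },
}


def resolve_ref(root, ref):
    """Follow a local '#/a/b' style pointer inside the same document."""
    if not ref.startswith("#/"):
        raise ValueError("only local references are handled in this sketch")
    node = root
    for key in ref[2:].split("/"):
        node = node[key]
    return node


# Cerberus-style: schemas are Python objects, so reuse is just sharing a dict.
address = {"type": "dict", "schema": {"city": {"type": "string"}}}
cerberus_schema = {"billing": address, "shipping": address}

serialized = json.dumps(json_schema)  # survives as text, references intact
resolved = resolve_ref(json_schema, json_schema["properties"]["billing"]["$ref"])
```

Here `resolved` is the `address` definition, while in the Cerberus-style dict both fields simply hold the same Python object.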

They not only validate, but also document objects, their schemas have titles, descriptions etc.

A big difference that gave me a lot of headaches (I even thought it was a bug: https://github.com/OAI/OpenAPI-Specification/issues/794) is that whether a field is required is declared on the parent, not on the field itself. At first it looked like a horrible design decision, but after thinking about it for a while, it looks like a good one.
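
A small sketch of that difference, assuming a flat object whose fields already use JSON Schema type names. The function name `to_json_schema_object` and the sample fields are hypothetical; this is not the converter mentioned above.

```python
def to_json_schema_object(cerberus_fields):
    """Move per-field 'required' flags up into the parent's 'required' list."""
    properties = {}
    required = []
    for name, rules in cerberus_fields.items():
        rules = dict(rules)  # copy so the input is left untouched
        if rules.pop("required", False):
            required.append(name)
        properties[name] = rules
    result = {"type": "object", "properties": properties}
    if required:
        result["required"] = sorted(required)
    return result


# Cerberus-style: each field says for itself whether it is required.
cerberus_style = {
    "name": {"type": "string", "required": True},
    "nickname": {"type": "string"},
}
converted = to_json_schema_object(cerberus_style)
```

After the conversion the field definitions carry no `required` flag at all, which is what lets the same definition be referenced from one parent where it is mandatory and from another where it is optional.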

They don't support nullable or empty.

Personal opinion

The fact is that most of the world is forcefully using jsonschema.

I think we should be jsonschema compatible, because Cerberus has far better feedback messages and some more features (nullable/empty, coercion, defaults, etc.).

Adapt the schema

The incompatibilities with jsonschema:

I would like to change how our schema works so that it has the same structure as JSON Schema; that is, an object's definition would be a definition in its own right, rather than directly the children of an object.

This would allow us to have the definitions part and to have references. This may not be too compatible at the moment, but I think it would enable us to easily do the second step.

Create importer/exporter

I really like how cerberus is, and the functionalities it provides, however most of the industry is using jsonschema. Because cerberus is almost a superset of the functionalities, once the schema has been adapted for the limitations, there shouldn't be any problem to import/export jsonschema to/from cerberus.

funkyfuture commented 8 years ago

interesting. are you aware of the schema registry? with this a conversion should be possible, shouldn't it?

@nicolaiarocci would you accept a converter in the schema module?

nicolaiarocci commented 8 years ago

@funkyfuture absolutely. @txomon thanks for sharing your experience.

txomon commented 8 years ago

@funkyfuture I wasn't aware of that one!

So, the constraints at the moment are:

The only thing to adapt then is the nullable, empty, required, etc. attributes. How can we reuse a definition if it defines on itself those characteristics?

We should either be capable of overriding from the reference, or change how cerberus schema definitions work and have it defined on the parent (quite breaking change IMO). Any ideas?

@nicolaiarocci I will try to get the work I did at @Ridee into a PR

funkyfuture commented 8 years ago

> We wouldn't be able to serialize if there is circular dependency using objects

the 'support' for this in cerberus is gone anyway, afaik.

> The only thing to adapt then is the nullable, empty, required, etc. attributes. How can we reuse a definition if it defines on itself those characteristics?

i'm not sure whether i'm getting you right here. but a feature that isn't supported in jsonschema should raise an error when converting to it.
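
a rough sketch of what "raise an error when converting" could look like for a single scalar field. the rule selection, `UnsupportedRule` and `convert_field` are made up for illustration and nothing here exists in cerberus.

```python
# Cerberus rule name -> JSON Schema keyword, for a few rules on scalar fields
# that translate directly. The selection is illustrative, not exhaustive.
RULE_MAP = {
    "min": "minimum",
    "max": "maximum",
    "minlength": "minLength",
    "maxlength": "maxLength",
    "allowed": "enum",
}
TYPE_MAP = {
    "string": "string",
    "integer": "integer",
    "float": "number",
    "boolean": "boolean",
}


class UnsupportedRule(ValueError):
    """A rule (or type) that this sketch cannot express in JSON Schema."""


def convert_field(rules):
    """Convert one scalar field definition, refusing anything unknown."""
    converted = {}
    for rule, constraint in rules.items():
        if rule == "type" and constraint in TYPE_MAP:
            converted["type"] = TYPE_MAP[constraint]
        elif rule in RULE_MAP:
            converted[RULE_MAP[rule]] = constraint
        else:
            # e.g. 'coerce' or 'default' have no direct counterpart here
            raise UnsupportedRule(f"cannot convert {rule!r}: {constraint!r}")
    return converted


rating = convert_field({"type": "integer", "min": 0, "max": 10})
```

with this, `convert_field({"type": "integer", "coerce": int})` fails loudly instead of silently dropping the coercion.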

> We should either be capable of overriding from the reference, or change how cerberus schema definitions work and have it defined on the parent (quite breaking change IMO). Any ideas?

a conversion may be done from/to a cerberus validator rather than a cerberus schema as you can bind a schema registry to a validator. would that have any downside? edit: in that case i'm not sure that the converter should live in the schema module.

the implementation should solely rely on the stdlib's json module or the feature should only be available if jsonschema is available in the environment. platforms that lack a needed json feature in the stdlib don't need to be supported, imo.

sidenote: if this really achieves 100% compatibility, cerberus could also run against jsonschema reference tests.

oh, this is going off-topic.

txomon commented 8 years ago

> the 'support' for this in cerberus is gone anyway, afaik.

Sure, but if someone happens to still have it, it would cause an infinite recursion problem. if you say it's gone, then it's safe to do so...

> i'm not sure whether i'm getting you right here. but a feature that isn't supported in jsonschema should raise an error when converting to it.

We can extend it, extensions are defined in jsonschema already, http://json-schema.org/latest/json-schema-core.html#rfc.section.5.4

> a conversion may be done from/to a cerberus validator rather than a cerberus schema as you can bind a schema registry to a validator. would that have any downside?

Yeah, because I had to code it externally (without modifying cerberus), my converter works on the schema directly, but doing it on the validator would indeed be the appropriate thing to do.

Also, the original question on the limitation on these attributes being defined in the child rather than in the parent remains... :/

t2y commented 7 years ago

I was just looking into what the difference between cerberus and jsonschema is, and then I found this issue. Adding some documentation about this to cerberus would be helpful for new users who want to investigate the technical details.

pavel-shpilev commented 7 years ago

I found this when I was looking for a comparison between voluptuous (https://github.com/alecthomas/voluptuous) and cerberus. I have only used voluptuous, but from a brief review of the docs, cerberus looks pretty similar. Has anybody tried both? A comparison between multiple tools would be great.

funkyfuture commented 7 years ago

another one popped up: https://github.com/gaojiuli/xdata - seems similar to voluptuous

and one targeting jsonschema: https://github.com/guyskk/validr

we could collaborate on a feature comparison matrix with a spreadsheet on https://ethercalc.net/

funkyfuture commented 7 years ago

i don't know when i will continue with the comparison matrix i started, feel free to check and amend: https://ethercalc.org/y41wgbonovm1

txomon commented 7 years ago

@funkyfuture could you explain a little what those rows mean? I don't find them explanatory enough... Specifically the blocking / non-blocking part.

Cheers!

funkyfuture commented 7 years ago

@txomon i amended a legend.

mitar commented 6 years ago

> They don't support nullable or empty.

JSON schema has null type: https://spacetelescope.github.io/understanding-json-schema/reference/null.html

So one can say "type": ["string", "null"] and this means that a value can be a string or a null.

Empty can be defined depending on a type. String with max length zero. An object with no properties and no extra properties allowed.
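
As a minimal sketch of that mapping, assuming the field already uses JSON Schema type names and only carries the Cerberus-style `nullable` and `empty` flags (the function name is hypothetical, and only string fields are handled for `empty`):

```python
def apply_nullable_and_empty(rules):
    """Express Cerberus-style 'nullable'/'empty' with plain JSON Schema keywords."""
    out = {k: v for k, v in rules.items() if k not in ("nullable", "empty")}

    # 'empty': False forbids the empty string, i.e. at least one character.
    if rules.get("empty") is False and out.get("type") == "string":
        out["minLength"] = max(1, out.get("minLength", 0))

    # 'nullable': True widens the declared type to also accept null.
    # Without a declared type nothing is restricted, so nothing needs adding.
    if rules.get("nullable") and "type" in out:
        declared = out["type"]
        types = [declared] if isinstance(declared, str) else list(declared)
        if "null" not in types:
            types.append("null")
        out["type"] = types
    return out


comment = apply_nullable_and_empty(
    {"type": "string", "nullable": True, "empty": False}
)
```

For the sample field this gives a type of `["string", "null"]` plus `minLength: 1`.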

CJ-Wright commented 6 years ago

A description field would be nice.

funkyfuture commented 6 years ago

here's another lib, inspired by Javaland: https://github.com/Grokzen/pykwalify

tyomo4ka commented 6 years ago

I had to make a choice recently between JSON Schema and Cerberus.

Picked Cerberus. Here are my considerations:

  1. Super easy to extend with your custom rules.

  2. It offers normalization rules: http://docs.python-cerberus.org/en/stable/normalization-rules.html. I didn't find a way to implement something like this in JSON Schema.

  3. Cerberus to OpenAPI conversion can be done easily. I spent less than a day writing a converter. It's specific to my use case and I didn't have time to package it for open source, but I will try later.

  4. Description and any other "meta" fields can be easily done with this trick:

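    from cerberus import Validator

    class DescribedValidator(Validator):  # class name is illustrative
        def _validate_description(self, constraint, field, value):
            """
            Allows a description field on the structure

            The rule's arguments are validated against this schema:
            {'type': 'string'}
            """
            pass  # metadata only, so nothing is ever rejected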
  5. I had to deal with many isolated schemas rather than with one big schema, where the "definitions" feature would be useful.

primordialstew commented 6 years ago

Thank you for this thread! It raises a question I immediately had when I came across Cerberus, which was: "This tool looks awesome! I wonder why it is 'rolling its own' schema specification, instead of using JSON-Schema?"

Folks are raising a lot of interesting points and questions, but there seems to be some conflation/comparison of tools with importantly different scopes and responsibilities.

Maybe it would be helpful to clarify some of these scopes. My understanding is as follows:

  1. JSON-Schema is a schema specification, not an implementation.
  2. jsonschema is one of several libraries that attempt to implement JSON-Schema
  3. Cerberus is also a validation library, but it defines/implements its own schema specification.

So Cerberus defines its own schema specification. For the sake of this discussion, let's give it an arbitrary name, Cerb-Schema. Comparing Cerberus to jsonschema is an apples-to-apples comparison; both are validation libraries. Comparing Cerb-Schema to JSON-Schema is an apples-to-apples comparison; both are schema specifications. Comparing Cerberus to JSON-Schema is an apples-to-oranges comparison; Cerberus is a library, JSON-Schema is a spec.

Okay, with luck that is clear and not too controversial?

If so, then I think a key question is: can Cerberus be adapted to support an alternate schema specification, namely JSON-Schema? That would mean that projects with existing data definitions written to the JSON-Schema spec could use Cerberus as a validation library without having to create a duplicate definition in "Cerb-Schema". This is a different question than the OP, which is "can someone create a section in the documentation that compares the Cerberus and jsonschema validation libraries", but it addresses some of the discussion that emerged in the thread. Perhaps it would be valuable to create a new thread: "Add support for JSON-Schema schema definitions"?
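
To give a feel for what such support might involve, here is a minimal sketch of translating a JSON-Schema object definition into a "Cerb-Schema" style mapping. It assumes every property declares a single string `type`; the names `import_object` and `JSON_TO_CERBERUS_TYPE` are made up for this example and are not part of Cerberus.

```python
# JSON-Schema type name -> Cerberus type name (null is left out on purpose).
JSON_TO_CERBERUS_TYPE = {
    "object": "dict",
    "array": "list",
    "string": "string",
    "integer": "integer",
    "number": "number",
    "boolean": "boolean",
}


def import_object(json_schema):
    """Turn a JSON-Schema object definition into a Cerberus-style field mapping."""
    required = set(json_schema.get("required", []))
    fields = {}
    for name, prop in json_schema.get("properties", {}).items():
        rules = {"type": JSON_TO_CERBERUS_TYPE[prop["type"]]}
        if prop["type"] == "object":
            rules["schema"] = import_object(prop)  # recurse into nested objects
        if name in required:
            rules["required"] = True  # parent-level list becomes a per-field flag
        fields[name] = rules
    return fields


person = {
    "type": "object",
    "required": ["name"],
    "properties": {
        "name": {"type": "string"},
        "address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    },
}
fields = import_object(person)
```

Anything beyond types, nesting and `required` (references, combinators, formats) is left out here and would need its own decisions.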

primordialstew commented 6 years ago

P.S. there is some mention of Swagger/OpenAPI. This is a really interesting and useful tool, but its scope is to provide a specification and a reference implementation for defining RESTful web applications. It constitutes a particular use-case for schema specifications, data definitions and validation, but is neither an apples-to-apples comparison to validation libraries (Cerberus, jsonschema) nor to schema specifications ("Cerb-Schema", JSON-Schema).

CMCDragonkai commented 5 years ago

It would be nice to interoperate with JSON Schema. This would make the same schema portable between different environments that perform validation.

jim-bo commented 5 years ago

I've worked with cerberus for a number of years but have recently been required to use json-schema to define the data model (for increased compatibility with other projects). The python validator for json-schema is pretty basic; a port of the cerberus validator logic (including things like coercion and nullable) to support json-schema would be grand.

ssbarnea commented 4 years ago

Any updates on this? I am really interested in exporting a JSON Schema definition out of a Cerberus one, because that format has become something of a standard, with hundreds of adopters and tools already using the schemastore.

Jerry-Ma commented 4 years ago

I came across this thread and it seems that no one has mentioned this: https://github.com/keleshev/schema . This package provides a very pythonic way of doing schema validation, and it provides a converter to jsonschema. I was wondering what are the differences between Cerberus and this schema package.

macks22 commented 3 years ago

Another package that seems quite relevant to this conversation is pydantic.

It provides what I perceive to be an extremely Pythonic validator for JSON Schema and object mapping between JSON Schema and Python BaseModel classes. These are quite similar in principle to dataclasses and the package actually provides a simple way to convert existing dataclasses to its object model. It also provides a simple interface for defining custom validation logic in Python and a variety of integrations with other tools, e.g. mypy, PyCharm IDE, ORMs.

So +1 to the idea of being able to go back and forth between Cerberus schemas and JSON Schema, since that would then make it possible to easily convert from Cerberus to pydantic and vice-versa.

One other note: one of the big design principles in pydantic appears to be an emphasis on speed; thus it provides a benchmark comparison to other validators, including cerberus. This is only a comparison in terms of speed though; other considerations like usability, comprehensiveness of features, dev team responsiveness and project longevity, etc. are not addressed there.

ssbarnea commented 3 years ago

@macks22 That came at a very good time, as I need to build a simple JSON Schema to enforce a file format, but I still have PTSD from one year ago when I did some work on it. The JSON Schema format seems to be anything but human-friendly and gives me the impression that only the spec authors are able to write schemas in it.

If you can give some hints on how to produce a JSON Schema from anything else (preferably Python), it would be awesome. I still need to produce JSON Schema because that is the official shareable format, supported by many tools and editors (vscode). Ideally I could use Python 3.6 typing support to define the data types.
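
For illustration, one way this could look with only the standard library is sketched below. The helper `schema_from_namedtuple` and the `Job` class are hypothetical, only four scalar types are mapped, and fields with a default are assumed to be optional.

```python
from typing import NamedTuple, get_type_hints

# Python annotation -> JSON Schema type name (scalars only in this sketch).
PY_TO_JSON_TYPE = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_namedtuple(cls):
    """Build a JSON Schema object definition from a NamedTuple's annotations."""
    hints = get_type_hints(cls)
    defaults = cls._field_defaults  # fields with a default are treated as optional
    return {
        "title": cls.__name__,
        "type": "object",
        "properties": {
            name: {"type": PY_TO_JSON_TYPE[tp]} for name, tp in hints.items()
        },
        "required": [name for name in hints if name not in defaults],
    }


class Job(NamedTuple):
    name: str
    retries: int = 3
    verbose: bool = False


job_schema = schema_from_namedtuple(Job)
```

Passing `job_schema` to `json.dumps` would then give a document that editors and other tools can consume; nested types, lists and optional values are not covered by this sketch.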