json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Investigate CBOR compatibility #6

Open awwright opened 9 years ago

awwright commented 9 years ago

CBOR (RFC7049, http://cbor.io/) is considered a binary version of JSON; however, it implements a superset of functionality, including native dates, byte (octet) strings (JSON is Unicode text only), integers, URIs, and different storage formats for floating point and for fixed- and variable-sized integers.

For draft-5, we need not add features specific to CBOR, but we should consider the ways that CBOR might be used, and make sure there are no definitions in outright opposition to this goal.

yoshuawuyts commented 9 years ago

This sounds reasonable.

handrews commented 8 years ago

As I understand it, there would be only a few things involved in basic "support" for CBOR:

  1. An application/schema+cbor media type
  2. Declare that CBOR support at this point is limited to aspects of CBOR that are strictly compatible with JSON. Basically, a schema+cbor instance is just a schema+json instance re-encoded as CBOR.
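To make "a schema+json instance re-encoded as CBOR" concrete, here is a minimal sketch of an encoder that accepts only values from the JSON data model. It is an illustration, not a reference implementation: the function names are made up, floats are always written as 64-bit doubles, and it does not use CBOR tags, byte strings, or indefinite-length items.

```python
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR head: a 3-bit major type plus an unsigned argument."""
    if n < 24:
        return bytes([major << 5 | n])
    if n < 2**8:
        return bytes([major << 5 | 24, n])
    if n < 2**16:
        return bytes([major << 5 | 25]) + struct.pack(">H", n)
    if n < 2**32:
        return bytes([major << 5 | 26]) + struct.pack(">I", n)
    if n < 2**64:
        return bytes([major << 5 | 27]) + struct.pack(">Q", n)
    raise ValueError("integer too large for this sketch")


def encode(value) -> bytes:
    """Encode a value from the JSON data model as CBOR."""
    # Booleans are checked before integers because bool is a subclass of int.
    if value is None:
        return b"\xf6"
    if value is True:
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for non-negative, major type 1 for negative integers.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        out = _head(5, len(value))
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("JSON object keys must be strings")
            out += encode(key) + encode(item)
        return out
    raise TypeError(f"not a JSON value: {type(value).__name__}")
```

With this sketch, `encode({"type": "string"})` yields 13 bytes, against 17 bytes for the compact JSON text `{"type":"string"}`.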

It would be good to have even that very basic support in draft 05 so that people are encouraged to start playing with the concept.

The media type is particularly important to claim: if you're building a system that relies on standardized media types, you need one declared so that you can reasonably use it.

epoberezkin commented 7 years ago

I don't understand what the value is. If CBOR is used only as an alternative way to encode JSON, why does it need to be mentioned in the spec in any way? Why not YAML then? Anything that maps to JSON can be used instead of it, so why bother mentioning it in the spec?

awwright commented 7 years ago

YAML doesn't have a formal standard that we're able to reference - we'd have to reference the webpage, but I think if I picked one over another, CBOR has a well defined mapping to JSON, and it also demonstrates diversity a bit better because it's binary.

epoberezkin commented 7 years ago

@awwright you are missing the point. I was just using YAML as an equally nonsensical example. Why include in the standard things that have nothing to do with it?

awwright commented 7 years ago

As it relates to this issue, it helps avoid a monoculture -- JSON is a lot better than other media types for a lot of reasons, but other file formats might be better for different use cases.

Some might support comments, or be more human readable in general. Some might be more compact or faster to parse, like a binary representation. Some might be far more compact -- the EXI people have been in contact and are interested in applying JSON Schema to JSON, where EXI is normally only used for XML.

So there's a variety of reasons you might want to support an alternate form of JSON-encodable data.

Why reference CBOR in particular? Because I think it's appropriate to issue an example that shows the breadth of what is supported, and CBOR is a standardized media type explicitly similar to JSON, that targets a different audience with a binary encoding.

epoberezkin commented 7 years ago

@awwright I completely understand the desire to use other formats to represent JSON data. I disagree with the need to include it in the JSON Schema spec. JSON Schema is JSON data. Users can use any format they wish that maps to JSON. It's a much more general question than needs to be discussed in this spec. Why not keep it simple? Everybody seems to like simple ...

epoberezkin commented 7 years ago

To clarify: is there any aspect of using CBOR etc. to represent JSON schema that make it different from using it to represent any other JSON data? If there is, then it belongs to this spec. If there isn't, it would just litter the spec with trivial and general observations.

handrews commented 7 years ago

@epoberezkin I think the change is to indicate the applicability of JSON Schema to media types other than JSON by citing the closest related interesting media type rather than encoding JSON Schema in CBOR (which needs no explanation).

I work with/have worked with teams that are very sensitive to performance constraints. Most of the time when I get people to actually measure JSON "overhead", it turns out to be insignificant. But occasionally it is a significant factor, or the environment is so constrained that nearly any improvement is significant (which is basically what CBOR was designed for).

I generally try to push people towards CBOR rather than protobuf or other RPC-oriented serialization formats, and I usually have to show a lot of evidence of it as a broadly accepted media type for JSON-compatible binary environments.

So for me, the presence of CBOR in the JSON Schema spec strengthens my hand when making that case, so this change is one that I consider very valuable. And it's a tiny statement that costs us basically nothing.

epoberezkin commented 7 years ago

It increases the word count without changing anything. These applicability statements belong to separate publications rather than to the spec, because there are hundreds of other use cases and because they don't change anything from the spec perspective. But I'll leave this argument.

akuckartz commented 7 years ago

You could produce a brief separate document regarding CBOR for those who might be interested in it.

awwright commented 7 years ago

@epoberezkin Most implementations don't even parse JSON documents; they parse a structure in memory. The paragraph that was added doesn't change any implementation, but it does help clarify what was already true, and informs people that this is practiced.

The CBOR part is one part of one sentence plus a non-normative reference. If it was a normative reference your point would be totally valid. But a minimally short, non-normative reference is fine here.

This issue is also dealing with a larger issue of: should it be possible to add CBOR-specific formats and other support. I'm leaning strongly in the direction of "not in the document" (though I'd like to see what an implementation looks like).

epoberezkin commented 7 years ago

> The CBOR part is one part of one sentence plus a non-normative reference. If it was a normative reference your point would be totally valid. But a minimally short, non-normative reference is fine here.

Ok. If you say so then it's fine :)

Relequestual commented 7 years ago

I've removed this issue from the draft-future milestone because I don't feel anyone has clearly defined what a resolution to this issue would look like. I think further discussion is needed to clarify what the problem is, and how we would know it's resolved.

@handrews thanks for making it clear you see a use case for this. Even with that use case, I don't see that this issue should result in an addition to the spec document. If it should be in a separate document, I don't know what that would be called or what it would look like.

handrews commented 7 years ago

@Relequestual the only thing I can think of here would be that at some point perhaps we want to define application/schema+cbor as essentially a CBOR encoding of an application/schema+json document. Which is a bit different than what I was talking about before.

Since CBOR is a superset, there may be a few things we would need to nail down beyond just saying "encode your JSON Schema in CBOR". I don't know whether that belongs as part of JSON Schema core (which would then define both JSON Schema and CBOR Schema), or whether it should be a separate spec or even separate project altogether.

mkovatsc commented 7 years ago

@handrews I guess you mean application/schema+cbor and application/schema+json :)

mkovatsc commented 7 years ago

Another thing to consider when using CBOR is registering Tags: https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml . This will make it more compact.

A further step could be to register (e.g., with IANA) numeric identifiers for your string-based enums/constants.

awwright commented 7 years ago

To reiterate, the purpose of this topic is to contemplate using JSON Schema to describe a CBOR document (not the other way around).

I suppose some applications of JSON might be in constrained environments where a CBOR JSON Schema might be useful, but I'd like to see a specific use case first.

mkovatsc commented 7 years ago

One specific use case: OCF uses JSON Schema to define its interfaces but uses CBOR on the wire.

handrews commented 7 years ago

@mkovatsc thanks for the catch, I updated the comment

@awwright perhaps we should have a separate issue for encoding JSON Schema in CBOR? This seemed both important and reasonable at the WoT conference.

@mkovatsc could you file it as a separate issue and mention that use case? It's good to get things filed from people other than the usual suspects here :-)

mkovatsc commented 7 years ago

I created #259 for concise encodings of JSON Schema, e.g., CBOR. The use case I mentioned is for this issue here: JSON Schema used to describe CBOR documents.

awwright commented 4 years ago

Now that JSON Schema is a bit more mature with vocabularies, I think I can narrow our considerations down to two things:

(1) Some parts of CBOR are cosmetic. For example, like how JSON allows you to represent the same number multiple ways (4000 or 4e3), CBOR lets you encode numbers in different ways as well.
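A small sketch of the "cosmetic" point: the same integer can arrive in several CBOR widths, much as JSON text can spell one number several ways. The `decode_uint` helper is hypothetical and handles only unsigned integers (major type 0); it is meant to show that the differences vanish once the value is decoded.

```python
import json
import struct


def decode_uint(data: bytes) -> int:
    """Decode a CBOR unsigned integer (major type 0) of any width."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major != 0:
        raise ValueError("not an unsigned integer")
    if info < 24:
        # Small values are stored directly in the initial byte.
        return info
    widths = {24: ">B", 25: ">H", 26: ">I", 27: ">Q"}
    if info not in widths:
        raise ValueError("invalid additional information")
    fmt = widths[info]
    return struct.unpack(fmt, data[1:1 + struct.calcsize(fmt)])[0]


# JSON: two spellings of the same number compare equal after parsing.
assert json.loads("4000") == json.loads("4e3")

# CBOR: 4000 written as a 16-bit, 32-bit, and 64-bit unsigned integer.
encodings = [
    bytes.fromhex("190fa0"),
    bytes.fromhex("1a00000fa0"),
    bytes.fromhex("1b0000000000000fa0"),
]
assert {decode_uint(e) for e in encodings} == {4000}
```

A validator that works on decoded values never sees which width was used, so a schema cannot constrain it unless a dedicated keyword is added.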

JSON Schema could consider these purely cosmetic differences, and not enforce a difference between them. Most values in CBOR can be represented in JSON this way; some exceptions exist:

We would have to decide if implementations are allowed to accept a superset of JSON values for a "type" keyword, or if you must use a new keyword (since broadening the range of a type could cause problems—applications expect "number" to be real, and exclude NaN/inf).
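The trade-off can be sketched with two type predicates. CBOR floats are IEEE 754 values, so a decoder can hand a validator NaN or an infinity, which JSON text cannot express. Both function names below are hypothetical; the second stands in for whatever new keyword or widened type might be chosen.

```python
import math


def is_json_number(value) -> bool:
    """True if value fits the JSON "number" type: finite and not a boolean."""
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    return isinstance(value, float) and math.isfinite(value)


def is_extended_number(value) -> bool:
    """Hypothetical broader check that also admits NaN and infinities."""
    return not isinstance(value, bool) and isinstance(value, (int, float))


for candidate in (4000, 4e3, float("nan"), float("inf")):
    print(candidate, is_json_number(candidate), is_extended_number(candidate))
```

Keeping the strict predicate behind "type": "number" preserves what existing applications assume, while the broader one would have to be opted into explicitly.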

(2) For the cases where the distinction really does matter for some strange reason (for example, a value must use a specific tag), we can build a $vocabulary and/or meta-schema that describes the requirement.

Also, CDDL is now an RFC: https://tools.ietf.org/html/rfc8610

jtbandes commented 1 year ago

Are there any common best practices today for representing CBOR-specific types (such as binary data) in JSON Schema?

gregsdennis commented 1 year ago

@jtbandes at the moment, I think the best you have is using the content* keywords, which will produce annotations. You'd then need to read those annotations and deal with them in your application.

jtbandes commented 1 year ago

Yes, we've considered that. However, the spec states that the content* keywords are for data encoded as strings, which CBOR binary is not.

gregsdennis commented 1 year ago

I imagine if you have CBOR data, and you can get it into the JSON data model, any existing validator could handle it. (You don't need to translate the CBOR binary to JSON text, just get it into the data model in memory.) Outside of that, I don't see it being done without some special handling.

gregsdennis commented 1 year ago

(But CBOR is a bit out of my knowledge space. I'm just reading up on it.)

jtbandes commented 1 year ago

> I imagine if you have CBOR data, and you can get it into the JSON data model...

We don't need to get it into the JSON data model, really. The context is a web application that can parse and visualize data in many different encodings, of which JSON is one, but also Protobuf, FlatBuffers, CDR, and soon CBOR. Because JavaScript is such a dynamic language, a schema is not strictly required for all data, but the application has several features which are made available when schemas are known in advance of the data itself. (Currently we do not perform explicit validation using the schemas, but we assume the decoded data conforms to the given schema. In the case of JSON data with JSON Schema, we can also use the schema to know that we should treat a string with contentEncoding: base64 as a binary buffer.) We support schema representations appropriate to each encoding, and are trying to determine if there is a schema description format that's appropriate for CBOR. JSON Schema seems like a good candidate except that it doesn't have types to represent some of CBOR's features.
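The contentEncoding behaviour described above might look roughly like the following sketch. The `decode_binary` name is made up, and the walk follows only `properties` and a single-schema `items`; `$ref`, `allOf`, and other applicators are ignored for brevity.

```python
import base64


def decode_binary(schema: dict, instance):
    """Return instance with base64 strings replaced by bytes, guided by schema."""
    if isinstance(instance, str) and schema.get("contentEncoding") == "base64":
        return base64.b64decode(instance)
    if isinstance(instance, dict):
        props = schema.get("properties", {})
        # Properties without a subschema fall back to an empty schema.
        return {k: decode_binary(props.get(k, {}), v) for k, v in instance.items()}
    if isinstance(instance, list) and isinstance(schema.get("items"), dict):
        return [decode_binary(schema["items"], v) for v in instance]
    return instance


schema = {
    "type": "object",
    "properties": {
        "payload": {"type": "string", "contentEncoding": "base64"},
    },
}
decoded = decode_binary(schema, {"name": "frame-1", "payload": "aGVsbG8="})
```

For CBOR input the payload would already be a byte string, so there is nothing to decode. That is the gap: the schema has no way to say "this value is binary" without going through a string.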

gregsdennis commented 1 year ago

You can always write a vocabulary with new keywords to support what isn't already. For example, you could have a cborType keyword that gives you what you need. You'd need to be working with an implementation that supports vocabs (or custom keywords at a minimum, but preferably $vocabulary).
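As a rough illustration of what such a custom keyword could check, the sketch below evaluates a hypothetical cborType keyword against values that a CBOR decoder has already turned into Python objects. Only the keyword name comes from the comment above. The allowed values ("bytes", "datetime"), the function name, and the error format are all assumptions, and a real implementation would register the keyword with a validator rather than walk the schema by hand.

```python
from datetime import datetime

# Assumed mapping from "cborType" values to the Python types a CBOR decoder
# would produce. These value names are invented for this sketch.
CBOR_TYPES = {"bytes": bytes, "datetime": datetime}


def check_cbor_type(schema: dict, instance) -> list[str]:
    """Return error messages for the hypothetical "cborType" keyword."""
    errors = []
    expected = schema.get("cborType")
    if expected is not None:
        if expected not in CBOR_TYPES:
            errors.append(f"unknown cborType: {expected!r}")
        elif not isinstance(instance, CBOR_TYPES[expected]):
            errors.append(f"expected {expected}, got {type(instance).__name__}")
    if isinstance(instance, dict):
        # Recurse into declared properties that are present in the instance.
        for name, sub in schema.get("properties", {}).items():
            if name in instance:
                errors += [f"{name}: {e}" for e in check_cbor_type(sub, instance[name])]
    return errors


schema = {"properties": {"blob": {"cborType": "bytes"}}}
ok = check_cbor_type(schema, {"blob": b"\x00\x01"})
bad = check_cbor_type(schema, {"blob": "AAE="})
```

Here `ok` is an empty list and `bad` holds one message, because a text string is not a byte string even when it happens to be valid base64.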

awwright commented 1 year ago

I think this can be closed out by #1390 if we simply specify that a non-JSON format can be validated against a schema, if it defines an instance equality function that maps the format's non-JSON values to JSON values.

Note that once non-JSON formats get involved e.g. CBOR, sometimes instances that are not considered equal in CBOR will be considered equal in JSON instance equality, and this may be counter-intuitive (so think of it less like "equals" and more like "is distinguishable").
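A sketch of that idea, under stated assumptions: `to_json_value` is a hypothetical mapping from decoded CBOR values to JSON values, and the choice of unpadded base64url for byte strings and ISO 8601 text for dates is one possible mapping, not something this issue settles. Instance equality is then defined on the mapped values.

```python
import base64
from datetime import datetime, timezone


def to_json_value(value):
    """Map a decoded CBOR value onto the JSON data model (assumed mapping)."""
    if isinstance(value, bytes):
        # Assumption: byte strings become unpadded base64url text.
        return base64.urlsafe_b64encode(value).rstrip(b"=").decode("ascii")
    if isinstance(value, datetime):
        # Assumption: dates become ISO 8601 strings.
        return value.isoformat()
    if isinstance(value, list):
        return [to_json_value(v) for v in value]
    if isinstance(value, dict):
        return {str(k): to_json_value(v) for k, v in value.items()}
    return value


def instances_equal(a, b) -> bool:
    """Equality as a JSON Schema validator would see it after mapping."""
    return to_json_value(a) == to_json_value(b)


# A byte string and a text string are distinct CBOR values...
assert b"hi" != "aGk"
# ...but they are indistinguishable once mapped to JSON values.
assert instances_equal(b"hi", "aGk")
```

This is the counter-intuitive case from the note above: keywords such as `const`, `enum`, and `uniqueItems` would treat the two values as the same instance under this mapping.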