json-schema-org / json-schema-spec

The JSON Schema specification

http://json-schema.org/

"$id" as an indicator of embedded documents #719

Closed handrews closed 5 years ago

handrews commented 5 years ago

Idea from @jdesrosiers on slack (with minor tweaks from me, and probably a misinterpretation or two, but this is at least good enough to record the general concept):

Instead of discussing $id as primarily assigning URIs to schema objects, shift the focus to schema documents. For reasons that will be apparent later, also say that $id URI references MUST NOT contain a fragment.

The key idea is that schema documents can be embedded in other schema documents.

$id is used to indicate an embedded document, and the schema object containing that $id is considered to be the root schema of the embedded document. Whether it is standalone or embedded, a schema document's base URI is the value of $id in its root schema.

An embedded document's $id can be a relative URI reference, in which case it is resolved against the base URI of the containing schema document.

The contents of embedded documents cannot be referenced with a JSON Pointer fragment attached to the containing document's base URI:

{
    "$id": "https://example.com/outer",
    "additionalProperties": {
        "$id": "inner",
        "items": {...}
    }
}

In this example, the schema that is the value of "items" can be referenced as https://example.com/inner#/items. It cannot be referenced as https://example.com/outer#/additionalProperties/items, which is a change from the current behavior.

The reason for this may be more intuitive when considering this functionally identical schema:

{
    "$id": "https://example.com/outer",
    "additionalProperties": {"$ref": "inner"}
}

This is essentially the same schema, but with "inner" included by reference rather than directly embedded. In this example, it is clear that https://example.com/outer#/additionalProperties/items is meaningless. There is no such location. Embedding the schema does not make that URI meaningful; essentially, JSON Pointer fragment evaluation cannot cross into an embedded document.
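The boundary rule can be sketched as a JSON Pointer walk that refuses to step into a subschema carrying its own $id. This is only an illustration under assumptions: resolve_pointer and BoundaryError are hypothetical names, the content of "items" is invented for the example, and landing exactly on the embedded root is allowed here (only descending past it is rejected), which the proposal does not spell out.

```python
class BoundaryError(Exception):
    """Raised when a pointer tries to descend into an embedded document."""


def resolve_pointer(root, pointer):
    """Evaluate a JSON Pointer against root without crossing an embedded "$id"."""
    current = root
    tokens = pointer.split("/")[1:] if pointer else []
    for token in tokens:
        # A non-root object with "$id" is the root of a separate document.
        if current is not root and isinstance(current, dict) and "$id" in current:
            raise BoundaryError(f"cannot descend into {current['$id']!r}")
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


# Hypothetical stand-in for the example schema; "items" content is invented.
outer = {
    "$id": "https://example.com/outer",
    "additionalProperties": {"$id": "inner", "items": {"type": "string"}},
}
inner = outer["additionalProperties"]

# Relative to the embedded document's own root, the pointer works.
items = resolve_pointer(inner, "/items")

# Starting from the outer root, the same location is unreachable.
try:
    resolve_pointer(outer, "/additionalProperties/items")
    crossed = True
except BoundaryError:
    crossed = False
```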

In this approach, $id is always indicating the base URI of a document. As fragments are stripped from base URIs, it does not make sense to allow fragments in $id when used this way. In fact, RFC 3986 section 4.3 states:

Some protocol elements allow only the absolute form of a URI without a fragment identifier. For example, defining a base URI for later use by relative references calls for an absolute-URI syntax rule that does not allow a fragment.

Therefore, the only time fragments make sense in $id is in the plain name fragment definition form: {"$id": "#foo"}.

This form does not work when considering that $id indicates an embedded document. Because fragments are removed from a URI before it is used as a base, the base URI of such an embedded document would be identical to that of its containing schema document. This is obviously an incorrect usage of URIs, and should not be allowed.
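The effect of fragment stripping can be checked with Python's standard urllib.parse, which implements RFC 3986 reference resolution. This is a small sketch; the variable names are illustrative only.

```python
from urllib.parse import urldefrag, urljoin

outer = "https://example.com/outer"

# A relative "$id" resolves against the containing document's base URI.
inner = urljoin(outer, "inner")

# A fragment on the base plays no part in resolving a relative reference.
inner_from_fragment_base = urljoin(outer + "#foo", "inner")

# An "$id" of "#foo" resolves to the container's URI plus a fragment...
fragment_only = urljoin(outer, "#foo")

# ...so once the fragment is removed, the base is the container's own base.
base_of_fragment_only, fragment = urldefrag(fragment_only)
```

Here base_of_fragment_only equals outer, which is the collision described above: an "embedded document" declared with {"$id": "#foo"} would share its base URI with the document that contains it.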

While all of the behavior of $id as specified in draft-07 is simply a result of applying RFC 3986 rules to the hierarchical schema structure, most users seem to view the fragment definition form and the base URI change form as separate features. Since this form is not compatible with $id as an embedded document identifier, and many users view it as a different feature anyway, let's drop this form.

In its place, the "$anchor" keyword defines plain name fragments. Note that its value is simply the plain name, without the # fragment: {"$anchor": "foo"} is the equivalent of the former {"$id": "#foo"}.

This also allows for

{"$id": "https://example.com/foo", "$anchor": "bar"}

to replace

{"$id": "https://example.com/foo#bar"}

which, as far as I can tell, is currently a valid use of $id, although apparently I put a CREF in draft-07 wondering whether or how it should actually work. But if we split that function off into an $anchor keyword and outright forbid fragments in $id, this is no longer a weird corner case. Each keyword functions separately and unambiguously.

jdesrosiers commented 5 years ago

@handrews thanks for writing this up!

This is a bit different than what I had in mind, but the important bits are there. There's one problem however. $ids need to support fragments because $ref supports fragments and these are equivalent concepts. For example, the following are equivalent.

{
  "type": "object",
  "properties": {
    "aaa": { "$ref": "/common#/definitions/twentieth-century-year" }
  }
}

{
  "type": "object",
  "properties": {
    "aaa": {
      "$id": "/common#/definitions/twentieth-century-year",
      "definitions": {
        "year": { "type": "string", "pattern": "^\\d{4}$" },
        "twentieth-century-year": { "$ref": "#/definitions/year", "pattern": "^19" }
      }
    }
  }
}

In the second example, the document (/common) needs the fragment (/definitions/twentieth-century-year) to know that the "value" of this embedded document is only part of the whole document.

jdesrosiers commented 5 years ago

This also allows for

{"$id": "https://example.com/foo", "$anchor": "bar"}

to replace

{"$id": "https://example.com/foo#bar"}

Sort of. Having $id and $anchor next to each other is not wrong, but it's also not very useful. Here's an example.

{
  "type": "object",
  "properties": {
    "name": {
      "$id": "/person-name",
      "$anchor": "name",
      "type": "string"
    },
    "age": {
      "$anchor": "age",
      "type": "number"
    }
  }
}

#/properties/name == /person-name == /person-name#name
The $anchor doesn't really do anyone any good. It identifies the same thing, but requires more characters.

#/properties/name != #name
The $anchor is part of the embedded document and thus not visible to the parent document.

#/properties/age == #age
The "age" $anchor shows how $anchor is useful. $anchor does not create a document boundary; it's just a marker for a location in the document.
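One way to check these identities is to build a URI index in which every $id starts a new document and every $anchor is registered against the base of the document it sits in. The sketch below makes several assumptions: index_schema is a hypothetical helper, the root document is given the made-up URI https://example.com/root because the example has no $id at its root, $id values carry no fragment, and every nested object is walked as if it were a schema.

```python
from urllib.parse import urljoin


def index_schema(schema, base, registry=None):
    """Map absolute URIs to subschemas, treating "$id" as a document boundary."""
    if registry is None:
        registry = {}
    if isinstance(schema, dict):
        if "$id" in schema:
            # New embedded document: its URI becomes the base from here down.
            base = urljoin(base, schema["$id"])
            registry[base] = schema
        if "$anchor" in schema:
            # Anchors belong to the nearest enclosing document only.
            registry[base + "#" + schema["$anchor"]] = schema
        for value in schema.values():
            index_schema(value, base, registry)
    elif isinstance(schema, list):
        for item in schema:
            index_schema(item, base, registry)
    return registry


ROOT = "https://example.com/root"  # assumed retrieval URI for the example
schema = {
    "type": "object",
    "properties": {
        "name": {"$id": "/person-name", "$anchor": "name", "type": "string"},
        "age": {"$anchor": "age", "type": "number"},
    },
}
registry = index_schema(schema, ROOT)
```

With this index, /person-name and /person-name#name map to the same object, #age is registered under the root document, and no #name entry exists for the root document.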

handrews commented 5 years ago

@jdesrosiers thanks! I'll start with the $anchor bit as that's more straightforward- I need to think more on your other comment.

I agree with you that there's not much point to the use case. I brought it up for two reasons:

Primarily, since it's syntactically possible, it should have well-defined behavior unless there's a really good reason to forbid it (icky) or say that it is undefined (not necessary as far as I can tell).
I was actually intending to emphasize the document boundary, not imply that $anchor either creates or is visible across such boundaries.

The relevant behavior can be demonstrated with:

{
    "$id": "/person-name",
    "$anchor": "name",
    "type": "string",
    "additionalProperties": {
        "$ref": "#name"
    }
}

which is identified by all of: /person-name, /person-name#, /person-name#name

The use of #name under additionalProperties works. All of this functions identically in the embedded case. And likewise, $refs outside of the embedded document cannot use #name (unless there is also an "$anchor": "name" in that containing document, in which case that is the one it references).

A $ref in the containing document could use /person-name#name, which again would behave the same whether the /person-name schema is embedded or not.

As for why someone might do that... I can't construct a case where I would, but it doesn't hurt anything. More importantly, it's almost certainly harder to carve out an exception for $anchor alongside of $id than to just let it have the obvious behavior even if said behavior is not very useful.

Lots of things in JSON Schema are not useful (e.g. {"minLength": 10, "maxLength": 1}), and we allow those things because it's much easier to allow them than to detect and prevent them.

So I don't see any good reason for preventing it. It's straightforward behavior.

jdesrosiers commented 5 years ago

That $ref could also be "#" :smile:

I agree that there is no reason to make this illegal. It causes no conflicts or contradictions. It's just not useful. It's something for a linter to call your attention to. Nothing more.

handrews commented 5 years ago

@jdesrosiers

OK taking a closer look at this example:

{
  "type": "object",
  "properties": {
    "aaa": {
      "$id": "/common#/definitions/twentieth-century-year",
      "definitions": {
        "year": { "type": "string", "pattern": "^\\d{4}$" },
        "twentieth-century-year": { "$ref": "#/definitions/year", "pattern": "^19" }
      }
    }
  }
}

It looks like rather than the $id fragment being part of the identification of the document containing the $id, you are indicating that the embedded document's URI is the base URI from $id, and you are applying the fragment to that base URI to result in the effective value of the document within the embedding context being { "$ref": "#/definitions/year", "pattern": "^19" }.

Is that correct?

If so, while I understand your desire for $ref/$id functional equivalence, and see how the application of the fragment works the same way for both, that is very surprising behavior for me from the perspective of $id being an identifier. In this approach, I guess only the base URI part of it is an identifier, with the fragment being a selector.

I'll stop here to wait for a reply in case I am way off base.

jdesrosiers commented 5 years ago

Yes, that's correct. And, yes, it's not going to be immediately clear to a lay person what's going on here. But, all you have to understand is the concepts of document and document-value and you understand how everything works. From the schema to $ref to $id, they are all the same concept with no exceptions.

However, I'm not too worried about how human readable this is because embedding documents should be the domain of tooling. Schema authors wouldn't write something like that by hand. It's easy for programs to understand and that's the important thing for this feature.

Furthermore, in the use-cases where schema authors would want to use $id manually, fragments are not needed. Allowing fragments is backwards compatible with how $id is used today. All it does is simplify the conceptual model and enable more powerful tooling.

handrews commented 5 years ago

@jdesrosiers OK so I see where you're coming from with this. However, I don't see this approach as directly viable with JSON Schema. But we may be able to end up with compatible behavior.

Regarding:

Allowing fragments is backwards compatible with how $id is used today. All it does is simplify the conceptual model and enable more powerful tooling.

Well... it's syntactically backwards compatible. And it's technically semantically backwards compatible because it relies on a syntax with explicitly undefined semantics in recent drafts. But, technically, terminating the process, printing a zen koan to stdout, or playing Beethoven's 9th over the speakers are all equally technically compatible 😛

I do believe that it violates the spirit of the semantics of $id, which is clearly "this URI identifies this schema object". Using the fragment as a selector directly contradicts that, even if you can get the base URI stuff to work out for resolution purposes within the embedded document.

And it absolutely violates the way most implementations would treat this, which is to ignore the fragment, or maybe use it to calculate JSON Pointer fragments for subschemas, which is one possible interpretation although not directly supported by either RFC 3986 or RFC 6901.

I'm not aware of anyone that interprets a JSON Pointer fragment in $id as a selector (although I would be interested if such an implementation other than yours exists).

Putting this together with the fact that JSON Schema already deviates from a generic $ref/$id system by restricting $ref (as a keyword) to schema objects, I am pretty comfortable putting restrictions on $id compared to that system as well.

So, if (in JSON Schema), we completely forbid fragments in $id, but do take on the document boundary behavior and the embedded document model, I think we are compatible in this sense:

any embedded document can be replaced with a schema object consisting solely of $ref with the same value as the embedded $id

Note that this only goes in one direction. It is not the case that every $ref can be replaced with an embedded document, as only complete documents (identified by URIs without fragments) can be embedded. And for that matter, if other keywords exist adjacent to $ref, then a bit of extra schema manipulation is needed to get it to work. Although for practical purposes, that does work.
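The one-directional replacement can be sketched as a transformation that lifts each embedded document out and leaves a $ref in its place. This is an illustration, not a proposed algorithm: extract_embedded is a hypothetical name, the $ref it writes is the absolute URI rather than the relative form, $id values are assumed to have no fragment, and the "items" content is invented.

```python
from urllib.parse import urljoin


def extract_embedded(schema, base=""):
    """Return (rewritten, extracted): embedded documents become {"$ref": uri}."""
    extracted = {}

    def walk(node, current_base, is_root):
        if isinstance(node, dict):
            if "$id" in node:
                uri = urljoin(current_base, node["$id"])
                if not is_root:
                    # Embedded document: rewrite its contents against its own
                    # base, store it as a standalone document, leave a $ref.
                    doc = {k: walk(v, uri, False) for k, v in node.items()}
                    doc["$id"] = uri
                    extracted[uri] = doc
                    return {"$ref": uri}
                current_base = uri
            return {k: walk(v, current_base, False) for k, v in node.items()}
        if isinstance(node, list):
            return [walk(v, current_base, False) for v in node]
        return node

    return walk(schema, base, True), extracted


outer = {
    "$id": "https://example.com/outer",
    "additionalProperties": {"$id": "inner", "items": {"type": "string"}},
}
rewritten, extracted = extract_embedded(outer)
```

The reverse direction is not always available under this restriction, because a $ref whose target URI has a fragment does not name a complete document that could be embedded.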

AFAICT, this means that the behavior that would be allowed in JSON Schema is essentially identical to what your system supports. But there would be substantial behavior in your system that is not allowed in JSON Schema.

Does this make sense? Does it seem like a reasonable way forward?

handrews commented 5 years ago

To summarize what I see as the benefits:

For any given schema object, there is at most one valid declared (with $id) base URI. So there is only one URI using a declared base plus a JSON Pointer fragment.
$id would conform with the guidance in https://tools.ietf.org/html/rfc3986#section-4.3 that: For example, defining a base URI for later use by relative references calls for an absolute-URI syntax rule that does not allow a fragment.
Explaining $id in terms of embedded documents is finally straightforward:
- Each schema document, embedded or otherwise, has a URI
- $id defines that URI
- That URI is the base URI for the document, period
Having the syntax "$anchor": "foo" to create fragment #foo is analogous to HTML fragment declaration with <div id="foo">, in that the identifier declaration is not itself a URI reference and therefore does not include the # character.

@epoberezkin you put a frowny face on this. Do you have any clear feedback on the idea? Otherwise, I am disregarding vague unspecified disapproval.

@Julian @gregsdennis @johandorland @erosb @KayEss input from implementors would be most welcome on this, given that $id has always been a pain point.

Based on my past experience implementing $id/$ref support, this would simplify things substantially, but I have no idea if that applies to existing production implementations.

KayEss commented 5 years ago

It sounds positive to me.

1. Right now I have to special case the use of a fragment in $id, so I think with this proposal that would be an error.
2. I hope we can say that a $anchor may not be a JSON pointer, so it must not start with a /.

Explaining $id in terms of embedded documents is finally straightforward

This bit I'm not so sure I understand. The proposal isn't that $id acts like base in HTML where each $id appearance resets the base for the whole document? I assume it continues to work as it does now, where the $id is for that part of the JSON schema? I assume we still allow an $id to be (path/protocol etc.) relative to the current outer one?

jdesrosiers commented 5 years ago

I do believe that it violates the spirit of the semantics of $id, which is clearly "this URI identifies this schema object". Using the fragment as a selector directly contradicts that.

There's no contradiction. There are two concepts: document and value. The fragment part of a URI does not identify a document. http://example.com/foo identifies the same document as http://example.com/foo#a identifies the same document as http://example.com/foo#b. It's just, the values of these documents might be different.

or maybe use [the fragment] to calculate JSON Pointer fragments for subschemas, which is one possible interpretation although not directly supported by either RFC 3986 or RFC 6901.

How a fragment is interpreted is defined by the media type and we define the media type, so no problem there. HTML does the same, so we'd be in good company. This isn't even new behavior for JSON Schema. This behavior is already defined for $ref, all we have to do is not exempt $id from this behavior.

I'm not aware of anyone that interprets a JSON Pointer fragment in $id as a selector

Huh? I wouldn't expect anyone to have implemented this change before it was proposed.

any embedded document can be replaced with a schema object consisting solely of $ref with the same value as the embedded $id. Note that this only goes in one direction. It is not the case that every $ref can be replaced with an embedded document

That's too bad. Embedding a document is far more useful than extracting an embedded document.

AFAICT, this means that the behavior that would be allowed in JSON Schema is essentially identical to what your system supports. But there would be substantial behavior in your system that is not allowed in JSON Schema.

I think that's accurate. However, I see no benefit of the no fragment constraint. It makes things more complicated and less powerful.

(This section is a bit of a tangent. If we want to discuss this further, I suggest we take it to slack)

JSON Schema already deviates from a generic $ref/$id system by restricting $ref (as a keyword) to schema objects

$ref was restricted to schemas mainly because no implementations supported anything else. The conceptual model I propose makes it easy to support $ref anywhere. In fact, it's harder to restrict it. It might be worth revisiting that decision at some point. It wouldn't be very useful in JSON Schema validation, but some third-party vocabularies might find it useful. Also, JSON Reference is useful as a media type in its own right and I think it's a shame that it's now coupled to JSON Schema.

jdesrosiers commented 5 years ago

@KayEss Thanks for your feedback.

2. I hope we can say that a $anchor may not be a JSON pointer, so it must not start with a /.

Good point! I hadn't thought of that.

The proposal isn't that $id acts like base in HTML where each $id appearance resets the base for the whole document?

No, $ref is like an iframe in HTML and $id is like an iframe whose document has been prefetched and embedded in the parent document. I think of it like a variation of the HTTP/2 push feature.

I assume it continues to work as it does now, where the $id is for that part of the JSON schema?

Very little of how it works would change. It's more a change in the way we think about it. Any part of the JSON Schema with an $id should be considered to be a $ref that has been inlined and a completely separate document that is parsed as its own independent schema. So, it doesn't make sense to think of $id referring to part of a schema. $id is always only at the root of the document. It's just that that document might be embedded inside another document. I hope that made at least a little sense.

I assume we still allow an $id to be (path/protocol etc.) relative to the current outer one?

Yes. I see no reason to change this.

jdesrosiers commented 5 years ago

I realized that we keep talking about my proposal vs the slightly modified proposal @handrews has written up here. So, for those who missed it on Slack, here is how I introduced it.

I've been working on a generic browser concept based on JSON Reference. It's still very early stages, but I think the model I came up with is a candidate for a solution to the issues JSON Schema has with $id. I've been meaning to share this for a while, but have been hesitating due to uncertainty about how to present it. I've finally decided that getting something out there is better than nothing, so I'm presenting this brief overview and I'll let any questions drive any discussion.

I've found this model easy and efficient to implement. It has strong parallels to existing web constructs. It simplifies the concepts without losing anything of value.

One of the goals of this model is to fully decouple JSON Pointer, JSON Reference, and JSON Schema. Each can be implemented independently of one another. I wrote a JSON Schema-ish validation proof of concept that builds on JSON Reference (rather than JSON). This implementation has full support for $ref/$id without dedicating a single line of code to supporting it.

JSON Reference for JSON Schema Implementors

The features of JSON Reference are very similar to the features of $ref and $id in JSON Schema. However, the concepts are slightly different and the keywords are slightly more constrained (in a good way) than their JSON Schema counterparts.

Documents vs Values

All JSON Reference documents have a "value". The fragment part of the document's URI identifies the portion of the document that is considered the "value" of the document.

If the fragment is empty, the value of the document is the whole document.

If the fragment starts with a / the fragment is interpreted as a JSON Pointer and the value of the document is the document resolved against the JSON Pointer.

If the fragment is not a JSON Pointer, then it's an anchor fragment. The $anchor keyword provides a label that marks a portion of the document. Given an anchor fragment, the value of the document is the portion of the document identified by an $anchor keyword that matches the anchor fragment.

The value of a document whose URI fragment does not point to a valid part of the document is undefined. Implementations must not cross document boundaries in an attempt to resolve a fragment.
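The three fragment cases above can be sketched as a single dispatch function. This is a reading of the description rather than the actual implementation being discussed: document_value, find_anchor, and UndefinedValue are hypothetical names, "undefined" is modelled as an exception, and a nested object with its own $id is taken to be the document boundary.

```python
class UndefinedValue(Exception):
    """The fragment does not identify a part of this document."""


def find_anchor(node, name, is_root=True):
    """Find the object with {"$anchor": name}, skipping embedded documents."""
    if isinstance(node, dict):
        if not is_root and "$id" in node:
            return None  # document boundary: do not look inside
        if node.get("$anchor") == name:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_anchor(child, name, is_root=False)
        if found is not None:
            return found
    return None


def document_value(document, fragment):
    if fragment == "":
        return document  # empty fragment: the whole document
    if fragment.startswith("/"):
        current = document  # JSON Pointer fragment
        for token in fragment.split("/")[1:]:
            if current is not document and isinstance(current, dict) and "$id" in current:
                raise UndefinedValue(fragment)  # would cross a boundary
            token = token.replace("~1", "/").replace("~0", "~")
            try:
                current = current[int(token)] if isinstance(current, list) else current[token]
            except (KeyError, IndexError, ValueError, TypeError):
                raise UndefinedValue(fragment)
        return current
    found = find_anchor(document, fragment)  # anchor fragment
    if found is None:
        raise UndefinedValue(fragment)
    return found


doc = {
    "properties": {
        "name": {"$id": "/person-name", "$anchor": "name", "type": "string"},
        "age": {"$anchor": "age", "type": "number"},
    }
}
```

Applied to doc, the fragment "age" yields the same object as "/properties/age", while "name" is undefined because that anchor lives inside the embedded /person-name document.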

`$ref` indicates an embedded document

$ref indicates a document to be embedded. It's analogous to an <iframe/> in an HTML document. Even $refs that point to the current document are embedded documents. Notice that the entire document is embedded, not just the value of the document. However, a user agent that encounters an embedded document should use the value of the document. It's necessary to embed the entire document in order to properly handle any $ref within the embedded document.

`$id` is an embedded `$ref`

An $id indicates an inlined $ref. This is similar to using the HTTP/2 push feature to send the document identified by the src attribute of an <iframe>. It's just a network optimization for a $ref. This means that unlike JSON Schema, an $id can have a fragment and that fragment is meaningful.

`$anchor` is not an embedded document

$anchor provides a path-independent way to identify a document's value without creating a document boundary.

handrews commented 5 years ago

@KayEss my intention was that $anchor only defines plain name fragments. "plain name fragment" is actually a formal term, used in several media types. It comes from the W3C's fragment identifier best practices document.

The exact syntax is specified in section 8.2.3 of the core spec. The meta-schema for $anchor would be something like:

{
    "type": "string",
    "pattern": "^[A-Za-z][A-Za-z0-9_.:-]*$"
}

Which is not I18N friendly and we should perhaps address that, but that's what we would get from the currently defined syntax.
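A quick check of what such a pattern accepts, assuming the plain name syntax is a letter followed by letters, digits, hyphens, underscores, colons, or periods. The helper name is_valid_anchor is hypothetical, and fullmatch is used so that a trailing newline is not accepted by accident.

```python
import re

# Assumed plain-name syntax: one ASCII letter, then letters, digits, "_.:-".
PLAIN_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_.:-]*")


def is_valid_anchor(value):
    """Return True if value could be used as an "$anchor" plain name."""
    return isinstance(value, str) and PLAIN_NAME.fullmatch(value) is not None


results = {
    candidate: is_valid_anchor(candidate)
    for candidate in ["foo", "a.b:c-d_e", "#foo", "/definitions/year", "1abc", ""]
}
```

Under this syntax a value starting with "/" is rejected, so an $anchor can never be mistaken for a JSON Pointer fragment, and the "#" is rejected because the anchor value is a name rather than a URI reference.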

handrews commented 5 years ago

@jdesrosiers

I'm not aware of anyone that interprets a JSON Pointer fragment in $id as a selector

Huh? I wouldn't expect anyone to have implemented this change before it was proposed.

Right. You were, I thought, claiming compatibility with the current $id in terms of the fragment syntax. It is definitely not compatible.

This behavior is already defined for $ref, all we have to do is not exempt $id from this behavior.

This does not hold up. $ref is an element to induce navigation, like <a>. $id is an element to identify things. It does not involve navigation, and never has.

One of the goals of this model is to fully decouple JSON Pointer, JSON Reference, and JSON Schema.

That is not my goal, and I do not think it is a good goal for JSON Schema. I think it's a great separate project, hence my effort to retain compatibility with it. But it makes schemas much more difficult to reason about.

My goal is to simplify $id in the context of JSON Schema.

However, I see no benefit of the no fragment constraint. It makes things more complicated and less powerful.

I strongly feel that it is the exact opposite. @KayEss noting that fragments are a special case in their implementation would seem to support that.

I don't want this to be another big back and forth between these two proposals. This project has seen enough of that. But I also want people to be able to find this more easily than by digging through slack history.

@jdesrosiers I suggest that we close this, and I'll re-file mine, and you can re-file yours. Each can be discussed on its own merits, and if either garners sufficient support, we'll add it to the spec (or create a separate spec, I suppose).

epoberezkin commented 5 years ago

Firstly, it would be good to screen this idea against what problem we are trying to solve here? I couldn’t understand it. Once we understand the problem, we can see whether it can be solved with the existing vocabulary or any extension is needed.

Secondly, I am not sure why $anchor is needed if you already can use $id as an anchor, according to the current spec.

handrews commented 5 years ago

@epoberezkin the problem is that people constantly complain about how complicated $id is, and don't understand it or how it works.

If you don't agree with that problem, that's fine, I'm not interested in convincing you.

epoberezkin commented 5 years ago

Right. I thought this proposal is about actually making $id more complex... Maybe the fact that there are two proposals is confusing.

epoberezkin commented 5 years ago

If you don't agree with that problem, that's fine, I'm not interested in convincing you.

@handrews I wrote that I don’t understand the problem. I may agree or disagree with the solution, but if anybody has a problem it is a fact. Whether the problem should be solved is another matter entirely.

There is a beautiful post by @kellan on screening new tech - it definitely applies to how all solution ideas should be screened to avoid feature bloat:

https://kellanem.com/notes/new-tech

handrews commented 5 years ago

Right. I thought this proposal is about actually making $id more complex... Maybe the fact that there are two proposals is confusing.

Yeah, that's why I figure we should close this and re-file separately.

To give a less snarky answer to your prior question, I'm pretty sure @jdesrosiers and I are attempting to solve different problems. So the solutions don't compare well anyway.

If you (and others) don't think the problem I see is a real problem, then obviously the solution is not compelling and we won't do it. I'm not really interested in selling anyone on the problem- you either see it or you don't, and that part is more interesting to me than convincing people how to look at it.

epoberezkin commented 5 years ago

@handrews honestly, all I am saying is that it would be good to understand the problem. All problems are real, it’s not for me to judge. If and how they should be solved is another question. I cannot reason about the solution if I don’t understand the problem. You may have discussed it on slack but I do not see it in this ticket.

handrews commented 5 years ago

@epoberezkin I am (perhaps surprisingly) not trying to be difficult here. To me, the problem is blatantly obvious. If it is not blatantly obvious to anyone else, that is interesting. If pretty much everyone who comes across this is like "why bother?" then that's all I need to know, really.

handrews commented 5 years ago

I am also just burnt out. I saw a solution to $id difficulties, when I've been spending an inordinate amount of time responding to people who think $id is a total confusing mess. I thought "hey, this is totally simple and makes everything easier." If I'm going to have to convince everyone of the problem and the solution... f*** it. I'm out of gas. I'll wrap up what I have and shove it out the door as I had been intending.

It's too demoralizing to fight this.

johandorland commented 5 years ago

I've had my fair share of struggles with $id in the past. Now that I have a good understanding of how it currently works I don't mind it as much.

I don't fully grasp all changes conceptually just by reading this issue in a few minutes, but as I currently understand them in my own words the proposal is to:

Remove shadowing of base URIs. A document will have one base URI and that's it.
Split location independent identifiers into a new $anchor keyword.

I like the removal of shadowing. I don't think many people use it in practice anyway, but having it makes parsing harder. I'm not particularly fond of adding $anchor as I don't see the added benefit. I'd rather have a slightly more complicated $id than having an extra keyword that still interacts with $id.

My current problem with $id is that it is hard to implement as it basically requires a two pass parser or the complicated lazy loading of $refs. Implementations would really be helped if the scope of where $refs can point to would be restricted. Currently you can't just look up the place $ref points to because the accompanying $id can be anywhere. If I had to put forward a suggestion to simplify $id I would:

Limit $id to schemas in definitions(/$defs) blocks (and also the root schema and definition blocks inside schemas that are in a definitions block, etc). In that way an implementation could just parse definitions as the first keyword and know for certain that any $ref that is encountered will be to an $id that it has already parsed and therefore greatly simplifying things. In practice schemas will almost always be structured in such a way, so it's not as much of a breaking change. Currently it's quite hard to explain to implementors why definitions exists. It is not required to be in the spec as a schema author could just as well use their own custom keyword to structure schemas. However because the spec encourages the use of definitions implementors forget that $ref is much more powerful than it appears to be because all the schemas in the wild are nicely structured. It is only when they see the edge cases in the test suite they get a heart attack and are wondering how they are ever going to write a parser that can handle those cases.
No more shadowing base URIs, just like @handrews's proposal.
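The placement restriction suggested above could be checked by a small linter pass. The sketch below is one possible reading of the rule, not an agreed design: find_misplaced_ids is a hypothetical name, $id is accepted at the root and on direct members of a definitions or $defs block whose parent schema is itself an accepted position, and the reported JSON Pointers are not escaped.

```python
def find_misplaced_ids(schema):
    """Return JSON Pointers of "$id" keywords outside root/definitions positions."""
    problems = []

    def walk(node, pointer, id_allowed):
        if isinstance(node, dict):
            if "$id" in node and not id_allowed:
                problems.append(pointer)
            for key, value in node.items():
                child = f"{pointer}/{key}"
                if key in ("definitions", "$defs") and id_allowed and isinstance(value, dict):
                    # Direct members of a definitions block may declare "$id".
                    for name, sub in value.items():
                        walk(sub, f"{child}/{name}", True)
                else:
                    walk(value, child, False)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{pointer}/{index}", False)

    walk(schema, "", True)
    return problems


example = {
    "$id": "https://example.com/root",
    "$defs": {"a": {"$id": "a", "$defs": {"b": {"$id": "b"}}}},
    "properties": {"p": {"$id": "p", "$defs": {"c": {"$id": "c"}}}},
}
misplaced = find_misplaced_ids(example)
```

For this example the nested definitions under $defs pass, while the $id under properties and the definition nested beneath it are reported.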

Lastly I don't think any of these proposals will simplify $id to a degree that we will no longer have people on slack asking questions about how it works. Basically every time someone complains about $id they all have their own ideas about how it should work (including me 😆 ). No one proposal will change that. Nonetheless I think it's a good discussion to have in the future. We should just shove draft-08 out of the door as @handrews mentioned and look at if we want to change $id for draft-09.

handrews commented 5 years ago

@johandorland I've heard the "just definitions/$defs" proposal before (I don't recall if that was from you or not), but I've always been skeptical of that for a couple of reasons. One problem that it shares with my proposal here (as @jdesrosiers noted) is that not being able to convert $ref to an embedded document is rather problematic. (Honestly, that might be the best argument so far against my proposal).

I think I'm going to close this and re-open one just to forbid base URI shadowing. I don't even know why we thought that was necessary, TBH. I suspect we had reason to believe someone might be using it, and just wanted to clarify it in examples. We did not add that feature, we were just trying to clarify what we thought was already there based on prior unclear wording at least back to draft-04.

@jdesrosiers if you would like to re-file your whole proposal (basically dump https://github.com/json-schema-org/json-schema-spec/issues/719#issuecomment-468150221 into a new issue), I would highly encourage that.

I may or may not re-file the no-fragment+$anchor option I was proposing, but I need to think more on the implications of not being able to replace a $ref (either directly or after pushing it into an allOf branch without any other adjacent properties) with an embedded document. While I disagree with your rationale for how $id with a fragment would work with your model of embedded documents, there is a definite problem there.

jdesrosiers commented 5 years ago

@handrews

I thought, claiming compatibility with the current $id in terms of the fragment syntax.

Ahh, I see the confusion now. I was claiming backwards compatibility. You can do everything you used to be able to do and more.

@epoberezkin

Once we understand the problem, we can see whether it can be solved with the existing vocabulary or any extension is needed.

The biggest problem is that $id is complicated and inefficient (requires a second pass) to implement. A secondary problem is that $ref can't always be inlined with $id. My proposal solves both of these problems. Implementation is only a few lines of code (I have a POC that proves it) and $ref <=> $id. I wouldn't have brought it up if I hadn't validated it first.

I've been using the words "my proposal", but I never intended to propose anything. My intention was to share what I was doing and let you all decide which bits (if any) you want to incorporate into JSON Schema. All of my comments in this issue have been in the spirit of clarifying what my implementation does.

I'll create a new issue describing the model my implementation uses and I'll refrain from calling it a "proposal" :wink:. @handrews almost entirely understands what I'm doing. There are a few things we disagree on, but there is also still something I'm not communicating well enough. To everyone who thinks this model is more complicated than what we have now: you're missing something. I encourage you to follow the issue I will be creating shortly. I'm going to take another stab at explaining it better.

epoberezkin commented 5 years ago

@jdesrosiers thank you

The biggest problem is that $id is complicated and inefficient (requires a second pass) to implement.

That is true, there are various solutions to that.

A secondary problem is that $ref can't always be inlined with $id.

You will not solve this by changing $id: $ref cannot be inlined when schemas are recursive.
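To make the recursion point concrete, here is a minimal sketch of a naive inliner. It is not taken from any implementation mentioned in this thread; `inline_refs`, `flat`, and `tree` are hypothetical names, and only same-document JSON Pointer references (`#` or `#/...`) are handled.

```python
def inline_refs(schema, root, active=()):
    """Replace same-document "#/..." $refs with the schema they point to.

    Raises ValueError when a $ref leads back to a reference that is still
    being expanded, because inlining it would never terminate.
    """
    if isinstance(schema, list):
        return [inline_refs(item, root, active) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    if isinstance(ref, str) and (ref == "#" or ref.startswith("#/")):
        if ref in active:
            raise ValueError(f"recursive reference: {ref}")
        # Walk the JSON Pointer, unescaping ~1 and ~0 as RFC 6901 requires.
        target = root
        for token in ref[1:].split("/")[1:]:
            token = token.replace("~1", "/").replace("~0", "~")
            target = target[int(token)] if isinstance(target, list) else target[token]
        return inline_refs(target, root, active + (ref,))
    return {key: inline_refs(value, root, active) for key, value in schema.items()}


# A non-recursive schema inlines without trouble.
flat = {
    "definitions": {"name": {"type": "string"}},
    "properties": {"first": {"$ref": "#/definitions/name"}},
}
inlined = inline_refs(flat, flat)

# A tree whose nodes contain more nodes cannot be fully inlined.
tree = {
    "definitions": {
        "node": {
            "type": "object",
            "properties": {
                "children": {"type": "array", "items": {"$ref": "#/definitions/node"}}
            },
        }
    },
    "$ref": "#/definitions/node",
}
try:
    inline_refs(tree, tree)
    recursion_detected = False
except ValueError:
    recursion_detected = True
```

The `active` tuple tracks the references currently being expanded, so the sketch stops with an error instead of looping forever on the `tree` schema.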

handrews commented 5 years ago

Oh, and apparently some tool (not the one in the repo I’m linking, but something else mentioned there) generates things like "$id": "#/properties/foo" for everything. Which at least is consistent with the actual position, but FFS.

https://github.com/qri-io/jsonschema/issues/38

why do people want fragments in $id again?

(ノ°Д°）ノ︵ ┻━┻

epoberezkin commented 5 years ago

Err... Because they can? The most common use case I’ve seen in many schemas is to insert "$id": "#name" in a definition and then use "$ref": "#name" instead of "#/definitions/name".
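A small sketch of that pattern, written as a Python dict so it can be run. The schema contents and the helper `find_by_plain_name` are hypothetical and only illustrate that both reference styles are meant to land on the same subschema.

```python
schema = {
    "definitions": {
        # The plain-name fragment gives this definition a short alias.
        "name": {"$id": "#name", "type": "string"},
    },
    "properties": {
        "first": {"$ref": "#name"},               # short form
        "last": {"$ref": "#/definitions/name"},   # JSON Pointer form
    },
}


def find_by_plain_name(node, fragment):
    """Depth-first search for the subschema whose "$id" equals fragment."""
    if isinstance(node, dict):
        if node.get("$id") == fragment:
            return node
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_by_plain_name(child, fragment)
        if found is not None:
            return found
    return None


via_name = find_by_plain_name(schema, schema["properties"]["first"]["$ref"])
via_pointer = schema["definitions"]["name"]
same_target = via_name is via_pointer
```

Note that the short form needs a search (or a prebuilt index) over the whole document, while the pointer form can be followed directly.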

The case you’ve shown is probably to simplify visual navigation in large schemas so you can see where you are, which is problematic otherwise. But the tools could simply put it in "$comment" or any other custom keyword - it doesn’t change any addressing, so doesn’t have to be $id.

awwright commented 5 years ago

Without reading [m]any of the comments, I think this is quite sensible at first glance.

Ideally, I would have a property like "$self" that names the schema (and sets the base), and then "$id" would be a plain name referenced from the fragment. However, "$anchor" is a suitable alternative (or "$name"). This would much more closely match people's experience with HTML.

Having $id and $anchor next to each other is not wrong, but it's also not very useful.

It's like having <html id="document">...</html>. There aren't many reasons to do that, but it's difficult to become confused about what's going on.

My current problem with $id is that it is hard to implement as it basically requires a two pass parser or the complicated lazy loading of $refs.

Any case where you get to attach a name to things, you have to make an index of all the named things. (Until quantum computing becomes a thing, at least.)
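As a rough illustration of that index, the sketch below walks a schema once and records every URI declared by "$id", resolving relative values against the nearest enclosing base. `index_ids` is a hypothetical helper, not part of any library, and the sample schema reuses the outer/inner URIs from the top of this issue. It treats every object it meets as a schema, which a real implementation would need to be more careful about.

```python
from urllib.parse import urldefrag, urljoin


def index_ids(node, base, index=None):
    """First pass: map each absolute URI declared by "$id" to its schema."""
    if index is None:
        index = {}
    if isinstance(node, dict):
        declared = node.get("$id")
        if isinstance(declared, str):
            uri = urljoin(base, declared)
            index[uri] = node
            # Only the part before "#" becomes the base for nested schemas,
            # so a fragment-only $id names a schema without moving the base.
            base, _fragment = urldefrag(uri)
        for value in node.values():
            index_ids(value, base, index)
    elif isinstance(node, list):
        for item in node:
            index_ids(item, base, index)
    return index


outer = {
    "$id": "https://example.com/outer",
    "definitions": {"name": {"$id": "#name", "type": "string"}},
    "additionalProperties": {"$id": "inner", "items": {"type": "integer"}},
}
index = index_ids(outer, "")
known_uris = sorted(index)
```

With an index like this built up front, resolving a "$ref" becomes a dictionary lookup, which is the "second pass" people are referring to.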

why do people want fragments in $id again?

iirc it's something I more-or-less invented after surveying draft-4 implementations. (There's a little more to it than that, but I'd have to dig up notes to be sure.)

