json-schema-org / JSON-Schema-Test-Suite

A language agnostic test suite for the JSON Schema specifications
MIT License
615 stars 205 forks source link

Test for jumping over a resource #719

Closed gregsdennis closed 5 months ago

gregsdennis commented 7 months ago

Inspired by an example in @jviotti's blog post on dynamic scope, I wondered how different implementations would handle a reference that targeted a resource embedded in another resource and dynamic references were involved.

Example:

{
  "$id": "https://example.com/foo",
  "type": "object",
  "properties": {
    "bar-item": { "$ref": "item" }
  },
  "$defs": {
    "bar": {
      "$id": "https://example.com/bar",
      "type": "array",
      "items": { "$ref": "item" },
      "$defs": {
        "item": {
          "$id": "item",
          "type": "object",
          "properties": {
            "content": { "$dynamicRef": "#content" }
          },
          "$defs": {
            "defaultContent": {
              "$dynamicAnchor": "content",
              "type": "integer"
            }
          }
        },
        "content": {
          "$dynamicAnchor": "content",
          "type": "string"
        }
      }
    }
  }
}

I see two possibilities here:

Visually analyzing it, I think the latter is the correct approach, meaning that

{ "bar-item": { "content": 42 } }

is valid, and

{ "bar-item": { "content": "value" } }

is invalid.

This is also how my implementation behaves. I'd like to see how others handle it.

Julian commented 7 months ago

What does the spec say is the correct behavior?

gregsdennis commented 7 months ago

I don't think the spec explicitly says, which is why I raised the issue. I think it's something we need to discuss.

gregsdennis commented 7 months ago

My best guess it's that the spec would say the dynamic scope is the set of resources that evaluation passed through, which is

notably skipping over bar.

notEthan commented 7 months ago

For the instance's /bar-item/content my implementation resolves its schema to /$defs/bar/$defs/item/$defs/defaultContent. "$ref": "item" loads /$defs/bar/$defs/item from the schema registry, /$defs/bar isn't a part of that evaluation and dynamic scope doesn't include it.

However, in a version with a pointer fragment, closer to the question that prompted this, because I do maintain dynamic scope from a $ref target's resource root through pointer evaluation, the outcome changes when pointer evaluation changes the resource root.

{
  "$id": "https://example.com/foo",
  "type": "object",
  "properties": {
    "bar-item": { "$ref": "#/$defs/bar/$defs/item" } // ⇐ change URI to pointer fragment
  },
  "$defs": { // the rest is the same as above
    "bar": {
      "$id": "https://example.com/bar",
      "type": "array",
      "items": { "$ref": "item" }, // this isn't doing anything, is it?
      "$defs": {
        "item": {
          "$id": "item",
          "type": "object",
          "properties": {
            "content": { "$dynamicRef": "#content" }
          },
          "$defs": {
            "defaultContent": {
              "$dynamicAnchor": "content",
              "type": "integer"
            }
          }
        },
        "content": {
          "$dynamicAnchor": "content",
          "type": "string"
        }
      }
    }
  }
}

Given this one, for the instance's /bar-item/content, my dynamic scope does pass through /$defs/bar and resolves its schema to /$defs/bar/$defs/content.

I'm not sure that's correct - maybe I should change this so that dynamic scope is added using the resource root of the fully resolved schema rather than the resource root indicated by the non-fragment $ref URI. Current behavior is what @jdesrosiers' comment (https://github.com/json-schema-org/website/pull/274#discussion_r1482235743) considers correct.

I notice @gregsdennis' comment (https://github.com/json-schema-org/website/pull/274#discussion_r1482068764) saying behavior with this kind of pointer traversing across resources is undefined. I see language that this is a SHOULD NOT, and commentary about the possibility of disallowing it, but seems allowed and defined at present, unless I am missing more.

https://json-schema.org/draft/2020-12/draft-bhutton-json-schema-01#section-9.2.1

Since URIs involving JSON Pointer fragments relative to the parent schema resource's URI cease to be valid when the embedded schema is moved to a separate document and referenced, applications and schemas SHOULD NOT use such URIs to identify embedded schema resources or locations within them.

An implementation MAY choose not to support addressing schema resource contents by URIs using a base other than the resource's canonical URI, plus a JSON Pointer fragment relative to that base. Therefore, schema authors SHOULD NOT rely on such URIs, as using them may reduce interoperability.

gregsdennis commented 7 months ago

Yeah, I agree 100%, and I believe (I haven't tested) that my implementation would add bar to the dynamic scope for a pointer that crosses the resource boundary. (It doesn't, see Update below.)

My issue with the pointer that crosses the boundary in the linked example is that the dynamic scope in that specific scenario is not well-defined by the spec. (Or at least, I don't think it's sufficiently defined.)


Update

I've run the variation with a pointer (that @notEthan posted) on my validator, and it still doesn't regard the intermediate schema as part of the dynamic scope. (The validation result is unchanged from the top example.)

Moreover, it still doesn't regard bar as part of the dynamic scope if you reference it directly with a pointer to item (which is what the example in the blog post does):

{ "$ref": "bar#/$defs/item" }

My implementation doesn't add to the dynamic scope until the reference is fully resolved.

@Julian, this is the ambiguity that I don't think is addressed by the spec.

gregsdennis commented 7 months ago

I notice @gregsdennis' comment (https://github.com/json-schema-org/website/pull/274#discussion_r1482068764) saying behavior with this kind of pointer traversing across resources is undefined. I see language that this is a SHOULD NOT, and commentary about the possibility of disallowing it, but seems allowed and defined at present, unless I am missing more. - @notEthan

The SHOULD NOT is directed toward schema authors, not implementors. It's there because there's no requirement on implementations to honor such pointers. The closest it has is:

An implementation MAY choose not to support addressing schema resource contents by URIs using a base other than the resource's canonical URI, plus a JSON Pointer fragment relative to that base.

which merely grants permission to support such references.

gregsdennis commented 7 months ago

@Julian (and @jdesrosiers, @jviotti), here's some text from Core 7.1 that I think says the dynamic scope should not contain bar:

The path from this root schema to any particular keyword (that includes any "$ref" and "$dynamicRef" keywords that may have been resolved) is considered the keyword's "validation path."

and

from the perspective of dynamic scope, following a reference is no different from descending into a subschema present as a value

The "valuation path" identifies the resources in the dynamic scope.

In the above example, when you get to the $dynamicRef, the evaluation path is

/properties/bar-item/$ref/properties/content

The bar resource isn't represented in any of those segments:

/properties/bar-item/$ref                         // from foo
                         /properties/content      // from item

This informs me that bar isn't part of the dynamic scope.

Relequestual commented 7 months ago

I think this is one of those situations where the spec defines what IS based on what is supported, but doesn't define outside of the scope of what's defined.

That seems kind of obvious when put that way, but the point is, undefined behaviour is undefined for multiple reasons, one of which is to avoid having to hash out these kind of edge cases (ref path jumping over resource boundaries).

Although, I do see the spec doesn't explicitly say this is undefined, just that implementations MAY support it.

In which case, the first test we should add and make mandatory, but the other one we cannot make mandatory because it's optional to support it.

jviotti commented 7 months ago

This informs me that bar isn't part of the dynamic scope.

The conclusion from @gregsdennis aligns with my understanding so far.

jdesrosiers commented 7 months ago

My implementation doesn't add to the dynamic scope until the reference is fully resolved.

That's what my implementation does too. I was mistaken when I said otherwise in the blog post PR.

notEthan commented 7 months ago

{ "$ref": "bar#/$defs/item" }: This is the only case I think is debatable. Technically, this is not a canonical identifier and you aren't guaranteed that it will work. However, we've discussed that "canonical" isn't really the right abstraction and what we really want is for pointers to no cross resource boundaries. In which case, this would be a valid URI and we should agree on how it works. I think this could go either way and I'm fine with whichever interpretation is chosen.

Is this not crossing resource boundaries, just as #/$defs/bar/$defs/item is? #/$defs/bar/$defs/item crosses from foo (root) to bar to item, each having an $id. bar#/$defs/item crosses from bar to item - one fewer, but still across a resource. Am I missing something there?

If there isn't any crossing of resource boundaries ("$ref": "item") it sounds like our implementations all agree on the outcome, though I'm not sure it's specified adequately.

gregsdennis commented 7 months ago

Is this not crossing resource boundaries, just as #/$defs/bar/$defs/item is? #/$defs/bar/$defs/item crosses from foo (root) to bar to item, each having an $id. bar#/$defs/item crosses from bar to item - one fewer, but still across a resource. Am I missing something there? - @notEthan

No, you're right, it's the same concept.

@jdesrosiers I would expect both your second and third cases to be unsupported since they both cross resource boundaries.

jdesrosiers commented 7 months ago

Crossing resource boundaries isn't about reference resolution. You're just describing an external reference. Crossing resource boundaries is about JSON Pointers. JSON Pointers identify a location in a schema resource. bar#/$defs/item identifies a location within the https://example.com/bar schema resource. The value at that location happens to be a different schema resource, but that's not the concern of the JSON Pointer. #/$defs/bar/$defs/items identifies a location in an a different schema resource than the one it started in.

The idea is that encountering an embedded a schema should be conceptually equivalent to following a reference to an external schema. We can illustrate the equivalence by deconstructing the example.

{
  "$id": "https://example.com/foo",
  "type": "object",
  "properties": {
    "bar-item": { "$ref": "item" }
  },
  "$defs": {
    "bar": { "$ref": "bar" }
  }
}
{
  "$id": "https://example.com/bar",
  "type": "array",
  "items": { "$ref": "item" },
  "$defs": {
    "item": { "$ref": "item" }
    "content": {
      "$dynamicAnchor": "content",
      "type": "string"
    }
  }
}
{
  "$id": "https://example.com/item",
  "type": "object",
  "properties": {
    "content": { "$dynamicRef": "#content" }
  },
  "$defs": {
    "defaultContent": {
      "$dynamicAnchor": "content",
      "type": "integer"
    }
  }
}

So, the spec doesn't require #/$defs/bar/$defs/items to work because doing so would be equivalent to having to follow a reference in order to resolve a JSON Pointer as would be necessary when the schema resource are separated.

Bringing it back to the dynamic scope question, bar#/$defs/items is like a reference to a reference. First, you go to bar and find the location the pointer identifies. The value at that location is effectively a reference to items, which ends up being your destination. The thing that's unclear is whether bar gets to be part of the dynamic scope because it had a role in the resolution process, or does only the final destination matter. I think it makes sense that only schema evaluation matters. There was no schema that was evaluated in bar and therefore, it's not part of the dynamic scope.

gregsdennis commented 7 months ago

... this conversation isn't about pointers crossing resource boundaries. It's about a $dynamicRef resolving to a deeply nested schema resource.

Pointers crossing boundaries has been identified as an ill-advised practice, so I'm only concerned about testing the non-ambiguous cases where this can happen, which is the case I originally opened.

In the PR, the ambiguous case is left as optional. Let's drop this, please and focus on the issue topic.

If further discussion is desired as to what constitutes crossing a resource boundary, let's discuss elsewhere, please.

jdesrosiers commented 7 months ago

Let's drop this, please and focus on the issue topic.

This definitely ended up being a tangent about what "crossing resource boundaries" means and we can resolve that discussion elsewhere, but the resolution of that discussion is relevant to this one. Specifically ...

I'm only concerned about testing the non-ambiguous cases where this can happen

Resolving what constitutes crossing a resource boundary will inform whether bar#/$defs/items is relevant to this discussion. I believe it is.

gregsdennis commented 7 months ago

Whether bar#/$defs/items crosses a resource boundary is not relevant to this discussion because this discussion is about what ends up in the dynamic scope. We've already both agreed that bar doesn't end up in the dynamic scope for this case.

Further discussion on bar#/$defs/items will not impact that outcome. Let's have this somewhere else.