json-schema-org / JSON-Schema-Test-Suite

A language agnostic test suite for the JSON Schema specifications
MIT License

Tests for bundling #653

Open jdesrosiers opened 1 year ago

jdesrosiers commented 1 year ago

A while ago I released an implementation of the official bundling process and included a set of tests in JSON that could be adopted here if there is interest.

Here's an example:

{
  "description": "One external reference",
  "schema": {
    "$id": "https://bundler.hyperjump.io/main",
    "$schema": "https://json-schema.org/draft/2020-12/schema",

    "type": "object",
    "properties": {
      "foo": { "$ref": "/string" }
    }
  },
  "externalSchemas": [
    {
      "$id": "https://bundler.hyperjump.io/string",
      "$schema": "https://json-schema.org/draft/2020-12/schema",

      "type": "string"
    }
  ],
  "bundledSchema": {
    "$id": "https://bundler.hyperjump.io/main",
    "$schema": "https://json-schema.org/draft/2020-12/schema",

    "type": "object",
    "properties": {
      "foo": { "$ref": "/string" }
    },

    "$defs": {
      "https://bundler.hyperjump.io/string": {
        "$id": "https://bundler.hyperjump.io/string",

        "type": "string"
      }
    }
  }
}
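The transformation this test describes can be sketched in a few lines of JavaScript. This is a hypothetical `bundle` helper, not part of any published implementation; it assumes every external resource carries an absolute `$id`, keys `$defs` by that `$id`, and drops `$schema` from an embedded resource when it matches the bundle root's dialect, as in the expected `bundledSchema` above.

```javascript
// Hypothetical sketch of the bundling step shown above. Assumes each
// external resource has an absolute "$id". The "$defs" key used here is
// the resource's "$id", but any unique key would do.
function bundle(schema, externalSchemas) {
  const bundled = { ...schema, $defs: { ...(schema.$defs ?? {}) } };
  for (const external of externalSchemas) {
    // Omit "$schema" when it matches the bundle root's dialect.
    const { $schema, ...rest } = external;
    bundled.$defs[external.$id] = $schema === schema.$schema ? rest : external;
  }
  return bundled;
}
```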

Notes/Comments/Rationale:

Relequestual commented 1 year ago

In theory, I don't think the bundling section actually adds any requirements, but lays them out clearly in relation to processing requirements and the handling of "schema resources".

Would any such tests be testing something not already covered by these? https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/id.json

jdesrosiers commented 1 year ago

@Relequestual Have a closer look at that example I posted. It's not testing the validation behavior of a bundled schema. It's testing the bundling process itself. Given a "schema" and a set of available "externalSchemas", the "bundledSchema" is the expected result of the bundling process.

In theory, I don't think the bundling section actually adds any requirements, but lays them out clearly in relation to processing requirements and the handling of "schema resources".

100% agree.

Would any such tests be testing something not already covered by these? https://github.com/json-schema-org/JSON-Schema-Test-Suite/blob/main/tests/draft2020-12/id.json

No. The tests I proposed adding wouldn't cover validation at all, just the bundling process. Whether an implementation can validate instances using the bundled schema is a separate concern covered by the normal test suite that you link to.

Relequestual commented 1 year ago

Have a closer look at that example I posted. It's not testing the validation behavior of a bundled schema. It's testing the bundling process itself.

Yeah... I don't know how I missed that. Looks like I hardly read it. (I blame sleep deprivation.)

The tests I proposed adding wouldn't cover validation at all, just the bundling process. Whether an implementation can validate instances using the bundled schema is a separate concern covered by the normal test suite that you link to.

OK. Great. Worth remembering that the keys in $defs can be anything unique and don't have to be the schema resource's $id. Might already have that in mind, IDK.

gregsdennis commented 1 year ago

FWIW, mine results in this:


{
  "$id": "https://bundler.hyperjump.io/main(bundled)",
  "$defs": {
    "0708a427dd": {
      "$id": "https://bundler.hyperjump.io/main",
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "object",
      "properties": {
        "foo": {
          "$ref": "/string"
        }
      }
    },
    "7082beaf59": {
      "$id": "https://bundler.hyperjump.io/string",
      "$schema": "https://json-schema.org/draft/2020-12/schema",
      "type": "string"
    }
  },
  "$ref": "https://bundler.hyperjump.io/main"
}

This creates a wholly new schema that encapsulates the two inputs.

The original post has a problem in that it maintains the original $id value, meaning that the original schema and the bundled schema have the same $id, which is forbidden since they're technically different schemas.

jdesrosiers commented 1 year ago

Worth remembering that the keys in $defs can be anything unique and don't have to be the schema resource's $id.

Yep. I have an option that lets users change the naming strategy to use UUIDs. I actually don't have tests for that because it's non-deterministic. I figured that people would probably at least have an option to support the recommended naming strategy, so they could run the tests in that mode. There are going to be lots of implementation-specific modes people could invent.

FWIW, mine results in this:

Yeah, there are multiple valid ways to construct a bundle. I support two different modes plus a few options that could further affect the result. I said earlier that these tests don't validate anything, but maybe they should. The one sure way to know if a bundle is correct is if it produces the same output as its unbundled equivalent. If the tests were designed using this principle, it wouldn't matter what strategies or options an implementation chooses to use.

jdesrosiers commented 1 year ago

The original post has a problem in that it maintains the original $id value, meaning that the original schema and the bundled schema have the same $id, which is forbidden since they're technically different schemas.

I disagree. The way I see it, $ids identify a schema resource, which doesn't include embedded schemas. So whether the schema includes embedded schemas or not, it's still the same schema resource. In any case, there's no reason for someone to load all the constituent parts and the bundle at the same time, so the unbundled version would never conflict with the bundled version. So, I don't see a problem.

Relequestual commented 1 year ago

I'm actually not sure you CAN change the $id value in this situation. You may have references in the root schema resource which are relative URIs. If you're changing the root resource's $id value, those relative references may no longer work.

Maybe we need to add something to the specification to make this clearer.

gregsdennis commented 1 year ago

$ids identify a schema resource, which doesn't include embedded schemas. - @jdesrosiers

But a schema resource does include keys.

In your example, you've added $defs, which is changing the schema. You therefore have a different schema.

If your schema already had $defs, then you're adding to it, which is also changing the schema, and you again have a different schema.

You can't give it the same $id because that would be attempting to identify two schemas with a single $id.

Yes, you get the same validation outcome, but two schemas with different keywords or different keyword contents are different schemas. { "type": "integer" } is a different schema from { "type": "integer", "$defs": { "foo": { "$id": "foo-obj" } } }. They're semantically equivalent, but they are not the same thing.

In any case, there's no reason for someone to load all the constituent parts and the bundle at the same time, so the unbundled version would never conflict with the bundled version.

You can't make assumptions about what our users will do.

If I want to bundle all of the 2020-12 meta-schemas into a single schema, I can't register that under https://json-schema.org/draft/2020-12/schema. It would overwrite (or fail to overwrite) the actual meta-schema. Anyone who then wants to get the meta-schema would get the bundled one instead, which is incorrect.

The bundled schema needs a new $id.

I'm actually not sure you CAN change the $id value in this situation. You may have references in the root schema resource which are relative URIs. - @Relequestual

Look at my example. All of the $ids and $refs stay the same, and all of the relative references still work.

The only way this might break is if you had references that were just pointer fragments, and the implementation didn't handle changing the base URI right.

For example:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/main",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "/string"
    },
    "nested": { "$ref": "#" }
  }
}

For /properties/nested, if the implementation isn't changing the base URI when it bundles this schema (the way I do it), then the # may resolve to the bundled root. This would affect any bundled schema.

But the problem is a bug in how the implementation manages the base URI while evaluating, not how bundling works.
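The base-URI point can be checked with plain RFC 3986 resolution, here using Node's WHATWG `URL` class and the URIs from the example above:

```javascript
// An empty-fragment reference resolves against the current base URI, which
// is established by the nearest enclosing "$id". As long as the embedded
// resource keeps its "$id", "#" and "/string" resolve exactly as they did
// before bundling.
const base = "https://example.com/main"; // "$id" of the enclosing resource
console.log(new URL("#", base).href);       // https://example.com/main#
console.log(new URL("/string", base).href); // https://example.com/string
```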

jdesrosiers commented 1 year ago

In your example, you've added $defs, which is changing the schema. You therefore have a different schema. [...] They're semantically equivalent, but they are not the same thing.

That's a fair point. I agree. They are equivalent, but not entirely the same.

If I want to bundle all of the 2020-12 meta-schemas into a single schema, I can't register that under https://json-schema.org/draft/2020-12/schema. It would overwrite (or fail to overwrite) the actual meta-schema. Anyone who then wants to get the meta-schema would get the bundled one instead, which is incorrect.

That sounds like the desired behavior to me. If those schemas already exist, loading the bundle should either fail or overwrite the existing schemas depending on how the implementation handles duplicate identifiers.

gregsdennis commented 1 year ago

That sounds like the desired behavior to me.

If I ask for X, I should get X. This is expected behavior.

Getting X-with-all-of-its-references is unexpected and surprising behavior.

Relequestual commented 1 year ago

...

The bundled schema needs a new $id.

I'm actually not sure you CAN change the $id value in this situation. You may have references in the root schema resource which are relative URIs. - @Relequestual

Look at my example. All of the $ids and $refs stay the same, and all of the relative references still work.

I can see that. I hadn't considered bundling the previous root entity too to maintain the given $id identifier.

The only way this might break is if you had references that were just pointer fragments, and the implementation didn't handle changing the base URI right.

For example: ... But the problem is a bug in how the implementation manages the base URI while evaluating, not how bundling works.

Agreed. But this isn't what I meant when I said "You may have references in the root schema resource which are relative URIs." It may have been helpful if I had provided an example, but I didn't consider you bundling the primary resource too. I don't think this is an unreasonable assumption...

The bundling process for creating a Compound Schema Document is defined as taking references (such as "$ref") to an external Schema Resource and embedding the referenced Schema Resources within the referring document. Bundling SHOULD be done in such a way that all URIs (used for referencing) in the base document and any referenced/embedded documents do not require altering. - https://json-schema.org/draft/2020-12/json-schema-core#section-9.3.1-1

In the bundling process, you bundle the referenced resources, not the primary one. (Although, I actually think your solution is quite smart and possibly even preferable.)

Consider, if I were to take your example and modify it slightly...

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "https://example.com/main",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "/string"
    },
    "nested": { "$ref": "https://example.com/main" }
  }
}

Bundling without also embedding the root resource would break the internal reference, although you could argue that the URI was already registered by the unbundled version. (An implementation which only supports referencing URIs defined in the same document might not support such a registry.)

As I said, @gregsdennis, I think your solution of also bundling the root resource is smart and possibly preferable, however I think it does fall foul of the following...

A Bundled Schema Resource MUST NOT be bundled by replacing the schema object from which it was referenced, or by wrapping the Schema Resource in other applicator keywords. - https://json-schema.org/draft/2020-12/json-schema-core#section-9.3.1-5

We COULD change that, and even promote your approach to bundling, but I think that would be a spec change. I am leaning towards being in favour of this.

However, I make note of the following, and some considerations...

In order to produce identical output, references in the containing schema document to the previously external Schema Resources MUST NOT be changed, and now resolve to a schema using the "$id" of an embedded Schema Resource. Such identical output includes validation evaluation and URIs or paths used in resulting annotations or errors. - https://json-schema.org/draft/2020-12/json-schema-core#section-9.3.1-6

The intent, at least from me, was to be able to use the bundled and unbundled version of a schema interchangeably. Maybe only in memory, maybe as some sort of compile-time assistance. The reason doesn't matter. A user or implementation should not care or even know any difference between the two.

When processing a bundle (Compound Schema Document), you have to (I think) start with a fresh schema registry. When initially processing and indexing a schema and its references, you'll come across the other schema resources which will later be bundled. When later processing a bundled version of the schema, now with embedded schema resources, you'll encounter the same URIs. They are of course identical. Do you validate that they are identical, or else raise an error? Maybe you do, which negates this point.
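The "validate that they are identical, or else raise an error" option could be implemented as a registry that tolerates only identical re-registration. This is a sketch with hypothetical names, and the `JSON.stringify` comparison is a simplification that is sensitive to key order:

```javascript
// Hypothetical registry policy: re-registering an "$id" is allowed only
// when the new schema is identical to the one already stored. A real
// implementation would use a key-order-insensitive deep equality check
// instead of JSON.stringify.
function addSchema(registry, schema) {
  const existing = registry.get(schema.$id);
  if (existing !== undefined
      && JSON.stringify(existing) !== JSON.stringify(schema)) {
    throw new Error(`Conflicting schema for ${schema.$id}`);
  }
  registry.set(schema.$id, schema);
}
```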


Re...

That sounds like the desired behavior to me.

If I ask for X, I should get X. This is expected behavior. Getting X-with-all-of-its-references is unexpected and surprising behavior.

I refer back to my earlier mentioned intent, that users should be unaware of a bundled vs non-bundled version of any given schema resource. It should be transparent. (Much like URI normalization or encoding... multiple physical strings resolve to logically the same thing.)

I think this was one of the reasons for choosing a specific and distinct name for the source document.

To be clear, by expressing intent, I'm not trying to lock down discussion or direction of discussion. I think we've potentially found an interesting approach, different from the one intended, which might be preferable. (Assuming we aren't hung up on the "identical output" clause: https://json-schema.org/draft/2020-12/json-schema-core#section-9.3.1-6 - I am not.)

jdesrosiers commented 1 year ago

The intent, at least from me, was to be able to use the bundled and unbundled version of a schema interchangeably.

That was my intent as well. I think it's likely to have a development environment where the schemas are unbundled for ease of development and debugging, but bundled in production. I think guaranteeing that the behavior is identical in both environments is a property people will appreciate.

Greg's approach can still be used and produce identical output, but it depends on how you use that bundle. If you validate an instance directly against the bundled schema, the evaluation path will be different because of the added $ref to point to the main schema. But, if you just use the bundle only as a way of loading multiple schemas at once, you can validate directly against the main schema. Here's what it would look like in my implementation.

// Loads schemas /main(bundled), /main, and /string
addSchema({
  "$id": "https://bundler.hyperjump.io/main(bundled)",
  ...
});

// Produces identical output
const unbundledOutput = validate("https://bundler.hyperjump.io/main", { "foo": 42 });

// Produces different output
const bundledOutput = validate("https://bundler.hyperjump.io/main(bundled)", { "foo": 42 });

The first will produce identical output to if you loaded the schemas individually, while the second would differ in evaluation path.

Relequestual commented 1 year ago

Given @gregsdennis's interpretation is reasonable, and the solution (as in, Greg's approach to bundling schemas) is reasonable and conformant to the spec, shall we make the tests follow this approach?

Sound like a way forward @jdesrosiers?

@gregsdennis @jdesrosiers Do either of you feel we need a spec issue to clarify whether using the same URI is allowed or not? Regardless of the intent, both are reasonable interpretations, and the intended approach may cause issues/problems for compliant validators.

jdesrosiers commented 1 year ago

Given @gregsdennis's interpretation is reasonable, and the solution (as in, Greg's approach to bundling schemas) is reasonable and conformant to the spec, shall we make the tests follow this approach?

No, that doesn't address the problem. The problem is that the tests are specific to one approach to bundling when there are many valid approaches. Changing the tests to be specific to Greg's implementation is just as much a problem as the tests being specific to my implementation. We need a solution that isn't implementation specific.

I take it you missed my proposed solution to that problem ...

Yeah, there are multiple valid ways to construct a bundle. I support two different modes plus a few options that could further affect the result. I said earlier that these tests don't validate anything, but maybe they should. The one sure way to know if a bundle is correct is if it produces the same output as its unbundled equivalent. If the tests were designed using this principle, it wouldn't matter what strategies or options an implementation chooses to use.

I've already put some work into refactoring my tests to use this approach. It's not perfect, but it works and allows me to test things I couldn't really test with my original approach, like the option to generate UUIDs for definition names. I can run the same tests against each combination of configuration options, and Greg could use the same tests even though he uses a different bundling strategy and a different output format (I haven't updated to the new format proposed for the next release yet).

Here's an example.

{
  "description": "External reference to a different dialect",
  "schema": {
    "$id": "https://bundler.hyperjump.io/main",
    "$schema": "https://json-schema.org/draft/2020-12/schema",

    "type": "object",
    "properties": {
      "foo": { "$ref": "/tuple" }
    }
  },
  "externalSchemas": {
    "https://bundler.hyperjump.io/tuple": {
      "$id": "https://bundler.hyperjump.io/tuple",
      "$schema": "https://json-schema.org/draft/2019-09/schema",

      "type": "array",
      "items": [{ "type": "string" }]
    }
  },
  "tests": [
    {
      "description": "valid",
      "instance": {
        "foo": ["is a string"]
      }
    },
    {
      "description": "invalid",
      "instance": {
        "foo": [42]
      }
    }
  ]
}

The test runner would load all schemas individually, validate one of the test instances, and store the output as the expected result. Then it would bundle the schema, validate the same instance against the bundle, and store that output as the actual result. Finally, it would compare the expected and actual outputs to determine whether the bundle is correct.

Note: This example includes both a passing and a failing test, but the test runner doesn't specifically care whether the result is passing or failing; it only cares whether the bundled and unbundled results are the same.

Relequestual commented 1 year ago

We need a solution that isn't implementation specific.

I agree. I thought Greg's solution was implementation independent, in that we just needed to confirm that implementations were able to process the bundled schema and get a correct validation result (and not an error). Primarily, to check the correct identification and capturing of the ids of bundled resources which are then referenced.

Although, I think I also didn't think about the fact that these will be a new class of tests as opposed to tests that can be run with the current set (assuming I'm understanding this correctly.) That represents some additional work (and therefore questions to answer). Such as, how will we identify different sets of tests? And each set of tests will need documentation as to how they should be run. (Not a problem, just a note.)

I take it you missed my proposed solution (https://github.com/json-schema-org/JSON-Schema-Test-Suite/issues/653#issuecomment-1776257786) to that problem ...

I had indeed missed it. My apologies.

While I understand the test you've shown an example of, the problem with what we perceived to be an acceptable test was that Greg's implementation wouldn't support it.

I think we need to make sure we are clear what we intend to be testing here. Are we testing that the bundling is done correctly, or that a bundle can be correctly processed? There's no reason we can't do both (and I think we should), but they are different questions that might require different approaches.

gregsdennis commented 1 year ago

I'm starting to reconsider whether bundling actually needs to be part of the core spec. Having different (especially incompatibly different) approaches to bundling isn't great.


The primary contradiction here is that we want to be able to redirect $id usage from the unbundled version to the bundled version (Jason's point about drop-in replacement), but we also want to recognize that bundling inherently creates a new schema (my point about a bundle having a different $id).

I'm not sure how to reconcile that contradiction.

Relequestual commented 1 year ago

The intent was for an implementation to be able to, at will, choose to bundle a schema for processing (maybe using third-party tools), transparently to the end user. Mostly because this is what tools already try to do and get it wrong.

I think I've already said, and I think this is where we disagree: I feel it's fine for an implementation to override a bundled version vs a non-bundled version of a resource, or vice versa, at will. The end user shouldn't notice any difference. The implementation is what has the limitation (or preference). I think of bundling/un-bundling like a processing step, much like you would encode/decode something for a local representation based on local constraints/requirements.

I suggest we could change the spec to require @gregsdennis's approach to bundling, which would remove this potential challenge in interpretation. But, I still do consider this a situation where overriding a registry at runtime is desirable.

Still, this is about the tests, for right now. The simplest thing I think we can do for now is define that the tests should run each schema in isolation, in that they use a fresh registry, so the $id can't be replaced/overridden anyway.

jdesrosiers commented 1 year ago

I think I also didn't think about the fact that these will be a new class of tests as opposed to tests that can be run with the current set (assuming I'm understanding this correctly.)

Yes, you're understanding this correctly. This will need to be a distinct test suite designed specifically for testing bundlers. This wouldn't be unprecedented. There's currently a small output suite designed specifically for testing output.

While I understand the test you've shown an example of, the problem with what we perceived to be an acceptable test was that Greg's implementation wouldn't support it.

I explained here how Greg's bundles could be used in a way that produces identical output and would therefore be able to use the tests. If not used that way, the bundles aren't strictly spec compliant because they don't produce identical output and I'd expect them to fail the test suite.

Are we testing [...] that a bundle can be correctly processed?

Sort of, but not exactly. It's not enough to test that the bundle can be processed correctly. We're not interested in whether the validation result is true when we expect it to be true. The point is to test that the bundle vs individual schemas are processed the same way. If one fails the other should fail. We don't care what the result is as long as the results are the same. We can do that by comparing output.

Are we testing that the bundling is done correctly[...]?

That's what I originally did for my implementation and shared here, but we found that that's not a great idea. There are multiple reasonable and spec compliant ways to bundle, so we can't have tests that ensure that something is bundled correctly, because there is no one "correct" way (and there shouldn't be). We bumped into that problem immediately when Greg tried to use my tests for his implementation and found that they weren't helpful because he took a different yet perfectly valid approach.

I'm starting to reconsider whether bundling actually needs to be part of the core spec. Having different (especially incompatibly different) approaches to bundling isn't great.

Personally, I think the spec should at most define behaviors that any bundle must have. Any specific approach to bundling is blog post material, IMO. It's probably sufficient to define that using a bundle should behave identically to its unbundled equivalent.

I suggest we could change the spec to require @gregsdennis's approach to bundling

I don't think the spec should be requiring any approach. Only outcomes are important. That's all the spec should require.

The simplest thing I think we can do for now is define that the tests should run each schema in isolation, in that they use a fresh registry, so the $id can't be replaced/overriden anyway.

A fresh registry is essential for the approach I'm proposing anyway. You need a clean registry both when getting the bundled output and the unbundled output to ensure that one isn't resolving some schema added by the other and tainting the results.

gregsdennis commented 1 year ago

they don't produce identical output

This is true, but then that's not a requirement of the spec.

The spec does say this, though (running out of markdown formats):

A Bundled Schema Resource MUST NOT be bundled by replacing the schema object from which it was referenced, or by wrapping the Schema Resource in other applicator keywords.

To me, this explicitly states that the bundled schema cannot replace the original in the registry. Which leads me to:

Only outcomes are important. That's all the spec should require.

I agree with this. However at the same time, I feel the spec is contradictory in that it expects a bundle to act as an in-place replacement for the original schema while also claiming that an $id cannot reference two schemas.


Maybe the problem is that we're trying to validate using a bundle directly. Maybe that's not a thing we should say we support.

@jdesrosiers you've historically expressed an opinion that bundling be a transportation mechanism and that we should just use file archives (e.g. JAR or ZIP). Maybe that's what "bundling" needs to be: transportation. Then it could also work as mass registration.

If we say that registering a bundle means unbundling it and registering its component schema resources, then everything just works. All of the references to the original $id can stay the same, all of the output is the same as validating against the original, the validation result is the same as the original...

In this way, a bundle doesn't need to be a schema; it's just a convenience for registration.
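Registering a bundle by unbundling could be as simple as walking `$defs` and registering each embedded resource by its own `$id`. This is a sketch under the assumption that the registry is a plain map and that every embedded resource carries an absolute `$id`; the bundle document itself is never registered.

```javascript
// Hypothetical mass-registration: the bundle is treated as a transport
// container, not as a schema. Each embedded resource in "$defs" is
// registered under its own "$id"; the wrapper document is discarded.
function registerBundle(registry, bundle) {
  for (const resource of Object.values(bundle.$defs ?? {})) {
    if (typeof resource.$id === "string") {
      registry.set(resource.$id, resource);
    }
  }
}
```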

@Relequestual I'm curious about the history of bundling on this. Is there a reason "bundling into a schema" was the direction this feature took?

jdesrosiers commented 1 year ago

This is true, but then that's not a requirement of the spec.

It is (and was intended to be) a requirement. See https://json-schema.org/draft/2020-12/json-schema-core#section-9.3.1-6

A Bundled Schema Resource MUST NOT be bundled by replacing the schema object from which it was referenced, or by wrapping the Schema Resource in other applicator keywords.

To me, this explicitly states that the bundled schema cannot replace the original in the registry.

That's not what that was meant to express. Here's an example of the kind of thing it was talking about.

{
  "$id": "https://example.com/schema",
  "type": "object",
  "properties": {
    "foo": {
      "$ref": "/common#/$defs/unsigned-number",
      "description": "Foo"
    }
  }
}
{
  "$id": "https://example.com/common",
  "$defs": {
    "number": { "type": "number" },
    "unsigned-number": {
      "$ref": "#/$defs/number",
      "minimum": 0
    }
  }
}

Here's what naively replacing the schema object from which it was referenced might look like.

{
  "$id": "https://example.com/schema",
  "type": "object",
  "properties": {
    "foo": {
      "allOf": [
        {
          "$ref": "#/$defs/number",
          "minimum": 0
        }
      ],
      "description": "Foo"
    }
  }
}

Obviously, that doesn't work because the reference that was pulled in is no longer in scope. This is also an example of wrapping the schema in an applicator, which is not allowed because it changes the output.

Maybe the problem is that we're trying to validate using a bundle directly. Maybe that's not a thing we should say we support. [...] Maybe that's what "bundling" needs to be: transportation. Then it could also work as mass registration.

That's the way I think of your approach and I think it's a great solution except I'd drop the top level $ref and make it only be usable for mass registration because using the $ref could change the output. I think updating the spec to emphasize bundling as a mass-registration solution rather than a single-schema solution is a good idea. However, that shouldn't mean that a schema couldn't be bundled in such a way that it could also be used as a drop-in replacement for a schema. That would just be a nice-to-have, not a requirement. The requirement is just that it loads all the schemas in a way that is identical to if they were loaded individually.

gregsdennis commented 1 year ago

However, that shouldn't mean that a schema couldn't be bundled in such a way that it could also be used as a drop-in replacement for a schema.

If we can find a way to do this such that it doesn't actually replace the original schema, I'd be for it. The current suggestion, though, is to use a clean registry, which means that the user has to do something (start with a clean registry), and there isn't really anything the implementation can do about it (unless it knows that it's getting a bundle, which it shouldn't).

jdesrosiers commented 1 year ago

I agree with Ben in that I don't think it's a problem for the bundle to have the same identifier as the main schema. There's no reason, other than the testing strategy I proposed, that you would want to load both the bundle and the individual schemas at the same time. Why would you need to load a schema or set of schemas that's already been loaded? But even if it happens, the spec allows for a schema to replace a previously registered schema. The spec also allows for a schema to be rejected if it has the identifier of an already registered schema. I think that's a perfectly fine choice too. For those implementations, if users want to reload those schemas using the bundle, they would need to remove the already loaded schemas before reloading. It's a little more work, but arguably more secure.

Relequestual commented 1 year ago

A Bundled Schema Resource MUST NOT be bundled by replacing the schema object from which it was referenced, or by wrapping the Schema Resource in other applicator keywords.

To me, this explicitly states that the bundled schema cannot replace the original in the registry.

Creating the bundle copies the referenced schemas into the "primary" schema resource, so when the spec says "A Bundled Schema Resource", it is only talking about those schema resources which have been bundled/copied, not the primary schema resource.

The discussion above about better defining what a bundle is and is not is useful. We do have to keep in mind the primary reason this work was done in the first place: provide a standard approach for removing external references in a non-breaking way for tooling that doesn't support external references. (Lowering the barrier to supporting schemas with SOME sort of references.)

Existing tooling looks to remove ALL references, and we have been clear, that's not viable in all cases.

I think the argument for a bundle just being a mass-registration container is strong. I would want to do some analysis to find out what percentage of "current" (2019-09 and above) implementations support a registry. I would hope most do.

All that being said, in terms of the subject of this Issue, it looks like we might be driving closer to consensus?

gregsdennis commented 1 year ago

I would want to do some analysis to find out what percentage of "current" (2019-09 and above) implementations support a registry.

@Julian is finishing up that work currently.

jdesrosiers commented 1 year ago

We do have to keep in mind the primary reason this work was done in the first place: Provide a standard approach for removing external references in a non-breaking way for tooling that doesn't support external references.

That's a really good point. The mass-registration approach would load schemas that have external references. If the bundle is designed to be only a mass schema loading mechanism, implementations that don't support external references wouldn't be able to use it. They wouldn't even have a concept of schema registration.

I would want to do some analysis to find out what percentage of "current" (2019-09 and above) implementations support a registry.

My impression is that nearly all validation implementations support external references and registries. The places where we see lack of support for these things are in other types of implementations like documentation/form/code generators. (Especially in the OpenAPI ecosystem, for some reason.) So, the types of implementations most likely to have issues with the mass-registration approach are also the ones we have the least visibility on. We only have good visibility on validators right now.

However, it's also my impression that implementations that don't support external references also don't support embedded schemas. If they don't support embedded schemas, then they can't use our bundling approach in any form and they're out of luck anyway.

Which is why I'm still a big fan of embracing the mass-registration framing. I also don't think this framing excludes bundles being built in a way that can be used both as mass-registration and as an executable schema without external references. Changing the way it's presented only enables more ways of bundling and using a bundle to be valid, which I think is a very good thing. Different kinds of bundles are going to be more appropriate for different use-cases.
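To make the mass-registration framing concrete, here is a sketch of how a runner could treat a bundle as a registration container. The `register_bundle` helper is hypothetical, and it skips base-URI resolution for relative `$id` values for brevity; the bundle shown is the one from the example at the top of this issue.

```python
def register_bundle(bundle, registry):
    """Mass-registration framing (sketch): register the primary resource,
    plus every resource embedded in $defs that declares its own $id.
    `registry` is a plain dict of URI -> schema; resolving relative $ids
    against a base URI is omitted for brevity."""
    registry[bundle["$id"]] = bundle
    for resource in bundle.get("$defs", {}).values():
        if isinstance(resource, dict) and "$id" in resource:
            registry[resource["$id"]] = resource
    return registry


bundle = {
    "$id": "https://bundler.hyperjump.io/main",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"foo": {"$ref": "/string"}},
    "$defs": {
        "https://bundler.hyperjump.io/string": {
            "$id": "https://bundler.hyperjump.io/string",
            "type": "string",
        }
    },
}

registry = register_bundle(bundle, {})
print(sorted(registry))
# ['https://bundler.hyperjump.io/main', 'https://bundler.hyperjump.io/string']
```

Under this framing, loading the bundle is equivalent to loading the individual schemas one by one, which is why implementations without a registry (or without embedded-schema support) can't benefit from it.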

it looks like we might be driving closer to consensus?

I think so. @gregsdennis, do you think the testing strategy I proposed is viable? Is there any reason it might not be usable with the bundles your implementation produces?

gregsdennis commented 1 year ago

implementations that don't support external references wouldn't be able to use it. They wouldn't even have a concept of schema registration.

Regarding "external references," I've been talking about fetching documents, not merely pointing to other documents (that may be registered) with a different base URI. I'm fairly certain that not supporting other-document references in general would result in a lot of test suite failures, which isn't something we see.

Implementations that don't support fetching will definitely have a registry.

it's also my impression that implementations that don't support external references also don't support embedded schemas

Again, a difference in what "external reference" means. I'm curious what's feeding this impression. As before, a lot of implementations would fail the test suite.

do you think the testing strategy I [proposed](https://github.com/json-schema-org/JSON-Schema-Test-Suite/issues/653#issuecomment-1799652283) is viable?

That would work with my approach, yes. I'd take the target schema and a clean registry, register the external schemas, bundle, and use the result to run the test cases.

With this approach, it doesn't matter what the bundle itself looks like or what its $id is.
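For concreteness, the bundling step such a runner would invoke might look like the following sketch. `naive_bundle` is a hypothetical helper, deliberately simplified: it embeds every supplied external resource rather than only the transitively referenced ones. It reproduces the shape of the example at the top of this issue, dropping `$schema` from embedded resources when it matches the primary dialect.

```python
from copy import deepcopy


def naive_bundle(schema, external_schemas):
    """Naive bundler sketch: embed each external resource into the primary
    resource's $defs, keyed by its $id. A real bundler would embed only
    resources that are actually referenced, directly or transitively."""
    bundled = deepcopy(schema)
    defs = bundled.setdefault("$defs", {})
    for external in external_schemas:
        resource = deepcopy(external)
        # Embedded resources inherit the bundle's dialect, so an identical
        # $schema declaration is redundant and can be dropped.
        if resource.get("$schema") == schema.get("$schema"):
            del resource["$schema"]
        defs[resource["$id"]] = resource
    return bundled


main = {
    "$id": "https://bundler.hyperjump.io/main",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {"foo": {"$ref": "/string"}},
}
externals = [{
    "$id": "https://bundler.hyperjump.io/string",
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "string",
}]

bundled = naive_bundle(main, externals)
```

A runner following Greg's description would then register `bundled` into a clean registry and run the test cases against it alone, without the individual external schemas.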

One concern I have is that it's easy for someone building their runner to merely register the external schemas, skip bundling, and just run the schema against the test cases. I'm not sure we can do anything about that (especially if we explicitly call out that these are bundling tests).

jdesrosiers commented 1 year ago

I meant "external references" to mean references outside of the current schema document. I believe that's the way @Relequestual was using the term in his last comment, and that's what I was responding to. Whether or not an implementation supports fetching schemas isn't relevant to this conversation, and I didn't mean to bring that in.

it's also my impression that implementations that don't support external references also don't support embedded schemas

I'm curious what's feeding this impression.

This comes from user feedback on my bundler implementation. I've had several people try my bundler because the tool they were working with required a single schema (no external references) and then report that the bundle doesn't work because the implementation doesn't support embedded schemas, or doesn't support references at all. Again, I expect implementations with these limitations are almost always (if not exclusively) non-validation implementations such as generators (docs, forms, code, etc.).

One concern I have is that it's easy for someone building their runner to merely register the external schemas, skip bundling, and just run the schema against the test cases. I'm not sure we can do anything about that (especially if we explicitly call out that these are bundling tests).

Yep. Agreed that that's a risk and that there's not much we can do about it. We'd just need to document it as best we can and provide support when we have the opportunity.

jdesrosiers commented 12 months ago

I finished updating my bundler tests. Have a look and let me know if you're happy with the approach. If so, I'll make a PR here. There's a README there that explains how things work.

I've introduced a new concept that we could consider adopting for the main test suite as well. I'm using a compatibility field to specify which dialects the schema applies to. That way, instead of making copies of the tests for each new release, we can maintain one set of tests and provide test runners with the information they need to filter out tests that don't apply to the release they're testing.
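A runner could consume such a field along these lines. This is only an illustration: the list-of-dialects format shown here is an assumption, and the actual semantics of the compatibility field are whatever the bundler repo's README defines.

```python
# Hypothetical dialect identifiers for illustration.
DIALECTS = ["draft4", "draft6", "draft7", "draft2019-09", "draft2020-12"]


def applies(test, dialect):
    """Assumed semantics: `compatibility` lists the dialects a test targets;
    a test without the field applies to every dialect."""
    return dialect in test.get("compatibility", DIALECTS)


tests = [
    {"description": "uses prefixItems", "compatibility": ["draft2020-12"]},
    {"description": "basic type check"},  # no field: applies everywhere
]

selected = [t["description"] for t in tests if applies(t, "draft2019-09")]
print(selected)  # ['basic type check']
```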

One thing I didn't completely work out was how to organize tests for proposed extensions like propertyDependencies. I put all extensions in a separate file, but there's no good way for implementations to filter those down to the ones they support. I was considering including a tags property. Test runners could use these tags to filter out things they don't support. This could be used for other things as well, to identify related tests.
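A tags-based filter could work roughly like this sketch. The tag names and the opt-in semantics are hypothetical, just to show the filtering mechanism being proposed.

```python
def runnable(test, supported_tags):
    """Assumed semantics: a runner skips any test carrying a tag it has not
    opted into (e.g. a proposed extension it doesn't implement); untagged
    tests always run."""
    return set(test.get("tags", [])).issubset(supported_tags)


tests = [
    {"description": "core bundling"},
    {"description": "propertyDependencies bundling",
     "tags": ["extension:propertyDependencies"]},
]

# Default runner: only untagged tests.
default = [t["description"] for t in tests if runnable(t, set())]
print(default)  # ['core bundling']

# A runner that implements the extension opts in and gets both.
opted_in = [t["description"] for t in tests
            if runnable(t, {"extension:propertyDependencies"})]
```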