interagent / committee

A collection of Rack middleware to support JSON Schema.
MIT License

How can I use a hyperschema that references external schemas? #60

Open ronen opened 9 years ago

ronen commented 9 years ago

Hello, I started this question as a comment at brandur/json_schema#22, but it's perhaps more appropriate here.

I'd like to break my API hyperschema into separate files for better modularity and easier management; in fact, I may also want to refer to schemas that are defined externally, but I'm not certain how to do that with Committee::Middleware::RequestValidation.

For example, the hyperschema might include something like:

"links": [
    {
      "description": "Create a widget",
      "href": "/widgets",
      "method": "POST",
      "rel": "create",
      "title": "Create Widget",
      "schema": { "$ref": "file://schemas/widget-create-request.schema.json" },
      "targetSchema": { "$ref": "file://schemas/widget-create-response.schema.json" }
    },
    ...
  ]

The request and response schemas might themselves refer to a URI for an externally-defined widget schema, e.g. something like:

{
    "type": "object",
    "properties": {
         "metadata": { "type": "object" },
         "widget": { "$ref": "http://widget-host.com/schemas/widget.schema.json" }
    },
    "required": ["widget"]
}

In the discussion at brandur/json_schema#22 it seems that one must set things up to use "id" rather than $ref, and must pre-load the external schemas into a JsonSchema::DocumentStore which is then used to expand references on a JsonSchema object.

I can probably figure out the use of "id" in my schemas, but how can I use JsonSchema::DocumentStore when configuring the middleware -- there's no documented parameter for it at:

 use Committee::Middleware::RequestValidation, schema: JSON.parse(File.read(...))

Thanks for any advice!

PS. Lacking anything else, I'll probably do a preprocessing step on my own to load the schema and recursively expand $ref's, before passing it to use Committee::Middleware::RequestValidation. Not the most elegant thing in the world, but should work.

brandur commented 9 years ago

Hey @ronen, sorry about the delay in response!

I see where you're coming from here, and unfortunately I think I may have made an error in the interface ... What do you think about an alternative option to the Committee middleware that would take a JsonSchema::Schema object instead of JSON data?

Then you could probably do something along the lines of this:

store = JsonSchema::DocumentStore.new
Dir["./schema/**/*.json"].each do |file|
  schema_data = JSON.parse(File.read(file))
  schema = JsonSchema.parse!(schema_data)
  store.add_schema(schema)
end

schema_data = JSON.parse(File.read("./schema/root.json"))
schema = JsonSchema.parse!(schema_data)
schema.expand_references!(store: store)

use Committee::Middleware::RequestValidation, schema: schema

ronen commented 9 years ago

@brandur OK, my turn for a slow response.

Yeah, an object would be better to make it possible to configure it before passing it in. But (and now I think the discussion actually belongs back in brandur/json_schema#22) it's still not clear to me that it can handle references to schemas at some arbitrary external URI (rather than a file in a known local schema directory).

So in any case I'll need to stick to preprocessing to find and expand $ref's manually. In which case the current API is fine, since I have the resulting raw JSON available.

And, I suppose, if json_schema were able to handle references on the fly rather than requiring a DocumentStore to be configured in advance, then the current API would remain fine.

brandur commented 9 years ago

Yeah, this is a tricky one @ronen, and the DocumentStore is partly designed as a compromise of sorts in this case.

The basic problem is that you could go in and start expanding URIs arbitrarily to maximize flexibility, but that leaves the consuming developer with a JSON Schema API whose performance characteristics are extremely unpredictable: a single validation call could potentially branch out into dozens of HTTP calls. Another likely problem is that many JSON Schema URIs are symbolic in nature, meaning that they serve as a unique identifier for a schema but are not actually retrievable at that location (this was certainly the case for XML URIs back in the day anyway).

Good to hear that the current API is at least somewhat workable for you though. Let me know if you have any ideas on better ways to handle this.
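To make the trade-off above concrete, here is a minimal stdlib-only sketch of the "preload, then resolve" idea. `PreloadedStore` and `resolve` are hypothetical names invented for illustration; this is not how JsonSchema::DocumentStore is implemented, and the widget schema body is made up. The point is only that every `$ref` is answered from memory, so an unknown URI fails fast instead of triggering an HTTP call.

```ruby
require "json"

# Hypothetical stand-in for a preloaded schema registry. It is NOT
# JsonSchema::DocumentStore, only an illustration of the same idea.
class PreloadedStore
  def initialize
    @schemas = {}
  end

  # Register a parsed schema under the URI it is known by.
  def add(uri, schema)
    @schemas[uri] = schema
  end

  # Look up a URI in memory; the network is never touched.
  def fetch(uri)
    @schemas.fetch(uri) { raise KeyError, "schema not preloaded: #{uri}" }
  end
end

# Replace every {"$ref" => uri} node with the preloaded schema for that URI.
# Local pointers ("#/...") and circular references are not handled here.
def resolve(node, store)
  case node
  when Hash
    if node.key?("$ref")
      resolve(store.fetch(node["$ref"]), store)
    else
      node.transform_values { |value| resolve(value, store) }
    end
  when Array
    node.map { |value| resolve(value, store) }
  else
    node
  end
end

store = PreloadedStore.new
# The schema body below is an assumption made up for this example.
store.add("http://widget-host.com/schemas/widget.schema.json",
          JSON.parse('{"type": "object", "required": ["name"]}'))

request_schema = JSON.parse(<<~SCHEMA)
  {
    "type": "object",
    "properties": {
      "widget": { "$ref": "http://widget-host.com/schemas/widget.schema.json" }
    }
  }
SCHEMA

resolved = resolve(request_schema, store)
```

Because lookups only hit the in-memory hash, the cost of a resolve pass is bounded by what was loaded up front, and a symbolic URI that is not actually hosted anywhere works as long as it was registered.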

scttnlsn commented 9 years ago

I'm using a custom hyper-schema that includes the following:

"allOf": [
    {
      "$ref": "http://json-schema.org/draft-04/hyper-schema#"
    }
  ]

But I have problems resolving the remote schema using the JsonSchema::DocumentStore approach described above. Any suggestions?

ronen commented 9 years ago

@scttnlsn FWIW I've bypassed the JsonSchema::DocumentStore by preprocessing my schemas to expand $refs before invoking Committee. Here's the relevant method:

    def _expand_refs!(json)
      json.tap {
        JSON.recurse_proc json do |item|
          if Hash === item and uri = item['$ref']
            uri = Addressable::URI.parse(uri)
            if uri.scheme
              source = uri
              source = @root.join uri.path.sub(%r{^/}, '') if uri.scheme == 'file'
              item.delete '$ref'
              item.merge! _expand_refs! JSON.parse source.read
            end
          end
        end
      }
    end

I haven't subjected it to heavy testing, and there are doubtless drawbacks to this approach, but it's been working fine for the cases I've tried.
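One drawback that is easy to guard against is a circular reference, which would make a naive inliner recurse forever. Below is a stdlib-only sketch of the same preprocessing idea with such a guard. It is a simplification rather than a drop-in replacement for the method above: it assumes `$ref` values are plain relative file paths (no `file://` or `http://` handling), and `expand_file_refs` and the file names are hypothetical.

```ruby
require "json"
require "pathname"
require "tmpdir"

# Inline relative-path "$ref"s found under root_dir. `seen` lists the files
# currently being expanded, so a circular reference raises instead of
# recursing forever. Assumes each referenced file holds a JSON object.
def expand_file_refs(node, root_dir, seen = [])
  case node
  when Hash
    ref = node["$ref"]
    if ref.is_a?(String) && !ref.start_with?("#") && !ref.include?("://")
      path = Pathname(root_dir).join(ref).cleanpath
      raise ArgumentError, "circular $ref: #{path}" if seen.include?(path)

      # Keep sibling keys, then merge in the expanded external document.
      siblings = node.reject { |key, _| key == "$ref" }
      loaded = expand_file_refs(JSON.parse(path.read), root_dir, seen + [path])
      expand_file_refs(siblings, root_dir, seen).merge(loaded)
    else
      node.transform_values { |value| expand_file_refs(value, root_dir, seen) }
    end
  when Array
    node.map { |value| expand_file_refs(value, root_dir, seen) }
  else
    node
  end
end

expanded = nil
cycle_error = nil

Dir.mktmpdir do |dir|
  # Hypothetical schema files, written only for this demonstration.
  File.write(File.join(dir, "widget.schema.json"), JSON.generate({ "type" => "object" }))
  File.write(File.join(dir, "loop.schema.json"), JSON.generate({ "$ref" => "loop.schema.json" }))

  root = { "properties" => { "widget" => { "$ref" => "widget.schema.json" } } }
  expanded = expand_file_refs(root, dir)

  begin
    expand_file_refs({ "$ref" => "loop.schema.json" }, dir)
  rescue ArgumentError => e
    cycle_error = e.message
  end
end
```

Unlike the in-place version, this one returns a new structure and leaves the input untouched, which makes it easier to compare the schema before and after expansion.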

scttnlsn commented 8 years ago

Thanks @ronen. Gave that a shot but then I get the following runtime errors:

#: Couldn't resolve pointer "#/definitions/positiveInteger".
#: Couldn't resolve pointer "#/definitions/positiveInteger".
#: Couldn't resolve pointer "#/definitions/positiveIntegerDefault0".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/positiveInteger".
#: Couldn't resolve pointer "#/definitions/positiveIntegerDefault0".
#: Couldn't resolve pointer "#/definitions/positiveInteger".
#: Couldn't resolve pointer "#/definitions/positiveIntegerDefault0".
#: Couldn't resolve pointer "#/definitions/stringArray".
#: Couldn't resolve pointer "#/definitions/stringArray".
#: Couldn't resolve pointer "#/definitions/simpleTypes".
#: Couldn't resolve pointer "#/definitions/simpleTypes".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/schemaArray".
#: Couldn't resolve pointer "#/definitions/linkDescription".
#: Couldn't resolve pointer "#/definitions/map/notes".
#: Couldn't resolve pointer "#/definitions/map/notes".
#: Couldn't resolve references: #/definitions/linkDescription, #/definitions/map/notes, #/definitions/map/notes, #/definitions/positiveInteger, #/definitions/positiveInteger, #/definitions/positiveInteger, #/definitions/positiveInteger, #/definitions/positiveIntegerDefault0, #/definitions/positiveIntegerDefault0, #/definitions/positiveIntegerDefault0, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/schemaArray, #/definitions/simpleTypes, #/definitions/simpleTypes, #/definitions/stringArray, #/definitions/stringArray.

brandur commented 8 years ago

@scttnlsn Hey Scott! I think that the built-in json_schema validator does something pretty close to what you're trying to do, in that it validates a schema and bundles in the default JSON Schema and hyper-schema meta-schemas to make things a little bit easier.

Here's the critical section of code where this happens:

https://github.com/brandur/json_schema/blob/e1b592521d7513045e58c6032ca53315df9467fa/lib/commands/validate_schema.rb#L63,L72

The code is a little verbose because it tries to do some exhaustive error handling, but you might be able to repurpose it for your use.

Otherwise, if you can assemble a repro, we may be able to take a look and possibly identify a problem.
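For anyone assembling such a repro, the following gem-free sketch reproduces one way "Couldn't resolve pointer" errors of this shape can arise. It rests on an assumption the thread does not confirm: that an external document was merged into a nested position, so its internal `#/definitions/...` pointers are now evaluated against the enclosing root, where those definitions do not exist. The helper names are hypothetical and `external` is a cut-down stand-in for a meta-schema.

```ruby
require "json"

# True when a local JSON Pointer such as "#/definitions/foo" leads to an
# existing node in `document`. (Pointer escapes like ~0 and ~1 are ignored.)
def pointer_resolves?(document, pointer)
  node = document
  pointer.delete_prefix("#/").split("/").each do |key|
    return false unless node.is_a?(Hash) && node.key?(key)
    node = node[key]
  end
  true
end

# Collect every local "$ref" (one that starts with "#/") in a document.
def local_refs(node, found = [])
  case node
  when Hash
    ref = node["$ref"]
    found << ref if ref.is_a?(String) && ref.start_with?("#/")
    node.each_value { |value| local_refs(value, found) }
  when Array
    node.each { |value| local_refs(value, found) }
  end
  found
end

# Simplified stand-in for an external schema that points into itself.
external = {
  "definitions" => { "positiveInteger" => { "type" => "integer", "minimum" => 0 } },
  "properties"  => { "maxLength" => { "$ref" => "#/definitions/positiveInteger" } }
}

# On its own, the external document is self-consistent.
intact = local_refs(external).reject { |ref| pointer_resolves?(external, ref) }

# After being inlined under "allOf", its definitions live at
# "#/allOf/0/definitions/...", so the unchanged pointer no longer resolves.
root = { "allOf" => [external] }
dangling = local_refs(root).reject { |ref| pointer_resolves?(root, ref) }
```

If this is indeed the cause, it would fit with registering the meta-schemas as separate documents instead of inlining them, since each document then keeps its own pointer base.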

scttnlsn commented 8 years ago

Thanks @brandur! Here's what ended up working for me:

class SchemaParser
  attr_reader :store

  def initialize
    @store = JsonSchema::DocumentStore.new
  end

  def parse!(path)
    schema = JsonSchema.parse!(JSON.parse(File.read(path)))
    schema.expand_references!(store: store)
    store.add_schema(schema)
    schema
  end
end

parser = SchemaParser.new
parser.parse!('default-schema.json')
parser.parse!('default-hyper-schema.json')
parser.parse!('meta.json')
SCHEMA = parser.parse!('schema.json')

The only problem I encountered was that the committee middleware expects a Hash instead of a JsonSchema::Schema. I had to change Committee::Middleware::Base#initialize to include the following conditional:

if data.is_a?(Hash)
  @schema = JsonSchema.parse!(data)
  @schema.expand_references!
else
  @schema = data
end

Happy to put together a pull request if this is an approach you're interested in supporting.

Thanks!

brandur commented 8 years ago

"Happy to put together a pull request if this is an approach you're interested in supporting."

@scttnlsn Wow, yeah, what's in there now seems pretty lacking. +1 to sending that patch back to mainline.