leadpony / justify

Justify is a JSON validator based on JSON Schema Specification and Jakarta JSON Processing API (JSON-P).
Apache License 2.0

JsonValidationService.readSchema(InputStream is) doesn't correctly resolve http $ref #6

Open chadlankford opened 5 years ago

chadlankford commented 5 years ago

A valid schema reference is built by taking the value of $ref and resolving it relative to the domain of the current schema. This yields a valid URL of the form http://mydomain/myschemafile.json, which resolves. However, I get the error message:

org.leadpony.justify.api.JsonValidatingException: [line:6,column:65] The schema reference "/myschemafile.json"(http://mydomain/myschemafile.json) cannot be resolved.

Since a valid URL is built, I'm curious why it can't be resolved. Could you please support this mechanism for referencing an externally hosted file in $ref? FYI, I also tried putting the full URL in the $ref rather than a domain-relative path, with the same result.

thanks in advance.
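
For reference, the URL construction described above can be reproduced with java.net.URI alone. This is a minimal sketch, not Justify's internal code: the base URI and the class and method names are assumptions chosen for illustration, and only the host mydomain and the $ref value come from the report. It shows that building the identifier is a separate step from fetching the document it names.

```java
import java.net.URI;

class RefResolutionSketch {
    // Resolves a "$ref" value against the base URI of the schema that contains it.
    // This only computes an identifier; it does not open a network connection.
    static URI resolveRef(String baseUri, String ref) {
        return URI.create(baseUri).resolve(ref);
    }

    public static void main(String[] args) {
        // Hypothetical location of the referencing schema.
        String base = "http://mydomain/schemas/parent.json";
        // A domain-relative reference replaces the whole path.
        System.out.println(resolveRef(base, "/myschemafile.json")); // http://mydomain/myschemafile.json
        // A plain relative reference replaces only the last path segment.
        System.out.println(resolveRef(base, "sibling.json")); // http://mydomain/schemas/sibling.json
    }
}
```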

chadlankford commented 5 years ago

I do realize the draft 07 docs give some latitude to the validator impl regarding resolving $ref per the following:

"Even though the value of a $ref is a URI, it is not a network locator, only an identifier. This means that the schema doesn’t need to be accessible at that URI, but it may be. It is basically up to the validator implementation how external schema URIs will be handled, but one should not assume the validator will fetch network resources indicated in $ref values."

However, if the resulting URI reference uses the http scheme, I think the validator should resolve it over the network. Besides, it would be super convenient if it worked that way.

chadlankford commented 5 years ago

I ended up finding and using the mechanism in your api to implement my own JsonSchemaResolver. That works for me.

leadpony commented 5 years ago

Hello @chadlankford. Thank you for using this small library. I am happy to hear you are on the right track. Please also see Schema Resolver in Justify Examples. Thank you.

chadlankford commented 5 years ago

One other question: is there a way to output the effective schema after the $refs are resolved?

leadpony commented 5 years ago

Do you mean you would like to merge the referencing schema and the referenced schemas into one large schema and output it to a file? No, currently no such way is provided. For example, how would you obtain an effective schema without "$ref"s from the following one?

{
    "$id": "http://example.org/example.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "http://example.org/example.schema.json"
        }
    }
}

chadlankford commented 5 years ago

Yeah, I didn't think about the recursive reference situation. I suppose you would just have to decide to stop resolving references for the effective schema once a recursive situation is detected. Maybe inject a $comment in the effective schema to explain why a reference was not resolved.
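
The idea of stopping at a recursive reference and leaving a $comment behind could be sketched as follows. This is an illustration only, not a Justify feature: it assumes schemas are already parsed into plain Map and List values, that every $ref is an absolute URI equal to some schema's $id, and that JSON Pointer fragments are not used. The class name, method names, and the comment text are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EffectiveSchemaSketch {
    static final String CYCLE_NOTE = "reference left unresolved: recursive";

    // Returns a copy of 'node' in which each "$ref" is replaced by the schema it
    // names, except where that schema is already being expanded (a cycle).
    static Object inline(Object node, Map<String, Map<String, Object>> registry, Set<String> inProgress) {
        if (node instanceof List<?>) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) node) {
                copy.add(inline(item, registry, inProgress));
            }
            return copy;
        }
        if (!(node instanceof Map<?, ?>)) {
            return node; // strings, numbers, booleans, null
        }
        Map<?, ?> map = (Map<?, ?>) node;
        Object ref = map.get("$ref");
        if (ref instanceof String) {
            String id = (String) ref;
            Map<String, Object> target = registry.get(id);
            if (target == null || inProgress.contains(id)) {
                // Keep the reference; annotate it only when the cause is a cycle.
                Map<String, Object> kept = new LinkedHashMap<>();
                kept.put("$ref", id);
                if (target != null) {
                    kept.put("$comment", CYCLE_NOTE);
                }
                return kept;
            }
            inProgress.add(id);
            Object expanded = inline(target, registry, inProgress);
            inProgress.remove(id);
            return expanded;
        }
        Map<String, Object> copy = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : map.entrySet()) {
            copy.put(String.valueOf(e.getKey()), inline(e.getValue(), registry, inProgress));
        }
        return copy;
    }

    // Expands the schema registered under 'rootId'; the root counts as "in progress"
    // so that a reference back to it is detected as recursive.
    static Object effectiveSchema(String rootId, Map<String, Map<String, Object>> registry) {
        Set<String> inProgress = new HashSet<>();
        inProgress.add(rootId);
        return inline(registry.get(rootId), registry, inProgress);
    }
}
```

Applied to the self-referencing schema shown earlier in the thread, the "foo" property would keep its "$ref" and gain the "$comment", while non-recursive references would be replaced by the schemas they name.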

leadpony commented 5 years ago

Here is a referencing schema.

{
    "$id": "https://example.org/a.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "b.schema.json"
        }
    }
}

And this is the second schema referenced from the first one.

{
    "$id": "https://example.org/b.schema.json",
    "type": "integer",
    "minimum": 0
}

Then you can merge the second schema into the first one using the definitions keyword, as follows.

{
    "$id": "https://example.org/a.schema.json",
    "type": "object",
    "properties": {
        "foo": {
            "$ref": "b.schema.json"
        }
    },
    "definitions": {
        "name-whatever-you-like": {
            "$id": "https://example.org/b.schema.json",
            "type": "integer",
            "minimum": 0
        }
    }
}

Practically, why do you need to merge multiple schema files into one? Your JsonSchemaResolver implementation can easily resolve referenced schemas whose "$id" uses the "http:" or "https:" scheme from your local file system, which is a very common approach when using multiple schemas. Is it just for debugging purposes?
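
One way to picture the local-file approach is a helper that maps a schema "$id" to a file under a directory of schemas, which a JsonSchemaResolver implementation could then read. The sketch below covers only that mapping, using the standard library; the directory name, class name, and method name are assumptions, and the check against escaping the directory is a design choice of this example rather than anything the library prescribes.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

class LocalSchemaPathSketch {
    // Maps the path component of a schema id onto a file below 'schemaDir',
    // ignoring the scheme and host of the id.
    static Path toLocalPath(Path schemaDir, URI id) {
        String path = id.getPath(); // e.g. "/b.schema.json"; null for opaque URIs such as "urn:x"
        if (path == null) {
            throw new IllegalArgumentException("No path in schema id: " + id);
        }
        String relative = path.startsWith("/") ? path.substring(1) : path;
        if (relative.isEmpty()) {
            throw new IllegalArgumentException("Empty path in schema id: " + id);
        }
        Path root = schemaDir.normalize();
        Path resolved = root.resolve(relative).normalize();
        // Reject ids such as ".../../secret.json" that would leave the schema directory.
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Schema id escapes schema directory: " + id);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical local directory holding copies of the published schemas.
        Path dir = Paths.get("schemas");
        System.out.println(toLocalPath(dir, URI.create("https://example.org/b.schema.json")));
    }
}
```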

chadlankford commented 5 years ago

Yes, the utility of this is strictly for debugging complex object graphs whose schemas make heavy use of $ref. Sometimes it can be helpful.

chadlankford commented 5 years ago

Another question for you...

I have implemented a custom problem handler which basically outputs the problem parameters map. The output is pretty good, except it would be nice if there were more context for the problem. For example, if the problem says a property foo is required, it would be nice to see at least which object the property is supposed to belong to. The same applies if the problem is a value validation. Right now, value validation problems just indicate the actual and expected value definition maps, but they should indicate the property, with context.

To me, the best solution is to include the entire object path to the property at the center of the problem, using either a JSON Pointer or some sort of JSON path notation, e.g. "foo.bar.key".
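
As an illustration of the JSON Pointer option, the path to a property can be rendered from its list of keys using the escaping rules of RFC 6901. This sketch is independent of Justify and does not reflect how the library reports problems; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class JsonPointerSketch {
    // Escapes one reference token per RFC 6901: "~" becomes "~0" first,
    // then "/" becomes "~1", so the two escapes cannot be confused.
    static String escape(String token) {
        return token.replace("~", "~0").replace("/", "~1");
    }

    // Joins the keys and array indexes leading to a value into a JSON Pointer.
    // An empty path yields "", which refers to the whole document.
    static String toPointer(List<String> path) {
        StringBuilder sb = new StringBuilder();
        for (String token : path) {
            sb.append('/').append(escape(token));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPointer(Arrays.asList("foo", "bar", "key"))); // /foo/bar/key
        System.out.println(toPointer(Arrays.asList("response", "0", "phones"))); // /response/0/phones
    }
}
```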

Like the effective schema issue I mentioned, this is also something that is very useful for deep/complex object validations.

Does that make sense?

leadpony commented 5 years ago

Thank you @chadlankford. Now I get it. Please see issue #5, because it is related to your second request.

rconnacher commented 5 years ago

Hello @chadlankford, Would you be willing to post your implementation of JsonSchemaResolver? I'm scratching my head on how to resolve remote references.
Thanks!

leadpony commented 5 years ago

Hello @rconnacher, thank you for contacting me. It is not strictly for remote schemas, but have you seen the code sample? Another piece of code, available here, also connects to schemas on a local web server via HTTP. Thank you.

rconnacher commented 5 years ago

Thanks, and hi @leadpony. In my use case the schemas will be remote, so I reviewed your second example (AbstractConformanceTest).

In my experiment I define a JsonSchemaReaderFactory as in your example, and create a JsonSchemaReader using StringReader over the schema text. (I'm coding in Groovy, so I had to port your example.) Using 'http://bmeta.berkeley.edu/common/apiResponseSchemaV1.json' as the schemaUri, and

{
  "correlationId": "a56b96cd-d2f7-49e6-9094-8cfdf38850d6",
  "foo": "bar",
  "response": [
    {
      "identifiers": [
        { "type": "campus-uid", "id": "10746" },
        { "type": "calnet-id",  "id": "russellc" }
      ],
      "names": [
        { 
            "type": { "code": "PRI", "description": "Primary" },
            "familyName": "Connacher",
            "givenName": "Russell"
        }
      ],
      "phones": [
        {
            "foo": "bar"
        }
      ],
      "emails": [
        {
            "type": { "code": "BUSN", "description": "Business"  },
            "emailAddress": "russellc@berkeley.edu",
            "primary": true
        }
      ]
    }
  ]
}

as the jsonData, I run this:

import javax.json.JsonReader
import org.leadpony.justify.api.JsonSchema
import org.leadpony.justify.api.JsonSchemaReader
import org.leadpony.justify.api.JsonSchemaReaderFactory
import org.leadpony.justify.api.JsonSchemaResolver
import org.leadpony.justify.api.JsonValidationService
import org.leadpony.justify.api.Problem
import org.leadpony.justify.api.ProblemHandler

new JsonResolutionTest().validate(schemaUri, jsonData)

class JsonResolutionTest  {

    private static JsonValidationService service
    private static JsonSchemaReaderFactory schemaReaderFactory

    private static void validate(String schemaUri, String jsonData) {
        try {
            service = JsonValidationService.newInstance()
            schemaReaderFactory = service.createSchemaReaderFactoryBuilder()
                    .withSchemaResolver(JsonResolutionTest::resolveSchema)
                    .build()

            def schemaData = schemaUri.toURL().text
            JsonSchemaReader schemaReader = schemaReaderFactory.createSchemaReader(new StringReader(schemaData))
            JsonSchema schema = schemaReader.read()

            List<Problem> problems = new ArrayList()
            // Method reference: every reported batch of problems is appended to the list.
            ProblemHandler handler = problems::addAll

            JsonReader jsonReader = service.createReader(new StringReader(jsonData), schema, handler)
            jsonReader.readValue()

            println( problems.toString() )

        } catch (Exception e) {
            println( e.getMessage() )
            e.getStackTrace().each {
                println( it.toString() )
            }
        }

    }

    private static JsonSchemaResolver resolveSchema (URI id) {
        try {
            InputStream stream = id.toURL().openStream()
            JsonSchemaReader reader = schemaReaderFactory.createSchemaReader(stream)
            return reader.read() as JsonSchemaResolver
        } catch (Exception e) {
            println(e.getMessage())
            e.getStackTrace().each {
                println(it.toString())
            }
            return null
        }
    }
}

But when I try to read the schema, I get a casting exception:

BasicSchema$None1_groovyProxy cannot be cast to org.leadpony.justify.api.JsonSchema
com.sun.proxy.$Proxy11.resolveSchema(Unknown Source)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.resolveSchema(AbstractBasicSchemaReader.java:246)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.dereferenceSchema(AbstractBasicSchemaReader.java:230)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.resolveAllReferences(AbstractBasicSchemaReader.java:214)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.postprocess(AbstractBasicSchemaReader.java:188)
org.leadpony.justify.internal.schema.io.AbstractBasicSchemaReader.readSchema(AbstractBasicSchemaReader.java:93)
org.leadpony.justify.internal.schema.io.AbstractSchemaReader.read(AbstractSchemaReader.java:49)
org.leadpony.justify.internal.schema.io.AbstractProbeSchemaReader.readSchema(AbstractProbeSchemaReader.java:48)
org.leadpony.justify.internal.schema.io.AbstractSchemaReader.read(AbstractSchemaReader.java:49)
org.leadpony.justify.api.JsonSchemaReader$read.call(Unknown Source)

That last call, "AbstractBasicSchemaReader.resolveSchema" is here:

    private JsonSchema resolveSchema(URI id) {
        JsonSchema schema = (JsonSchema)this.idSchemaMap.get(id);
        if (schema != null) {
            return schema;
        } else {
            Iterator var3 = this.resolvers.iterator();

            do {
                if (!var3.hasNext()) {
                    return null;
                }

                JsonSchemaResolver resolver = (JsonSchemaResolver)var3.next();
                schema = resolver.resolveSchema(id);     // exception is thrown here
            } while(schema == null);

            return schema;
        }
    }

I'm using Groovy 3.0.0, where method references (like ".withSchemaResolver(AbstractConformanceTest::resolveSchema)") are pretty new. Is this possibly the problem? Can you suggest any way of creating a reader that can resolve remote references without that recursive sort of configuration?

Thanks very much!

chadlankford commented 5 years ago

@leadpony, @rconnacher

Below, you will find a rough outline of what I did. Basically, the NetworkJsonSchemaResolver just retrieves the remote file and reads it in with a new SchemaReaderFactory on which it sets itself as the resolver. The getSchemaJson method is just your favorite way to pull the contents of a remote file over HTTP as a String.

Now, I did this quickly because I knew all my references were to network locations. I am basically making the assumption every reference is a url. If I was trying to make this more robust, I would impl the NetworkJsonSchemaResolver more generically as a GenericJsonSchemaResolver which perhaps inspects the uri and uses the protocol, if any, as a hint of how to load it. For example, http or https would indicate a network location. Maybe, if the protocol is file or classpath, the resolver could handle this intelligently as a fully qualified local location. And, perhaps if all else fails, treat the reference as this library does by default.
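
The scheme-based dispatch described above might look something like the following. It is a sketch of the classification step only, under the assumption that "classpath" is treated as a custom scheme; the enum, class, and method names are invented for illustration, and the actual loading for each case is left out.

```java
import java.net.URI;
import java.util.Locale;

class SchemeDispatchSketch {
    // Where a generic resolver might look for the referenced schema.
    enum Source { NETWORK, LOCAL_FILE, CLASSPATH, DEFAULT }

    // Uses the URI scheme, if any, as a hint for how to load the schema.
    static Source classify(URI uri) {
        String scheme = uri.getScheme();
        if (scheme == null) {
            return Source.DEFAULT; // relative reference: leave it to the default handling
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return Source.NETWORK;
            case "file":
                return Source.LOCAL_FILE;
            case "classpath":
                return Source.CLASSPATH;
            default:
                return Source.DEFAULT;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(URI.create("https://example.org/b.schema.json"))); // NETWORK
        System.out.println(classify(URI.create("b.schema.json"))); // DEFAULT
    }
}
```

A resolver built on this would return null for the DEFAULT case, which, judging by the resolveSchema loop quoted earlier in the thread, lets the reader move on to the next registered resolver.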

Hope this helps.

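import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.leadpony.justify.api.JsonSchema;
import org.leadpony.justify.api.JsonSchemaResolver;
import org.leadpony.justify.api.JsonValidationService;

// Rough outline. The logger field and getSchemaJson(String), which pulls the
// contents of a remote file over HTTP as a String, are assumed to be defined
// elsewhere in this class.
public class SchemaLoader {
    private final JsonValidationService service = JsonValidationService.newInstance();
    private final JsonSchemaResolver resolver = new NetworkJsonSchemaResolver();

    // Loads the root schema from the given URL; returns null on failure.
    public JsonSchema loadSchema(String url) {
        try {
            String schemaJson = getSchemaJson(url);
            return service.createSchemaReaderFactoryBuilder()
                    .withSchemaResolver(resolver)
                    .build()
                    .createSchemaReader(
                            // Encode explicitly as UTF-8 rather than the platform default.
                            new ByteArrayInputStream(schemaJson.getBytes(StandardCharsets.UTF_8))
                    )
                    .read();
        } catch (Exception e) {
            logger.error("", e);
        }
        return null;
    }

    // Treats every reference as a URL and fetches it over the network.
    class NetworkJsonSchemaResolver implements JsonSchemaResolver {
        @Override
        public JsonSchema resolveSchema(URI uri) {
            try {
                String schemaJson = getSchemaJson(uri.toString());
                // The resolver registers itself again so nested references are resolved too.
                return service.createSchemaReaderFactoryBuilder()
                        .withSchemaResolver(resolver)
                        .build()
                        .createSchemaReader(
                                new ByteArrayInputStream(schemaJson.getBytes(StandardCharsets.UTF_8))
                        )
                        .read();
            } catch (Exception e) {
                logger.error("", e);
            }
            return null;
        }
    }
}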

leadpony commented 5 years ago

Hello @rconnacher and @chadlankford. Thank you both. @rconnacher, the method JsonResolutionTest.resolveSchema() in the code above seems to return an instance of JsonSchemaResolver instead of JsonSchema. Is this the correct example?

rconnacher commented 5 years ago

@chadlankford Thanks! Your example generalizes and reinforces what I've learned from @leadpony's AbstractConformanceTest.

rconnacher commented 5 years ago

@leadpony You're right. Casting to a JsonSchemaResolver was a mistake on my part.

Mine's not quite working yet (mixing reader methods resulted in random "Unexpected char 0" parsing errors thrown by the glassfish JsonParser when the JsonSchemaReader looks for a next event). I'll report back once I figure it out.

Thanks again!

leadpony commented 5 years ago

Hello @rconnacher. A JsonParser in the JSON Processing API that is created with a single InputStream parameter automatically detects the character encoding of the stream among UTF-8, UTF-16, and UTF-32, with or without a BOM. If the character encoding of your remote schema is none of these, e.g. ISO 8859-1, the parser will fail to work correctly. You can explicitly specify the character encoding of the remote schema as the second parameter of JsonSchemaReaderFactory.createSchemaReader().
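
The effect of naming the encoding explicitly can be seen with the standard library alone. The sketch below decodes the same bytes twice, once with the charset they were written in and once with UTF-8; the sample text, class name, and method name are assumptions for illustration. The same explicit decoding is what an InputStreamReader would provide if its result were passed to the createSchemaReader overload that accepts a Reader, as used with StringReader earlier in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetSketch {
    // Reads the whole stream as text using the given charset instead of guessing it.
    static String decode(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical schema fragment served as ISO 8859-1.
        String original = "{\"title\":\"caf\u00e9\"}";
        byte[] latin1 = original.getBytes(StandardCharsets.ISO_8859_1);

        String right = decode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        String wrong = decode(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.equals(original)); // false: the accented character is not valid UTF-8
    }
}
```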