jsonrainbow / json-schema

PHP implementation of JSON schema. Fork of the http://jsonschemaphpv.sourceforge.net/ project
MIT License
3.55k stars 357 forks source link

Should validate without passing schema to method validate #541

Open estahn opened 6 years ago

estahn commented 6 years ago

I would expect to receive some failures given the following JSON document:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#"
}

and code

$validator = new JsonSchema\Validator;
$validator->validate($data);
var_dump($validator->getErrors());

But it is reporting as valid. The JSON document itself contains the schema and I shouldn't need to pass it in.

erayd commented 6 years ago

The JSON schema spec doesn't allow for any special treatment of particular keys within a document instance, not even $schema. While this key is special within a schema, you can't use it the way you are trying to here.

The behavior is correct - this issue is user error.

estahn commented 6 years ago

@erayd Thanks, so it's expected behaviour, gotcha. Then I don't understand why the schema for validate is optional. It appears counterintuitive.

As for your argument in regards to JSON Schema please see here: https://snowplowanalytics.com/blog/2014/05/15/introducing-self-describing-jsons/#sdjs

While the JSON schema spec doesn't make an explicit statement for the use of $schema in JSONs it also doesn't prohibit it.

Maybe it could be added as a feature flag ENABLE_SELF_DESCR_JSON.

Thoughts?

erayd commented 6 years ago

@estahn There was a huge discussion regarding this in the actual spec group about a year ago - the outcome, if I recall correctly, was that privileging keywords in a document instance is hugely problematic, because it makes assumptions about user data that are not guaranteed to be correct. It also opens an exploit vector. The spec itself doesn't explicitly prohibit it, because it's completely out of scope. However, 'unspecified' doesn't mean 'acceptable'.

Noting that it's extremely easy to implement this yourself as a preprocessor in just a few lines of code, I'm not happy to add such a feature to this project.

I don't recall why the schema argument for validate is optional, but it's likely either for backwards compatibility with something (I can't think what that might be though), or a mistake. I'm happy for that argument to become compulsory unless someone else has an objection to that.

estahn commented 6 years ago

@erayd Thanks for your response and I do see your point.

Is a preprocessor something you see as part of the project or something that should be done outside? I do have another use-case I could use that for. phpunit-json-assertions makes use of justinrainbow/json-schema for validation and as part of that I use the SchemaStorage to map URLs to local files, e.g.

        $directory = new \RecursiveDirectoryIterator(realpath(__DIR__ . '/../../schemas/au.com.domain.events'));
        $flattened = new \RecursiveIteratorIterator($directory);
        $files = new \RegexIterator($flattened, '/^(?<schema>.*?\/schemas\/(?<namespace>(.+?))\/(?<name>.+?)\/jsonschema\/(?<version>\d+-\d+-\d+))$/i', \RecursiveRegexIterator::GET_MATCH);

        foreach ($files as $file) {
            $jsonObject = json_decode(file_get_contents($file['schema']));
            $storage->addSchema($jsonObject->id, $jsonObject);
        }

A preprocessor would allow to attach that behaviour in a nicer way.

$validator = new JsonSchema\Validator;
$validator->addPreprocessor(new JsonSchemaUriToFile());
$validator->addPreprocessor(new JsonSchemaSelfDescr());
$validator->validate($data);
var_dump($validator->getErrors());

On a second thought it may not be a concern of the validator.

erayd commented 6 years ago

@estahn I see the actual preprocessor as being out of scope for this project. However, I'm open to including a general preprocessor-attachment interface of some kind.

@maximbaz @bighappyface What do you guys think?

maximbaz commented 6 years ago

I think you meant to ping someone else instead of me 😄 Not familiar enough with the project to give you meaningful feedback on this matter.

erayd commented 6 years ago

@maximbaz Ooops, sorry!

@shmax - That was supposed to be you...

bighappyface commented 6 years ago

Noting that it's extremely easy to implement this yourself as a preprocessor in just a few lines of code, I'm not happy to add such a feature to this project.

👏

I think working out individual needs is the burden of the individual.

shmax commented 6 years ago

@shmax - That was supposed to be you...

I'll defer, as I don't fully understand the issue. Five minutes ago I didn't know the library supported this feature, but the optional $schema argument can be explained by the code I see in SchemaConstraint. Given that bit of logic, he does seem to have a point; inline schemas are apparently supported, only we expect the inline schema to be an object or an array, and aren't prepared for the possibility that it's a uri.

But it does seem like a huge hole, and I think it's kind of nuts that it's there at all.

erayd commented 6 years ago

@shmax Oh man, that's horrible! Thanks for finding that.

I'll do some more digging and see if I can figure out why we're doing this at all (because history maybe?) - IMO we probably shouldn't be (or at least not by default) and v6 is a good time to address that.