Partial validation & tying additional validations to partial validations

andrewdeandrade commented 10 years ago

I'm still pretty new to JSON schema, but looking around at the other JSON Schema validation libraries out there, I have yet to find one that will support my use case. However judging by the z-schema API, it appears that you may be the most open to the ideas I have in mind were I to fork and submit pull requests.

Basically, I'm looking to add support for two features:

(1) the ability to register additional non-JSON-schema validations. For example, when validating an npm package.json file, it's easy to define a validation that checks if a string is a valid path. However, I would also like to be able to register a validation that checks if those paths actually exist on disk relative to the location of the package.json file itself. Another example would be checking if a homepage URL doesn't return a 404 not found error.

I reckon this feature would be implementable by allowing you to register pre and post validation functions that receive the value of the property being validated and returning true if additional validations succeed or errors otherwise. These functions would need access to the context in which they are called (so that they can get access via closures to details such as the directory in which a package.json exists).

(2) partial validation for performance reasons. For example, if I have already loaded a JSON into memory and validated it on load, I don't want to completely revalidate it on write. Instead I'd like the ability to only validate the properties that I am modifying.

validator.validate(partialJSON, "lib://package.json#/definitions/dependency-map", function(err, result) {
  if (err) { /* ..  handle err ... */ }
  /* ... add partialJSON to full JSON and write to disk ... */
});

or alternatively schemas could have method support for getting definitions, like so:

compiledSchema.getDefinition("dependency-map")

or a validatePartial method could be added to ZSchema like so:

validator.validate(partialJSON, schema, definitionName, function(err, result) {
  if (err) { /* ..  handle err ... */ }
  /* ... add partialJSON to full JSON and write to disk ... */
});

For this second feature the following features would be valuable:

Referencing schemas by URI or filepath directly (internally, z-schema would load and parse the schema into JSON)
resolveSchemaQuery falls back to checking the ZSchema instance schemaCache ( proof of concept: https://github.com/malandrew/z-schema/tree/resolveSchemaQuery-schemaCache-fallback )

As far as I can tell, suggestion two seems reasonable, but the first suggestion might be better as a separate module built on top of your validator or another one. However, to build feature (1) on top, it would be immensely useful to have partial validation.

I just started looking through the source and I'm still thinking through how these might be implementable, but I wanted to get your thoughts on them. Would you be open to accepting pull requests for these features?

zaggino commented 10 years ago

With (2) you probably found a bug in the implementation, because schemaCache should have been used in the place where you put it, but somewhere I've forgotten to do that. This would make a good PR if none of the tests are failing.

(1) looks like a good candidate for API enhancements - you can already use format attribute and define your own sync and async formats. You should explore what is and isn't possible there but I can imagine that your result schema would be something like:

{
    type: "string",
    format: "existOnDisk"
}

andrewdeandrade commented 10 years ago

There is one failing test in my branch. The arity of resolveSchemaQuery is now at 6 which is not allowed under the current JSHint rules. Either the rule needs to be relaxed or the function signature needs to be modified to collapse multiple lesser used arguments into one object.

The problem with implementing the existOnDisk approach is twofold. First, there is no contextual information available for the test to execute since the function is declared and registered once for all validator instances. e.g. given an array of relative paths (which is how all paths are declared in a package.json), how do you check if it exists on disk since you lack enough information to resolve the absolute path. Second, its kind of incompatible with JSON-schema since it involves more information than can be encoded and stored in a pure JSON structure (both the schema and the JSON being validated).

Another use I wanted for this API enhancement for would be to be able to use node-semver for advanced validations unencodeable in a string. However long term, many of the validations I'd want to do would be highly dependent on additional contextual information.

zaggino commented 10 years ago

Feel free to change that to 6 in .jshintrc, it's a lot but it's fine.

We should be able to make an API to provide more information for validator functions and then that information would be available when custom format validator is executed.

zaggino commented 10 years ago

How about you create two pull requests? One to fix (2) and second one with just a test case for (1) and I'll see if I'm able to implement the APIs to fit your testcase.

andrewdeandrade commented 10 years ago

So most of the day tuesday and wednesday, I tried to get my head z-schema's internals by refactoring parts to make it more functional. I did this just for fun because I find it to be one of the best ways to get my head around everything the code is doing, however I found the stack traces from bluebird to be impenetrable even with the long stack-traces library longjohn in use.

Anyways, after struggling with all the promises stuff and trying to find a way to clean up all the sync/promise branching everywhere, I decided to try out jjv that now passes all the json-schema tests and is sufficiently fast in sync mode to not matter and supports both of my use cases, so I will use it instead.

That being said, here's a link to the refactors I was playing with in an attempt to DRY up the code as much as possible. It's only half of what I would liked to try out and I reckon you're going to reject most of the ideas I tried out, but feel free to look over it to see if there is anything you like there that you might want to implement:

https://github.com/malandrew/z-schema/blob/refactor/src/ZSchema.js (caveat emptor, it is slower than the current implementation, but I made a lot of changes without benchmarking as I went along, so I'm unsure which changes slowed it down)

PS Out of curiosity, why didn't you include underscore or lo-dash as a dependency? Enough of the functions used are in both libraries to warrant adding it as a dependency, considering you already have two other dependencies.

zaggino commented 10 years ago

It's an evolution thing. First it was without dependencies which was the way I wanted it... then came functionality to download remotes and suddenly request was required. You can't really do async without promises or async lib so then bluebird came into play. Never had real time to rewrite it using lodash but I'd surely do it if I had time for that.

If I will have some time in the future, I'll get rid of sync stuff and rewrite it purely async with lodash, but I'm currently too busy with other projects which have bigger priority for me and are more widely used then this validator.

andrewdeandrade commented 10 years ago

I totally hear ya. I'm working on a bunch of stuff that I need to aggressively rework.

What do you think about making all the errors into a library? jjv's author went with his approach because he wan't a machine readable error result. I liked that approach because it left room to build another library on top that converts the machine readable error object into human readable errors. I also don't have time now but was thinking of looking at the work you've done with the readable errors with error codes and the work tv4 has done with readable errors and seeing if a general errors library can be made that works with jjv from the work you've done. Besides liking your error handling approach I really appreciated seeing a solid use of the benchmark module. I'm going to look into that in more detail later. I'll keep you posted if I make a human readable error handling library in case it helps you simplify your code. It will probably be based on some of z-schema's code.

FWIW, If you haven't yet seen it, I came across this excellent article today comparing async methods out there when looking further into promises and generators:

http://spion.github.io/posts/analysis-generators-and-other-async-patterns-node.html

zaggino commented 10 years ago

Idea for error handling library sounds great, although I guess it'd be tied to a particular validator implementation or you'd need to standardize a path others would follow.

I have read quite a few articles about es6 stuff (not this one, thanks for it) but I haven't thought yet about writing an es6 only library. It might be a good idea for version 3.0 if that ever happens to be a reality, but I'm currently trying to focus my free time on brackets editor.

andrewdeandrade commented 10 years ago

Yup, completely agree. Anyways, I'm in the same boat as you. A nice error handling library is low priority for me as well right now. Good luck on the brackets editor.

Following that article to others, here are some other great relevant reads that I hadn't seen before: http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony http://blog.ometer.com/2011/07/24/callbacks-synchronous-and-asynchronous/

zaggino commented 10 years ago

Version 3.0.0 is out, without any dependencies and including support for both sync and async formats so basically any validation is possible, even checking the filesystem in node.

Some testcases related to this: https://github.com/zaggino/z-schema/blob/master/test/ZSchemaTestSuite/CustomFormats.js https://github.com/zaggino/z-schema/blob/master/test/ZSchemaTestSuite/CustomFormatsAsync.js

zaggino / z-schema

Partial validation & tying additional validations to partial validations #32