voxpupuli / json-schema

Ruby JSON Schema Validator
MIT License
1.54k stars 243 forks source link

v3.x: Use string keys during validation. Add flag to stringify passed data #245

Closed RST-J closed 4 months ago

RST-J commented 9 years ago

This changes the validator to internally always rely on string keys during validation and relates to #243. It also adds a flag to convert keys and values in given data to strings.

RST-J commented 9 years ago

Actually it doesn't introduce but changes how only string keys are used internally (see comment in #243).

Before, data was stringified for validation while the original data remained as is. Now data wouldn't be stringified for validation unless explicitly requested. This transformation would affect the passed in data so that is stringified after a validation attempt no matter whether it was successful or not - which I do not really like.

iainbeeston commented 9 years ago

I'm a bit confused. In this branch stringify no longer stringify ws the hash, but parses it. I can't see where it's actually stringified any more?

iainbeeston commented 9 years ago

I like the changes, except I don't understand how stringify works if it just parses the data passed. Does that not mean that the input data will be converted to a Json schema object? And does it mean that we'll parse the same input hash multiple times?

RST-J commented 9 years ago

The stringify method does not simply parse the data, it first converts it to json: JSON::Validator.parse(schema.to_json). As far as I know this does exactly one thing - convert symbol keys and values to string. If there is anything else that I wasn't aware of, I'd go back to the old implementation. Do you know about another case?

Stringify is not only related to schemas and was also used to stringify given in data before. I think it is in the wrong location, maybe it should go into some Util module but I had no idea where to place it exactly so I left it where it is for the moment.

As far as I can see we won't parse the same input multiple times. Stringify is used once at the beginning of schema initialization which makes sure that everything in schemas is converted to string keys/values and once to initialize data which also made sure, that every symbol key and value is a string during validation (this is what I meant by we already had that without the code in the attributes being aware of that).

iainbeeston commented 9 years ago

Ah I see what stringify is doing now. However, taking a hash, serialising it to a string, then parsing it back again seems like a bad idea, performance-wise. Would it not be better to fix the existing stringify function to convert any symbol to a string?

Sorry, I was wrong about stringify being called twice. However, could we avoid calling it in the initializer? In master, the only place we modify the data is in initialize_data. This makes it easy to test and to reason about. It also keeps the initializer small, with no real work being done until validation begins. It's a point of style but I think it's worth keeping the separation that we currently have (if possible).

iainbeeston commented 9 years ago

I've been thinking more about this (see #246) and now I think we probably should aim to remove inserting defaults in 3.0. It wouldn't be hard for users to insert defaults themselves using either ActiveSupport's deep_merge or maybe Map, and by letting them insert the defaults we avoid any ambiguity (plus we don't have to maintain that code and can focus on validation). Does that sound ok?

iainbeeston commented 9 years ago

Oh, and I'd also suggest we stop stringifying keys, and just expect that the hash we're given is a json hash (ie. that only contains datatypes that are available in json). This would also simplify the codebase, and it's not hard for users to stringify for themselves (possibly by calling to_json then parse as you suggested).

In both cases we could probably add instructions to the readme to explain how to do it without json-schema.

RST-J commented 9 years ago

I agree, its not hard to stringify for the users themselves. However, as we have this routine already, we could also keep it and make it available through JSON::Validator.stringify or similar. But it would be up to the user, to do that before validation, the validation itself wouldn't do that anymore. For me, the reason to keep our implementation (not my replacement) would be that it also converts symbol values to strings.

iainbeeston commented 9 years ago

You know, activesupport's as_json method is basically a superior version of this. It's defined on a number of base classes (Array, Hash, Symbol etc) and it recursively converts all symbols to strings, both symbol keys and symbol values. What's more, many more complex classes already implement as_json. I'd be tempted to say we should advise users to use that, as it will always have better support than anything we can implement ourselves. On Sun, 15 Mar 2015 at 1:08 pm, Jonas Peschla notifications@github.com wrote:

I agree, its not hard to stringify for the users themselves. However, as we have this routine already, we could also keep it and make it available through JSON::Validator.stringify or similar. But it would be up to the user, to do that before validation, the validation itself wouldn't do that anymore. The reason to keep our implementation is, that it also converts symbol values to strings.

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/245#issuecomment-81001246 .

RST-J commented 9 years ago

Ah, cool. No, I didn't know about this. But then I take back what I said and vote for the opposite ;).

iainbeeston commented 9 years ago

I only realised today! (I've even written code before that's includes symbols in the result of as_json! Whoops!)

I like the other changes in this PR though. I wonder if we could remove stringification then update all tests to use strings instead of symbols (might be a big job). On Sun, 15 Mar 2015 at 2:55 pm, Jonas Peschla notifications@github.com wrote:

Ah, cool. No, I didn't know about this. But then I take back what I said and vote for the opposite ;).

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/245#issuecomment-81062616 .

RST-J commented 9 years ago

Should we drop the :parse_data option as well then? I think so. Especially when we also drop support for insertion of defaults we have no need to initialize given data at all.

iainbeeston commented 9 years ago

Yes I agree, that makes sense On Sun, 15 Mar 2015 at 4:51 pm, Jonas Peschla notifications@github.com wrote:

Should we drop the :parse_data option as well then? I think so. Especially when we also drop support for insertion of defaults we have no need to initialize given data at all.

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/245#issuecomment-81152802 .

RST-J commented 9 years ago

Done, I think. What to write into the README? Or do we want to rethink the README and start putting more detailed stuff into the wiki, as @hoxworth suggested before?

iainbeeston commented 9 years ago

I think add something into the readme now. It doesn't have to be perfect (we can revise the whole readme later), but at least we can't forget if we add something now On Sun, 15 Mar 2015 at 5:33 pm, Jonas Peschla notifications@github.com wrote:

Done, I think. What to write into the README? Or do we want to rethink the README and start putting more detailed stuff into the wiki, as @hoxworth https://github.com/hoxworth suggested before?

— Reply to this email directly or view it on GitHub https://github.com/ruby-json-schema/json-schema/pull/245#issuecomment-81173744 .

TheMeier commented 4 months ago

Closing due to inactivity. Please re-open if this is still relevant