jsonrainbow / json-schema

PHP implementation of JSON schema. Fork of the http://jsonschemaphpv.sourceforge.net/ project
MIT License
3.55k stars 354 forks source link

Two possible improvements to type coercion #379

Closed gtuccini closed 7 years ago

gtuccini commented 7 years ago

It would be useful to have a coercible string representation for null. "null" or '' would work just fine.

I think also that the coercion should happen only when the subject doesn't match any of the allowed types. Currently, given the following subject

{
    "value": "true"
}

and the following schema

{
    "properties": {
        "value": {
            "type": ["boolean", "string"]
        }
    },
    "type": "object"
}

"value" is coerced to boolean, even if its type -- string -- is among the allowed ones.

What do you think?

erayd commented 7 years ago

@shmax Boom!

Does this meet your deadline?

shmax commented 7 years ago

Wha?! Actually, I was hoping it would take all night so I could go to karaoke and maybe get some pizza. But let's take a look-see...

erayd commented 7 years ago

I haven't done the additional coercions from the ajv matrix yet - working on that now. But the bug should be fixed, and there's a new test-case that checks for it.

shmax commented 7 years ago

I still don't necessarily agree that we have a bug, here. Let's just say that the behavior has changed as agreed, okay?

erayd commented 7 years ago

OK - I have changed the title. How's that?

erayd commented 7 years ago

Have also updated the description.

shmax commented 7 years ago

I don't know how typical my project is, but I just did a little accounting, and out of 151 schema files I could only find two instances where I have an array of types and one of the possible values is a string:

"reissueOf" => [
    "type" => [
        "number",
    "string"
    ],
    "description" => "Id of some other sku that this toy is a reissue of, if any."
],

In this case what I'm validating is a numeric id for a record in a MYSQL table, so it really doesn't make any difference if it gets coerced or not. As a matter of fact, there's no reason to keep the "string" option in there. So no big deal in this case.

Just listing one real use case. Would be cool to see others.

gtuccini commented 7 years ago

I have a lot of schemata like this.

{
    "properties": {
        "code": {
            "default": null,
            "format": "taxCode",
            "type": ["null", "string"]
        },
        "columns": {
            "default": ["name", "surname"],
            "items": {
                "enum": ["code", "id", "name", "surname"]
            },
            "type": "array"
        },
        "direction": {
            "default": "ASC",
            "enum": ["ASC", "DESC"]
        },
        "id": {
            "default": null,
            "type": ["integer", "null"]
        },
        "limit": {
            "default": 100,
            "enum": [10, 100, 1000],
            "type": "integer"
        },
        "name": {
            "default": null,
            "type": ["null", "string"]
        },
        "offset": {
            "default": 0,
            "type": "integer"
        },
        "orderBy": {
            "default": "surname",
            "enum": ["code", "id", "name", "surname"]
        },
        "surname": {
            "default": null,
            "type": ["null", "string"]
        }
    },
    "required": ["code", "columns", "id", "limit", "name", "offset", "orderBy", "surname"],
    "type": "object"
}

The subject comes from the query string, which is generated by a GET form.

Currently I'm just converting all the empty strings to null prior to performing the validation. To use the new features I'd need either a "typePreference" attribute or the equivalent of an "if":

"name": {
    "anyOf": [
        {
            "minLength": 1,
            "type": "string"
        },
        {
            "type": "null"
        }
    ]
}

or, through a more general construct:

"name": {
    "anyOf": [
        {
            "not": {
                "enum": [""]
            },
            "type": "string"
        },
        {
            "type": "null"
        }
    ]
}

Am I right?

shmax commented 7 years ago

That is a bit troubling. We started by providing the ability to coerce empty string to null, but then our second change rendered the first change moot. At least for this use case, it sounds like you would have been happier without the second change. I'm wondering if we should have taken your earlier suggestion to have something along the lines of a CHECK_MODE_EARLY_COERCE option. I have to head to work, but will think more about this when I get a chance.

shmax commented 7 years ago

Oh, I do have one question: why bother specifying the "null" type for your string? When you're processing your data after validation, empty string and null are equivalent when it comes to a truthiness check, so I don't see any reason to want null, and if you keep your type as "string" you can do concatenation and other operations in good faith without having to check for non-null...

gtuccini commented 7 years ago

The value eventually ends in the arguments of a prepared statement which uses "IS NULL", so its type is relevant. I'll give you some more informations as soon as I'm be back from work.

shmax commented 7 years ago

I see. And changing the prepared statement to do a truthiness check is not an option?

where :value is null or :value = ''
gtuccini commented 7 years ago

Unfortunately I'm not in charge of writing the code which interacts with the db :). Anyway, even if it is a bit cumbersome, I think that I can live with writing those "ifs". At least, now that the implementation is aligned with ajv, I can thoroughly coerce and validate the query parameters both on the client and the server using just the schema and I can reliably predict the outcome.

erayd commented 7 years ago

@shmax @gtuccini Having read the above, I think that maybe an early / opportunistic coerce flag is in order. Such cases are hopefully rare, but @gtuccini won't be the only one to run into this kind of issue, so having an option to obtain the previous behavior when asked for is probably a good thing.

I feel very strongly that the not-coercing-valid-values is the right thing to be doing, but as long as it's the default, I see no issue with also supporting the old behavior (behind a flag).

@shmax - would you consider it an acceptable solution if I added such a flag to the current PR? Or would you like to discuss further before deciding what to do?

shmax commented 7 years ago

@shmax - would you consider it an acceptable solution if I added such a flag to the current PR? Or would you like to discuss further before deciding what to do?

Sounds good to me. While you're at it and totally on fire, what do you think about breaking up the coercive tests into smaller, more meaningful chunks?

erayd commented 7 years ago

@shmax Done to both :-).

@gtuccini Is the flag sufficient for your purposes?

erayd commented 7 years ago

@shmax I've been thinking about it, and I'm concerned that implementing something like typePreference would be bad for the integrity of the standard, as it would encourage people to use our non-portable extension of it. On the other hand, we already have type coercion and default-setting, neither of which are universally portable or part of the standard (although they are both implemented by more validators than just us, and neither requires a custom keyword).

Thoughts?

shmax commented 7 years ago

Agreed. I don't think we should be adding new schema keywords.

gtuccini commented 7 years ago

@erayd The flag is definitely enough and, all in all, I like it better than a non standard keyword. Thanks for getting this done :)

However, I think that a standard "typePreference" (or rather "defaultType") keyword would be useful to support type coercion and to inform the ui about the input widget to show by default when multiple types are allowed. Would you find it useful if I proposed the keyword (and, more broadly, to add some recommendations about type coercion) at https://github.com/json-schema-org/json-schema-spec?

erayd commented 7 years ago

@gtuccini Having a defaultType or similar keyword (maybe preferredType) as part of the standard is definitely something I like the idea of :-). Maybe propose it for possible inclusion in draft-07?

It's not something I want to push - I have enough on my plate already - but if you want to take ownership of that and propose it, by all means go ahead :-). I'm quite happy to implement it if they include it in the spec.

erayd commented 7 years ago

@bighappyface Can this issue be closed? The fix is merged in 6.0.0-dev, and will not be backported to avoid compatibility-breaking changes.