json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Validation: Improving date-time: min/max, linking & step #99

Closed handrews closed 7 years ago

handrews commented 7 years ago

Carried over from json-schema/json-schema#227 originally filed by @Lcfvs

In @Lcfvs 's words:

[This proposes] a min/max equivalent for date-time fields (min/max, begin/end, ...).

It may be useful to indicate a date time range, like for a reservation vs availability, an event beginning/end.

Imho, it may be also useful to link it to an another date-time field, with a positive/negative step.

Relequestual commented 7 years ago

I think that this functionality may lead to confusion, specifically in terms of what the purpose of this field would be. I can see a situation where a system puts a min date of today's date when the json-schema document is generated (on the fly, every time). I really hope we can avoid that issue completely by using a standard time base interval system.

I'd rather see min / max values be either date specific OR a relative duration. ISO 8601 is something we could use (https://en.wikipedia.org/wiki/ISO_8601). It includes duration.

If we did allow either, the spec should be clear that the date should be fixed, and not generated at the time of schema document access.
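
A minimal sketch of the "fixed date OR relative duration" idea, in Python. It assumes a duration bound is interpreted relative to the moment of validation, which the thread does not pin down. The names `parse_duration` and `resolve_bound` are hypothetical, and only a small subset of ISO 8601 durations (days, hours, minutes, seconds) is handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: only the D/H/M/S designators of ISO 8601 durations are supported.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)

def parse_duration(text):
    """Parse a duration such as 'P1D' or 'PT30M' into a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict(default="0").items()}
    return timedelta(**parts)

def resolve_bound(bound, now):
    """Turn a fixed date-time or a duration into an absolute datetime."""
    if bound.startswith("P"):
        return now + parse_duration(bound)  # relative to validation time
    return datetime.fromisoformat(bound)    # fixed, written into the schema

# Stand-in for "the time the instance is validated".
now = datetime(2016, 10, 18, 18, 26, tzinfo=timezone.utc)
future_only = resolve_bound("PT0S", now)
fixed = resolve_bound("2016-01-01T00:00:00+00:00", now)
```

With a duration bound the schema text itself never changes, so it does not need to be regenerated on each access.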

handrews commented 7 years ago

the spec should be clear that the date should be fixed, and not generated at the time of schema document access.

Using the same ID for different schemas each time they are accessed through that ID is clearly against the spec. The point of an ID is that you can use it to reliably look up a schema and do not need to dereference it, so it can't change at runtime. That has nothing to do with date fields.

I'm reading this proposal as requesting minDateTime and maxDateTime or perhaps startDateTime and duration. Either way, the problem here is that date-time is only supported via format, which means it's effectively unsupported if you want to produce schemas that will work identically on any implementation. Unlike the other min/max keywords, there are no required-to-be-supported keywords on which to base these.

Relequestual commented 7 years ago

ID in this context?

Sorry, I simply can't understand either part of your reply. Could you rephrase? (may be because I'm a bit jetlagged.)

handrews commented 7 years ago

Sorry, I simply can't understand either part of your reply. Could you rephrase? (may be because I'm a bit jetlagged.)

I dunno, I seem to be cursed on this project. I've never had such a high percentage of (multiple) people not having any idea what I'm saying or getting caught up in minor side issues :-/ When it happens with everyone else it must be me... anyway...

You talked about different schemas (in the form of different time constraints) being "generated at the time of schema document access." By "document access" I assume that this means being looked up based on an identifying URI. Possibly in the profile media type parameter, possibly in a "$ref". You may fetch it based on that URI or you may retrieve it from local storage, again based on the URI. (ID == this URI I'm talking about)

If retrieved over the network, the server could re-generate it with different time values each time, but would be serving it from the same identifying URI. This would produce different documents identified by the same URI, which is bad, because schemas should be locally cacheable. You are explicitly not supposed to automatically download them from the profile location (the one corresponding to the identifying URI).

So even though there are plenty of dynamic resources on the web, schemas should not be among them.

And of course, if loading from local storage, there's no way for the originating schema publisher to re-generate anything.

Does that make more sense? I may have just been misinterpreting you in the first place.

Relequestual commented 7 years ago

That makes sense. And yes, that was kinda my point: doing so would be a bad thing, it might be perceived that this COULD be ok, and allowing dates to be relative would be useful to prevent this. I would have just said json schema document rather than ID...

My suggestion is that the field is either a fixed date (via iso standard), or a duration (again defined by standard). I think the use cases for duration will often be more useful. Any thoughts on use cases for fixed min max datetimes?

handrews commented 7 years ago

I'm still a little confused on why you feel the need to specify "fixed"? Dynamic generation of schemas is a problem (or not) independent of dates. Or am I missing something that makes this more susceptible?

Relequestual commented 7 years ago

My worry is that, if the spec says you put a fixed date in this field, that people might believe they should be generating the schema document dynamically. Say I want to only allow a field to be dates in the future when the json-payload is submitted (I'm thinking my use case here is a json-schema for data posting via an API). If I only want dates in the future at time of posting, I would want to be able to specify a minDateTime of today. As you said, you DON'T want to suggest that the json-schema should be dynamic, and as such, enabling a way to specify today as a standard encoded agreed on format is important.

I think part of our problem on understanding each other is assumptions about the actual use case. I wouldn't worry much though, as long as you can explain when anyone else asks for clarification =] Contributions are always welcome! So thanks!

awwright commented 7 years ago

Does anyone here actually have a favorable view of this feature?

handrews commented 7 years ago

My worry is that, if the spec says you put a fixed date in this field, that people might believe they should be generating the schema document dynamically.

I think our confusion is just that I don't see this as any more or less likely than, say, changing the specification of how long an array can be based on something or other (number of days left in the current month, whatever). We're on the same page about the problem, I just don't see this as making that problem any more or less likely. Your motivating use case is clear, but I'm sure we could dig some up for other keywords.

handrews commented 7 years ago

@awwright I like the proposal, my only concern is that it depends on something that is only supported by format, which I view as useless because it has no reliable function (hence #54 )

handrews commented 7 years ago

If we promoted date-time to a first-class type, then I would be entirely in favor of this. As long as it's an unenforceable format on a string, there's not much point in layering anything on top of that.

handrews commented 7 years ago

@Relequestual a blanket "don't dynamically generate schemas under the same identifying URI" statement would probably be a good idea even if this proposal isn't accepted. Either in the spec or in a best practices section on the web site.

awwright commented 7 years ago

I'm not sure what the proposal is even saying.

It's certainly possible to add keywords that validate strings, but testing "Assert the date is between 2016 and 2017" doesn't make any sense. That's not structural in any way.

handrews commented 7 years ago

It's certainly possible to add keywords that validate strings, but testing "Assert the date is between 2016 and 2017" doesn't make any sense. That's not structural in any way.

How is it less structural than "assert that the number is between 1 and 10" or "assert that the string is at least 6 characters long"?

awwright commented 7 years ago

Number ranges and string lengths are often asserting that the data fits in the allocated storage space (e.g. a uint64) or makes sense (that a required string isn't zero characters long, or matches the 'uuid' pattern).

Doing this for a date interpretation of a string doesn't make any sense, we can already check the string length, and schemas shouldn't be bound to a particular time.

If you're checking for data consistency (e.g. "the blog post's publish-date is after the author's register-date"), that's out of scope of JSON Schema Validation.

handrews commented 7 years ago

Doing this for a date interpretation of a string doesn't make any sense, we can already check the string length, and schemas shouldn't be bound to a particular time.

Yes, that's what I mean about the fact that date-time is only implemented by format being a problem for this proposal. If we are set on date-times just being strings, then there's nothing that will really make this work. So we are in agreement.

As a side note, I've never understood why date-time is only a format. It's a fundamental type in many systems. But I'm not interested in debating changing that here (or probably ever- too many more important things that are getting enough resistance as it is).

If you're checking for data consistency (e.g. "the blog post's publish-date is after the author's register-date"), that's out of scope of JSON Schema Validation.

I don't know why you and @Relequestual keep bringing this up. Of course it's outside of the spec. I don't see anything here that's trying to bring it in.

But whatever, unless we're going to support date-time as a real type (and I've never seen any evidence of interest in that), this proposal should be rejected. format plus the string type just isn't enough to base it on.

Relequestual commented 7 years ago

I don't know what you mean by real type. RFC https://tools.ietf.org/html/rfc3339 is enough, and that's as documented. I specifically dislike trying to tie such things down beyond this, as you end up with language specific issues, like what you have with Swagger.

To be fair, I don't think I or @awwright brought up validation of one datetime against another field (which actually I don't think is necessarily out of scope). I think really we don't want to do so much type validation and instead focus on fixing structure issues.

I actually like this proposal.

handrews commented 7 years ago

I don't know what you mean by real type.

I mean two things:

  1. It's reliably validated. format is effectively useless for validation unless you control all implementations in the system. Again #54 would be a very easy way to ensure some level of minimum usefulness but it has not gotten a lot of interest. We don't reliably do anything with dates except verify that they're strings.
  2. I'm not sure how to phrase this one more clearly. Even though "number" and "integer" are the same fundamental JSON type, we allow specifying them separately in schema because the restriction of being an "integer" is interesting in terms of what we can do and what sort of math we can use with it. Some (most) languages into which JSON is read make this distinction, others do not. Similarly, date-times are often either fundamental types (such as in SQL) or supported by core libraries (most everywhere else) because unlike most strings, you can do math on them.

That's the whole point of having a date-time, and that's why this proposal exists- you can do math on date-times so you can have the concept of a minimum and maximum. That is a concept that does not make sense for strings. Since the specification forbids depending on format (which would verify RFC3339 compliance), JSON Schema prevents us from relying on such fields as anything other than strings.

RFC3339 is pointless if the specification prevents you from being able to rely on it being enforced. That is, fundamentally, what I mean when I say that JSON Schema does not have a real type for date-time. It would still be a JSON string underneath, the same way that JSON Schema integers are JSON numbers underneath.
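
A small Python illustration of why a minimum or maximum only makes sense once the value is treated as a date-time rather than a string. The two timestamps are made-up examples; the point is that plain string comparison and date-time comparison disagree when the UTC offsets differ.

```python
from datetime import datetime

a = "2016-10-18T23:00:00-05:00"  # the same instant as 2016-10-19T04:00:00 UTC
b = "2016-10-19T01:00:00+00:00"

# Compared as strings, a sorts before b because "18" < "19".
string_says_a_first = a < b

# Compared as date-times, a is three hours AFTER b.
time_says_a_first = datetime.fromisoformat(a) < datetime.fromisoformat(b)
```

An implementation that only knows these values are strings has no reliable way to apply a range to them.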

Relequestual commented 7 years ago

I think you're confused... or I'm extremely confused.

It sounds like you have a problem that JSON cannot express a datetime outside of a string. That is not a JSON Schema issue. Json has numbers or strings. JSON Schema allows for number and integer, as you expressed in your comment.

The JSON Schema spec says the string must conform to RFC3339 format. A library should validate the string conforms as specified in the RFC. That string should be convertible into whatever date time object type the language supports.

7.3.1.2. Validation

A string instance is valid against this attribute if it is a valid date representation as defined by RFC 3339, section 5.6 [RFC3339].

json-schema.org/latest/json-schema-validation.html

What are you asking for beyond that? Sounds like validation beyond "is it a string" to me...

awwright commented 7 years ago

I'm not sure where this is going, can someone interested in this feature post a use case with examples?

On Oct 18, 2016 18:26, "Ben Hutton" notifications@github.com wrote:

I think you're confused... or I'm extremely confused.

It sounds like you have a problem that JSON cannot express a datetime outside of a string. That is not a JSON Schema issue. Json has numbers or strings. JSON Schema allows for number and integer, as you expressed in your comment.

The JSON Schema spec says the string must conform to RFC3339 format. A library should validate the string conforms as specified in the RFC. That string should be convertible into whatever date time object type the language supports.

7.3.1.2. Validation

A string instance is valid against this attribute if it is a valid date representation as defined by RFC 3339, section 5.6 [RFC3339].

What are you asking for beyond that? Sounds like validation beyond "is it a string" to me...

handrews commented 7 years ago

@Relequestual 7.3.1.2 is perfectly fine, and if it were a mandatory part of the specification then that would be great. But it's not, per section 7.2:

Save for agreement between parties, schema authors SHALL NOT expect a peer implementation to support this keyword [format] and/or custom format attributes.

Which means that if I want my schema to be usable in the wild, I cannot count on format being implemented at all, so I cannot count on my strings that are designated as having "format": "date-time" being validated against RFC3339. That is what I mean by date-time not being a "real" type. The specification explicitly says that I MUST NOT rely on anyone else having any date-time awareness at all. In the wild, format doesn't count.

handrews commented 7 years ago

@awwright a use case would be to restrict a date-time to be within the Unix epoch.

awwright commented 7 years ago

I think "format" was intended as a metadata keyword, not a validation keyword, hence the behavior there. We should consider making it mandatory.

In any event, the RFC3339 date format isn't that hard to provide as a regexp.
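
A rough Python sketch of what such a regular expression might look like, following the date-time grammar in RFC 3339 section 5.6. It is an approximation: it checks field ranges but not days per month, so a value like February 31 still passes. The name `looks_like_rfc3339` is made up for this example.

```python
import re

RFC3339_DATE_TIME = re.compile(
    r"\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])"       # full-date
    r"[Tt]([01]\d|2[0-3]):[0-5]\d:([0-5]\d|60)(\.\d+)?"  # partial-time, 60 allows a leap second
    r"([Zz]|[+-]([01]\d|2[0-3]):[0-5]\d)"                # time-offset
)

def looks_like_rfc3339(value):
    """Syntactic check only; does not verify that the calendar date exists."""
    return RFC3339_DATE_TIME.fullmatch(value) is not None

ok = looks_like_rfc3339("2016-10-18T18:26:00Z")
bad = looks_like_rfc3339("1970-00-00 00:00:00")
```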

handrews commented 7 years ago

In any event, the RFC3339 date format isn't that hard to provide as a regex.

This is the point of #54 :-)

clenk commented 7 years ago

@awwright I have a use case where I want to have an object with created and modified properties, and ensure that modified is a later timestamp than created. I think this could be achieved with this proposal in conjunction with #51.

Example:

{
    "type": "object",
    "properties": {
        "created": {
            "type": "string",
            "format": "date-time"
        },
        "modified": {
            "type": "string",
            "format": "date-time",
            "minimum": {"$data": "1/created"}
        }
    },
    "required": ["created", "modified"]
}

Lcfvs commented 7 years ago

Thanks for importing my proposal here, I thought it was a lost cause :D

I agree with the standard interval idea... but, please, don't forget the step, like in an HTML5 form (that's why I originally said min, max & step)

;)

awwright commented 7 years ago

@clenk Can you provide an example of this feature used by itself? We can't assume the existence of any other features that aren't already there, and JSON Schema Validation isn't for checking data consistency.

epoberezkin commented 7 years ago

@clenk there is this proposal: https://github.com/json-schema/json-schema/wiki/formatMinimum-(v5-proposal) that covers this use case. I implemented in ajv.

There is an example there too.

Why don't we just transfer that wiki over here?

awwright commented 7 years ago

We can modify this issue or open a new issue. We're not using the wiki for proposals anymore, and everything else can go on the website.

epoberezkin commented 7 years ago

so we need to move them to issues then... Can we at least link to the wiki from issues?

handrews commented 7 years ago

@epoberezkin I've been moving them all over and I do link them in both directions. The same for issues on the old repo.

I've slowed down because the only people who could close out the old things refuse to spend time on it even though I made a nice list of what to do with each in a google doc :-( I've been trying to see if I can get the original filers to close them instead, with limited success.

epoberezkin commented 7 years ago

Ok, I will find time to move over the things I care about. Can you share the doc here maybe (or point where it is)?

clenk commented 7 years ago

@awwright An example without other features: representing movies in JSON and disallowing a movie with a 'release-date' property before the invention of film. Or in a related domain, a schema to ensure video games have a release date after the release of their system.

An example using both min and max: for JSON representing battles and deaths in a war, a schema could ensure that they happened after the war started and before the war ended.

Also, XML Schema's date datatypes allow for maxExclusive, maxInclusive, minExclusive, and minInclusive. An example can be found here. I do not mean to imply that JSON Schema should do something just because XML Schema did it, only that there is precedent of another schema language doing something like this.

handrews commented 7 years ago

@epoberezkin here's the list I have. Some may already be closed. I just went through and put a note at the end of every issue that didn't already have one to refer people here (except maybe some that you filed since I know you know what's going on). https://docs.google.com/spreadsheets/d/1pcT638kI7vrY4MjWxgMKUVpjuwceC9EtRuCcHldzB5c/edit#gid=0

awwright commented 7 years ago

So ok, sanity checking is a good case. For example, making sure an instance isn't 0000-00-00 00:00:00 or 0000-01-01 00:00:00 or 1970-00-00 00:00:00 or another obviously wrong value.

This might be better done with formatMinimum/formatMaximum, though? That just feels like a more useful keyword in general. Can we do that, @clenk, @Lcfvs ?
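
A hedged sketch of how a formatMinimum/formatMaximum pair could behave for "format": "date-time", written in Python. This is not the ajv implementation or the wiki proposal's exact semantics; the function `check_format_range`, the inclusive bounds, and the example schema (including its illustrative upper bound) are assumptions made for the example.

```python
from datetime import datetime

def check_format_range(instance, schema):
    """Return True if instance satisfies the assumed date-time range keywords."""
    if not isinstance(instance, str) or schema.get("format") != "date-time":
        return True  # the keywords do not apply to this instance
    try:
        value = datetime.fromisoformat(instance)
    except ValueError:
        return False  # not parseable as a date-time at all
    if value.tzinfo is None:
        return False  # RFC 3339 date-times always carry an offset
    minimum = schema.get("formatMinimum")
    maximum = schema.get("formatMaximum")
    if minimum is not None and value < datetime.fromisoformat(minimum):
        return False
    if maximum is not None and value > datetime.fromisoformat(maximum):
        return False
    return True

# Illustrative schema: start of the Unix epoch up to the 32-bit time_t limit.
EPOCH_SCHEMA = {
    "type": "string",
    "format": "date-time",
    "formatMinimum": "1970-01-01T00:00:00+00:00",
    "formatMaximum": "2038-01-19T03:14:07+00:00",
}

in_range = check_format_range("2016-10-18T18:26:00+00:00", EPOCH_SCHEMA)
too_early = check_format_range("1969-12-31T23:59:59+00:00", EPOCH_SCHEMA)
nonsense = check_format_range("1970-00-00 00:00:00", EPOCH_SCHEMA)
```

Both bounds are fixed strings in the schema, so the check stays independent of when the schema is fetched.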

Lcfvs commented 7 years ago

@awwright It's a good option... but I wonder whether the format property should be an object, for future extensibility.

Moreover, it doesn't "pollute" the parent scope.

I'm thinking of something like this:

{
  "format": {
    "type": "datetime",
    "min": "1970-00-00 00:00:00",
    "max": "2020-00-00 00:00:00",
    "step": 60,
    // futures
  }
}

What do you think about it?
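
One possible reading of the step option, sketched in Python: a value is acceptable when it lies a whole number of steps after the minimum. Treating the step as a number of seconds follows the HTML5 form analogy mentioned earlier, but that unit, the function name `on_step`, and the sample timestamps are all assumptions.

```python
from datetime import datetime, timedelta

def on_step(value, minimum, step_seconds):
    """True if value is at or after minimum and a whole number of steps away from it."""
    offset = datetime.fromisoformat(value) - datetime.fromisoformat(minimum)
    step = timedelta(seconds=step_seconds)
    return offset >= timedelta(0) and offset % step == timedelta(0)

start = "1970-01-01T00:00:00+00:00"
aligned = on_step("1970-01-01T00:05:00+00:00", start, 60)
misaligned = on_step("1970-01-01T00:05:30+00:00", start, 60)
```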

epoberezkin commented 7 years ago

@Lcfvs seems more verbose to me

Lcfvs commented 7 years ago

@epoberezkin Yeah, maybe but it opens the way to define some complex format options (for datetime but not only)

handrews commented 7 years ago

@Lcfvs JSON Schema has generally gone for multiple keywords with a common suffix rather than nested keywords. This allows for reasoning about how a schema works by just looking at a single property set.

In any event, let's have any general format extension conversations over in #116.

Since you seem to be OK with folding this into a more general format change, I'm going to close this issue in favor of #116. If you would prefer to keep this open separately please just comment and I can re-open it.

danieladuarteng commented 5 years ago

@awwright I have a use case where I want to have an object with created and modified properties, and ensure that modified is a later timestamp than created. I think this could be achieved with this proposal in conjunction with #51.

Example:

{
    "type": "object",
    "properties": {
        "created": {
            "type": "string",
            "format": "date-time"
        },
        "modified": {
            "type": "string",
            "format": "date-time",
            "minimum": {"$data": "1/created"}
        }
    },
    "required": ["created", "modified"]
}

A question: where do you declare the $data?
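
As the example reads, $data is not declared anywhere in the schema. The value "1/created" looks like a relative reference that is resolved against the instance being validated: go up one level from modified, then read created. A minimal Python sketch of that interpretation follows; the helper names are hypothetical and only the "1/<name>" form is handled.

```python
from datetime import datetime

def resolve_sibling(parent, data_ref):
    """Resolve a '1/<name>' reference: up one level, then read property <name>."""
    levels, _, name = data_ref.partition("/")
    if levels != "1" or not name:
        raise ValueError("this sketch only handles references of the form '1/<name>'")
    return parent[name]

def modified_not_before_created(instance):
    """Apply the example's 'minimum' to modified, using created as the bound."""
    minimum = resolve_sibling(instance, "1/created")
    return datetime.fromisoformat(instance["modified"]) >= datetime.fromisoformat(minimum)

valid = modified_not_before_created(
    {"created": "2016-10-18T18:26:00+00:00", "modified": "2016-10-19T09:00:00+00:00"}
)
invalid = modified_not_before_created(
    {"created": "2016-10-19T09:00:00+00:00", "modified": "2016-10-18T18:26:00+00:00"}
)
```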