json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

"deprecation" property #74

Closed awwright closed 5 years ago

awwright commented 8 years ago

Create a schema keyword that marks an instance (usually a property in an instance) as deprecated: it shouldn't be used, but exists only for internal use, or for backward compatibility with obsolete or old products.

handrews commented 8 years ago

+1 to this. The point here, I assume, is for documentation and perhaps to allow implementations to issue warnings, rather than to affect validation.

Relequestual commented 7 years ago

Interesting proposal.

I know that JSON Schema isn't specifically for APIs, but we should consider how APIs might version their endpoints. Could this proposal actually be part of a separate doc about API-driven information? For example, included in deprecated: how it's deprecated, whether it's replaced, and possibly other upgrade-path data. Very much a straw man comment, not well thought through.

handrews commented 7 years ago

@Relequestual I'm not sure how far we want to chase the versioning idea here, but my approach to API evolution (rather than versioning) is to only change the version of the schema, and retain compatibility by (as much as possible) avoiding both required and "additionalProperties": false. I use the following rules:

Relequestual commented 7 years ago

Interesting. Most of the time our API (and others I've worked on) have needed "new versions" because of new fields or new allowed values. Sometimes that's backwards compatible, but often it isn't for me, as much as I'd like it to be.

handrews commented 7 years ago

As long as you avoid setting additionalProperties to false, then new fields should be fine (unless you're generating strictly typed code off of the schemas without any provision for unrecognized fields, but I avoid that in favor of more generic client libraries).

Enumerations get tricky, but that's where the "if it still validates you can still use it" rule comes into play- technically the new version is not fully compatible but most instances will validate against either version.

My approach to API evolution is to do as much as possible to allow users to keep using the API as they have been unless their specific usage is broken by the update. It requires some design discipline, and the willingness to add a new resource as a replacement and deprecate the old instead of doing a major version change on the entire API and breaking a bunch of clients that don't actually care one way or the other.
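A minimal Python sketch of why leaving "additionalProperties" open keeps older schemas usable when new fields appear. The `validate` helper is a hypothetical toy that covers only `required`, `properties`, and `additionalProperties`, and the field names are made up for illustration; it is not a real JSON Schema implementation.

```python
def validate(schema, instance):
    """Toy check covering only required/properties/additionalProperties."""
    if not isinstance(instance, dict):
        return False
    if any(name not in instance for name in schema.get("required", [])):
        return False
    known = schema.get("properties", {})
    if schema.get("additionalProperties", True) is False:
        # A closed schema rejects any property it does not list.
        if any(name not in known for name in instance):
            return False
    return True

# Hypothetical "old" schema that a client was written against.
open_schema = {"properties": {"name": {}}}
# The same schema, but closed to unknown fields.
closed_schema = {"properties": {"name": {}}, "additionalProperties": False}

# A newer server adds a field the old schema has never heard of.
new_instance = {"name": "example", "nickname": "ex"}

print(validate(open_schema, new_instance))    # True
print(validate(closed_schema, new_instance))  # False
```

The open schema keeps accepting the evolved instance, while the closed one turns an additive change into a breaking one.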

handrews commented 7 years ago

Admittedly, I haven't actually had a chance to prove that this can work long-term, but it's something I'm trying with the current project I'm working on.

Relequestual commented 7 years ago

I think Protocol Buffers allows for flagging deprecation. If someone has time, may be worth seeing what that system allows for.

awwright commented 7 years ago

#173 is a sufficient solution for me, so I'll close this out in favor of the PR

handrews commented 7 years ago

@awwright please don't close issues while the corresponding PR is still open. PRs are not always accepted.

awwright commented 7 years ago

@handrews A PR is just a type of issue on GitHub, and I can indefinitely re-edit it if necessary, or re-open this issue of mine, or someone can file a new issue for me if it really comes down to that

handrews commented 7 years ago

@awwright This is yet another instance of you randomly choosing how to use GitHub in whatever ways that suit you, no matter whether anyone else finds it confusing. Some issues you want to keep open, some you don't, we just get to guess at what you'll complain about, and if we want to keep track of things in some consistent way, we're out of luck if you decide not to.

All of the discussion about wanting a clear and defined process that the community has had clearly means nothing to you. Your response to any complaint or request for clarity is always just "well I see it this way so that's what I'm doing".

awwright commented 7 years ago

@handrews It's my issue! Anyone is allowed to close their own issue for any reason. There's nothing special about it.

Relequestual commented 7 years ago

I'll re-open this issue as I agree issues that do not have accepted pull requests should not be closed.

Relequestual commented 7 years ago

It seems like there's quite a lot of discussion and decisions to be made on this issue. Specifically the comments in the PR review: https://github.com/json-schema-org/json-schema-spec/pull/173#discussion_r91867794

It's clear there are a number of unresolved ambiguities.

In a root schema with a "self" link it means that the resource itself is deprecated? But it's still there currently?

In a root schema without a "self" link it means....?

In a property subschema (in any of the property keywords) it means that all matching properties are deprecated but will still be present?

In a schema applying to a single position in an array/tuple ("items" is an array) it means...?

In a schema applying to multiple array positions ("items" as one schema, or "additionalItems") it means...?

In the schema for "schema" or "targetSchema" it means...?

This concept is not ready for a PR. There are far too many cases that need discussing and a coherent view of hypermedia resources that need to be developed before we can just toss this in.

Relequestual commented 7 years ago

I've set this as priority: low. People aren't crying out for it, and it doesn't fix an existing bug. It would be nice to go in the next release if the above issues can be resolved, but otherwise it shouldn't become a blocker.

discordier commented 7 years ago

I just ran into the same wall. I want to denote in a schema that a property will get removed in a later version. So currently I have a schema with id: https://example.org/schemas/1.0/payload-schema.json. Now we are releasing the next version, named https://example.org/schemas/1.1/payload-schema.json, which adds new properties and types, but we also want to deprecate old stuff that was superseded by the new types, so that it can ultimately be removed in https://example.org/schemas/2.0/payload-schema.json (long-term future).

Simply put, as all document types evolve over time, you need a migration phase (the 1.x schema series in the example above). So we want all 1.x documents to validate against all 1.x schemas. This is possible, as pointed out by @handrews above, when avoiding both "required" and "additionalProperties": false: all types are preserved and only non-mandatory new stuff gets added (see also SemVer). Now we also want validation to break when the deprecated stuff is validated against the 2.x schema.

All this is currently possible from validation POV but we lack a way to tell the user of the schema (developer) about the migration phase. Of course this can be handled by defining proprietary keys, used by whatever validator one is using or attaching a new validator to validate twice, but this will cause fragmentation of standards. Therefore defining this right in the spec would be better - or as @Relequestual stated "nice to have".

Yet, I wonder how big the need is, as next to no one seems to feel it (with me being the second guy in this thread apparently). The fact that XML XSDs also lack a standardized way to define something like this makes me wonder if the schema really is the right place to add this, but I fail to see a better alternative (which can be utilized by CI and yet not break production).

handrews commented 7 years ago

@discordier we've generally had trouble finding anyone who has even used hyper-schema enough to participate in discussions. So don't take the low count as necessarily indicating lack of support. I encourage you to comment on any hyper-schema issues and pull requests you see- it is a challenge for us to decide on directions when only two or three people have opinions they are comfortable expressing. Here are the issues and PRs (including but not limited to hyper-schema) that we're targeting for the next draft: https://github.com/json-schema-org/json-schema-spec/milestone/2

@Relequestual marked this low priority because we're trying to quickly resolve enough things to publish Draft 6 and this is both not essential for that immediate effort and because there are several questions without obvious answers. I'd personally be in favor of getting this into Draft 7, though, if we can't sort it out in time for Draft 6.

I'm only familiar with XSD in the most basic sense, but I gather it's more analogous to JSON Schema Validation rather than JSON Hyper-Schema. I definitely think this feature belongs in hyper-schema assuming we can sort out the details.

lidio601 commented 7 years ago

+1

Relequestual commented 7 years ago

@lidio601 Please avoid +1 comments. You can thumbs up any comment you agree with. Thanks.

brendajin commented 7 years ago

I would also like a deprecation system, or some type of meta-tagging for fields that are being added, replaced, and deprecated. As others mentioned before, it should have a date to accompany it.

philsturgeon commented 7 years ago

Howdy! Where did this conversation end up? I know deprecations are something that Open API v3.0 has a little of, and they're considering improvements for v3.1.

It would be amazing to see some collaboration on this issue, so we can help reduce the now very narrow gap between JSON Schema and Open API v3.x.

handrews commented 7 years ago

@philsturgeon everyone here has been kind of caught up in other things, but there are signs of life and we do hope to have another draft out before the current ones expire in October.

The deprecation idea is popular so I hope it makes it into the next draft.

The last PR stalled because there were many ambiguities and we did not have the time and focus to resolve them. The use cases I see (which may not be what anyone else agrees with) are:

The semantics of deprecation and array elements are unclear, so I'd avoid that on the first pass. If use cases emerge we can work through them.

Deprecating a schema that is not an object field schema has no clear semantics to me. In my opinion, only LDOs and field schemas are reasonable use cases for deprecation (although I may be missing some obvious other ones).

awwright commented 7 years ago

We may be running into an issue similar to "required" where it was decided that it doesn't make sense as a boolean, but only as a list of properties on an object, so it was turned into an array listing key names that are required.

Maybe the same thing is true of "deprecated", but I'd also be interested in providing a rationale as to why the property is deprecated, maybe at least a link to a document.

I vaguely remember a proposal to add a link relation doing exactly this, pointing to a document that explains how the origin resource is deprecated, but I can't find any evidence of such a thing, maybe I invented that.

philsturgeon commented 7 years ago

Absolutely. So this is exactly the same conversation we're having over on @dret's Sunset header draft.

Outstanding questions of where to send humans and computers are being discussed here.

Sunset will take a date, and optionally a URL to some metadata, and that's about it.

Some people like versions, some people like dates, but for Sunset it's going date only at this point. Versions are too app-specific for a header, but JSON Schema could probably have some opinions on version perhaps. Not sure.

Either way, there's some crossover here.

dret commented 7 years ago

just to clarify: Sunset as currently defined (feedback very welcome) only takes a date and nothing else:

Sunset = HTTP-date

https://github.com/dret/I-D/issues/60#issuecomment-319739804 mentions the possibility of linking to additional information using a link relation, but that's not part of the current draft.
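Since the value is a plain HTTP-date, a client can read it with standard tooling. A minimal Python sketch, assuming the header value has already been extracted from a response; the function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until_sunset(sunset_value, now):
    """Parse a Sunset header value (an HTTP-date) and return seconds remaining."""
    sunset = parsedate_to_datetime(sunset_value)
    return (sunset - now).total_seconds()

# Fixed "current time" so the example is reproducible.
now = datetime(2018, 12, 31, 23, 0, 0, tzinfo=timezone.utc)
print(seconds_until_sunset("Mon, 31 Dec 2018 23:59:59 GMT", now))  # 3599.0
```

A negative result would mean the sunset date has already passed.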

handrews commented 7 years ago

@dret @philsturgeon I'd prefer a date over a version because it's more flexible. Some APIs use evolution techniques rather than versions, and that's my preferred approach. I'll have to catch up on the Sunset discussion. I like having link relations for things but I haven't read the rationales for or against yet.

@awwright Agreed on this probably making more sense as a list parallel to required. "What about this object is deprecated?" would be a lot easier to reason about than "is this schema part of a thing that can be deprecated and if so is it?"

It looks like the Sunset header and an LDO-level deprecated would work together, while a field-level deprecated would be a different thing.
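A minimal Python sketch of the "list parallel to required" idea. The array-valued `deprecated` keyword here is an assumption taken from this discussion, not something defined by any draft, and the property names are invented for illustration.

```python
schema = {
    "type": "object",
    "required": ["id"],
    # Hypothetical keyword: names of properties that are deprecated.
    "deprecated": ["legacyId"],
}

def deprecated_in_use(schema, instance):
    """Return the deprecated property names that the instance actually uses."""
    return sorted(set(schema.get("deprecated", [])) & set(instance))

print(deprecated_in_use(schema, {"id": 1, "legacyId": "a-1"}))  # ['legacyId']
print(deprecated_in_use(schema, {"id": 1}))                     # []
```

With this shape the question becomes "what about this object is deprecated?", which only needs a lookup against the object's own keys.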

dret commented 7 years ago

On 2017-08-03 09:25, Henry Andrews wrote:

@dret @philsturgeon @awwright Agreed on this probably making more sense as a list parallel to required. "What about this object is deprecated?" would be a lot easier to reason about than "is this schema part of a thing that can be deprecated and if so is it?"

i am confused by this whole discussion. to me it looks like this:

awwright commented 7 years ago

@dret It sounds like we may have to differentiate how this is used for (1) defining expected user input, and (2) annotating the output of a JSON API. What "deprecated" means in these scenarios may be slightly different.

I've been understanding it as an API value that may not be provided in the future, and that you shouldn't use it, or should adjust your application to rely on some other value.

dret commented 7 years ago

On 2017-08-03 13:39, Austin Wright wrote:

@dret It sounds like we may have to differentiate how this is used for (1) defining expected user input, and (2) annotating the output of a JSON API. What "deprecated" means in these scenarios may be slightly different.

and there are many different ways in which these differences can manifest, which makes it hard to capture them in a predefined model.

I've been understanding it as an API value that may not be provided in the future, and that you shouldn't use it, or should adjust your application to rely on some other value.

so what makes it different from something that's optional? other than you might start labeling it "optional" not from the very start, but at some point in time during the lifetime of your API?

handrews commented 7 years ago

@dret I prefer to avoid declaring anything as required and build my tooling appropriately. However, declaring fields to be required and then removing them later is an extremely common real-world occurrence, no matter how I feel about it as a design choice.

dret commented 7 years ago

On 2017-08-03 17:04, Henry Andrews wrote:

@dret I prefer to avoid declaring anything as required and build my tooling appropriately. However, declaring fields to be required and then removing them later is an extremely common real-world occurrence, no matter how I feel about it as a design choice.

interesting. if you build things your way, there's no need to communicate deprecation: everything can go away anytime anyway, and clients have to be prepared for that. afaict, that makes it hard to build applications that are based on some minimal core contract, but if that's something that works for you, it seems that you are all set!

handrews commented 7 years ago

if that's something that works for you, it seems that you are all set!

In practice, I doubt I can manage it all the time, but it's something I am going to explore and see how much I can get away with :-)

Ideally, the only reason to drop a field is that it is legitimately no longer relevant. My preference is that optional fields are always absent when not relevant, never present with a null value. If clients are built to facilitate working with fields that may be absent, then a certain category of "removed" fields are no longer a problem.

Obviously some fields can't work that way. If you're configuring something related to DNS, you need the domain name. Otherwise nothing makes sense at all. But there are other things that are a lot less critical. Hopefully over the next six months to a year I'll be able to get some real-world validation (or invalidation) of various ideas in this area.

adamvoss commented 7 years ago

Is this hyper-schema specific? Or rather, is it out of scope for JSON Schema?

I know I have encountered at least one case where configuration settings have changed with Visual Studio Code (which provides a schema for its configuration file) where they potentially could have captured the change in a deprecation message rather than leaving users wondering why the file no longer validated.

handrews commented 7 years ago

This, readOnly, and the media object all at least theoretically are usable outside of Hyper-Schema. media could arguably be viewed as extended validation (in the same way as format, in that it conveys information that could be but need not necessarily be validated in full).

Alternatively, Hyper-Schema is not limited to APIs. This is one of the key points that confuse people about Hyper-Schema. It is a hypermedia format, not an API description format. So just like people often use hypermedia formats like HTML (or even more commonly, Markdown) to store local documents, there's no reason you couldn't use a hyper-schema with Visual Studio configuration settings.

adamvoss commented 7 years ago

there's no reason you couldn't use a hyper-schema with Visual Studio configuration settings.

I must be missing something. Where do the semantics come from?

For example, if I am validating a config (or other) file against a schema and the file uses a deprecated property I think the validator SHOULD emit a warning about the deprecated property. If "deprecated" is only part of hyper-schema, how would a JSON Schema validator know to issue a warning?
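A minimal Python sketch of that behavior: collect warnings for deprecated properties without changing the validation result. The `deprecated` and `deprecationMessage` keywords are assumptions based on the ideas in this thread, and the setting names are hypothetical.

```python
# Hypothetical schema: "deprecated" and "deprecationMessage" are assumed keywords.
schema = {
    "properties": {
        "editor.fontSize": {"type": "integer"},
        "editor.oldSetting": {
            "deprecated": True,
            "deprecationMessage": "Use editor.newSetting instead.",
        },
    }
}

def deprecation_warnings(schema, instance):
    """Return one warning string per deprecated property present in the instance."""
    messages = []
    for name, subschema in schema.get("properties", {}).items():
        if name in instance and subschema.get("deprecated") is True:
            detail = subschema.get("deprecationMessage", "")
            messages.append(f"{name} is deprecated. {detail}".strip())
    return messages

config = {"editor.fontSize": 12, "editor.oldSetting": True}
print(deprecation_warnings(schema, config))
# ['editor.oldSetting is deprecated. Use editor.newSetting instead.']
```

The function only reports; whether the file is valid is left to the regular validation pass.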

handrews commented 7 years ago

You would obviously need a hyper-schema-aware implementation, or at least one that makes it easy to register the keywords you care about as extension keywords.

Hopefully with draft-06 and beyond, hyper-schema implementations will become more common. My work in that area got kind of put on hold for a while but I am getting back to it. But the most popular validators tend to allow extension keywords.

We also haven't specified what an implementation SHOULD do, and would probably leave it fairly open. Some systems may want it more as a documentation thing, others may want runtime behavior. This is similar to default, which is why many implementations ignore it by, um.. default :-) but provide options to write missing defaults into the instance during validation. Implementations can always layer functionality on top of the spec.

adamvoss commented 7 years ago

I think that is a valid view. Though, I would throw in a word of caution about extensions and fragmentation, in that one of the advantages of JSON Schema as I see it is that (on the whole) it is well-defined and compatible. Something that is implemented as extensions (and then ones that are loosely defined with respect to behavior) makes me hesitant to use it unless I really need the functionality, because the number of implementations supporting my schema suddenly drops and the number that support it but do not behave how I intend may be non-zero.

In response to my original question though, it sounds like it is considered out of scope for JSON Schema.

dret commented 7 years ago

In practice, I doubt I can manage it all the time, but it's something I am going to explore and see how much I can get away with :-)

keep us posted.

Ideally, the only reason to drop a field is that it is legitimately no longer relevant. My preference is that optional fields are always absent when not relevant, never present with a null value. If clients are built to facilitate working with fields that may be absent, then a certain category of "removed" fields are no longer a problem.

agreed in principle. in practice, it is very hard to build anything meaningful when you can count on nothing. my guidance is almost opposite to yours: design and define a stable core and commit to it, allowing clients to rely on that core.

http://dret.typepad.com/dretblog/2016/04/robust-extensibility.html

Obviously some fields can't work that way. If you're configuring something related to DNS, you need the domain name. Otherwise nothing makes sense at all. But there are other things that are a lot less critical. Hopefully over the next six months to a year I'll be able to get some real-world validation (or invalidation) of various ideas in this area.

like i said, keep us posted. the art of defining and evolving APIs is definitely something that will only become more important as we see things becoming increasingly connected. how to do it well and robustly is important, and a space where we need more experience and guidance.

handrews commented 7 years ago

OK, trying to push this towards a new concrete proposal:

Going from @awwright's observations that this is more like "required" than anything else, I see two options for the object field case:

As with #117 ("patternRequired") we would likely need "patternDeprecated" (and "additionalDeprecated"?). See also #364 (applying this object-level approach to "readOnly").

I slightly lean towards the object form despite probably needing more keywords, as it then parallels the Sunset header approach of using dates (assuming @dret is still going in that direction).

For the link/resource case, I'd like to try to handle that through #296 (link usage hints / target attributes) which could then piggy-back on the Sunset work (assuming @dret thinks that is valid). While there would not be a Sunset header specified for all protocols / URI schemes, I think there is a general question to be answered in #296 about re-using useful hints outside of their original specification.

I feel like we should be able to make a decision on the object field approach and get that to PR stage, and either handle the link/resource case in #296 or start a new issue to track it on its own.

As for exactly how this is specified for required vs optional fields, I think that will be more easily discussed given a PR with actual wording. We don't entirely nail down what required and non-required fields actually mean (is required a permanent commitment? it depends on the implementation, and is outside of JSON Schema's scope).

Thoughts?

handrews commented 7 years ago

For such a simple concept, this turns out to be difficult to get right (two PRs by two different contributors have now been retracted).

@adamvoss brought up the excellent idea of a deprecationMessage functionality of some sort.

Looking at the OpenAPI proposals that @philsturgeon noted, while a lot of their operation-based approach doesn't translate well to hyper-schema, there are other useful concepts like indicating a replacement.

So I wanted to bring this back for a bit more open discussion.

I will comment again soon with specific ideas, although anyone should feel free to jump in on this of course.

philsturgeon commented 7 years ago

When I was talking to @dret about the Sunset header (for entire-endpoint deprecations), we came to the agreement that probably a message was not entirely necessary.

Instead a link is provided via Links with a rel=sunset, and that is either a URL to the new endpoint to use, or a link to some documentation or a blog. Whatever that link is, it will be a lot more useful than whatever was jammed into the reason, because for entire endpoints.... well do you care?

Human written messages like "We love our customers very much, and decided to listen to their feedback on some of the..." are no good, and the other end of the spectrum is just "Bugs and app updates", neither of which is useful.

Instead the link approach allows clients to build a simple formatted message:

"Endpoint #{env.url} is deprecated for removal on #{datetime.iso8601}. Read more at #{sunset_link}"

This may not be the way to go for JSON Schema. I see JSON Schema and GraphQL as having a lot in common, and they do offer a reason:

https://github.com/graphql/graphql-js/pull/384

This adding of a reason here could easily outline for humans that foo is deprecated and the reason is just: "Bar is better because X".

I feel like #391 is headed in the right direction, but would adding deprecated and deprecatedMessage get too noisy?

If not, can we go a little further and add a deprecatedUntil or something? The dates to me really are more important than the human words.

handrews commented 7 years ago

@philsturgeon I like the link relation for per-resource deprecation. Assuming PR #383 "targetHints" is accepted, that would then look something like:

{
    "rel": "some=old-thing",
    "href": "path/to/old/thing",
    "targetHints": {
        "sunset": [
            "Mon, 31 Dec 2018 23:59:59 GMT"
        ],
        "link": [
            ["/uri/path/to/sunset/resource", {"rel": "sunset"}]
        ]
    }
}

philsturgeon commented 7 years ago

Maybe I confused things by bringing up sunset. I'm not sure how targetHints work and dunno if I'd want to overload the term Sunset with stuff.

Are targetHints designed to hint towards potential response headers, or are they request-header oriented? I see in the PR they talk about Allow (req) so these resp headers seem out of place in this form.

handrews commented 7 years ago

@philsturgeon maybe read the targetHints PR? It includes other examples :-) [EDIT: or maybe I should wait to be snarky until after checking on confusing mistakes in my own PR that you already read... oops, my apologies, maybe I should actually read comments]

But briefly: "targetHints" are basically JSON-serialized response headers. Like "targetSchema" and "mediaType" they are non-authoritative. Schema authors can provide them as an optimization (and in the case of HTTP, as a convenience to avoid having to figure out whether something is discoverable over HEAD vs OPTIONS).

So that example just means "if you did a HEAD on this, you would very likely get a Sunset header with this date/time, and a Link header with this URI and relation type."
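A minimal Python sketch of that reading: turning a "targetHints" object into the header lines a HEAD response might plausibly carry. The `[target, {params}]` shape for link hints is assumed from the example in this thread, and `hints_to_headers` is a hypothetical helper, not part of any hyper-schema implementation.

```python
def hints_to_headers(target_hints):
    """Render targetHints as the header lines a HEAD response might carry."""
    lines = []
    for name, values in target_hints.items():
        for value in values:
            if isinstance(value, list):
                # Assumed shape for link hints: [target, {param: value, ...}]
                target, params = value
                rendered = f"<{target}>" + "".join(
                    f'; {key}="{val}"' for key, val in params.items()
                )
            else:
                rendered = value
            lines.append(f"{name.title()}: {rendered}")
    return lines

hints = {
    "sunset": ["Mon, 31 Dec 2018 23:59:59 GMT"],
    "link": [["/uri/path/to/sunset/resource", {"rel": "sunset"}]],
}
for line in hints_to_headers(hints):
    print(line)
# Sunset: Mon, 31 Dec 2018 23:59:59 GMT
# Link: </uri/path/to/sunset/resource>; rel="sunset"
```

Because the hints are non-authoritative, a client that depends on them should still be prepared for the real response to differ.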

handrews commented 7 years ago

Request headers are described with schemas instead of values, and are covered in the aptly-named headerSchema PR #390

handrews commented 7 years ago

@philsturgeon

I see in the PR they talk about Allow (req) so these resp headers seem out of place in this form.

Unless I am really confused Allow is a response header? Admittedly, I did confuse Accept and Content-Type the other day, so it's possible that I'm really confused...

philsturgeon commented 7 years ago

Unless I am really confused Allow is a response header?

Gah sorry brain frazzled. I did the same thing and read Allow as Accept and got messed up. Ok, let me do my homework on targetHints and see if it lines up with the right sorta thing.

targetHints may well solve endpoint deprecations (letting you know it's likely before you hit it and find out) but regular old JSON Schema still needs to have deprecation, message and date to let folks know a field is about to vanish too. Right?

handrews commented 7 years ago

Gah sorry brain frazzled.

well that clearly makes two of us today! :-D

regular old JSON Schema still needs to have deprecation, message and date to let folks know a field is about to vanish too. Right?

yes, and that's the main focus of this issue and both of the PRs that attempted to address it. We'd brought up resource-level deprecation as well, but certainly for draft-07, I'd like to give "targetHints" a chance to solve as many problems as it can with the generic mechanism, and only design specific approaches when that fails. There's a very large set of things that can be conveyed in response headers and arguing over each individually hasn't really gotten us far.

Anyways, yes, field deprecation needs a totally separate mechanism.

handrews commented 7 years ago

As for ideas on per-field deprecation, I see a few options:

  1. Decide it's complicated enough to call for a sub-object. This would allow for things like clearly distinguishing between date vs version-based deprecation, and providing information about replacements in a structured way:
{
  "properties": {
    "foo": true,
    "bar": true
  },
  "patternProperties": {
    "^biz": true
  },
  "deprecated": {
    "foo": {"date": "2017-12-01", "message": "...", "replacedBy": "??not sure what to put here"}
  },
  "patternDeprecated": {
    "^biz.*stuff$": {"version": "2.0", "reason": "..."}
  }
}
  2. Use a tuple, where the first item is the boolean-or-string field described in PR #391 and the optional second field is the message. This avoids nested keywords (which we generally avoid) but still allows additional information:
{
  "properties": {
    "foo": true,
    "bar": true
  },
  "patternProperties": {
    "^biz": true
  },
  "deprecated": {
    "foo": ["2017-12-01", "message of some sort"],
    "bar": ["2018-01-01"]
  },
  "patternDeprecated": {
    "^biz.*stuff$": ["2.0", "message for this one"]
  }
}
  3. Do we want to reconsider a sub-object in the context of all of the annotation fields? Should there be an annotation structure? I've just barely started asking this question so no specifics yet.
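A minimal Python sketch of how a tool might consume the sub-object form from the first option. The `deprecated` and `patternDeprecated` keywords are only proposals in this thread, `find_deprecations` is a hypothetical helper, and the unresolved "replacedBy" field is left out.

```python
import re

schema = {
    "properties": {"foo": True, "bar": True},
    "patternProperties": {"^biz": True},
    "deprecated": {
        "foo": {"date": "2017-12-01", "message": "..."},
    },
    "patternDeprecated": {
        "^biz.*stuff$": {"version": "2.0", "reason": "..."},
    },
}

def find_deprecations(schema, instance):
    """Map each deprecated property used by the instance to its metadata."""
    hits = {}
    for name in instance:
        if name in schema.get("deprecated", {}):
            hits[name] = schema["deprecated"][name]
            continue
        for pattern, info in schema.get("patternDeprecated", {}).items():
            # re.search mirrors the unanchored matching of patternProperties.
            if re.search(pattern, name):
                hits[name] = info
                break
    return hits

instance = {"foo": 1, "bar": 2, "biz_old_stuff": 3, "biz_new": 4}
print(find_deprecations(schema, instance))
```

Here "foo" is reported through the exact-name map and "biz_old_stuff" through the pattern map, while "bar" and "biz_new" are left alone.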
levbishop commented 7 years ago

The #173 proposal, where deprecated applies to a sub-schema, made sense to me and I don't understand the objections. The obvious interpretation is that any match to a sub-schema with "deprecated": true should be interpreted as "this JSON file is valid according to the current schema, but will no longer be valid in a future revision of the schema". This allows deprecation due to changes to any restriction that is expressible in json-schema. In the most complex cases (e.g., changes to dependencies, changes to regexes in patternProperties) it might be necessary to do something like:

"oneOf": [ { <new_schema> }, {"deprecated": true, <old_schema>}]

or if <new_schema> and <old_schema> aren't easily made mutually exclusive:

"oneOf": [ {<new_schema>}, {"deprecated": true, "not": {<new_schema>}, <old_schema>} ]

It would be easier for schema authors (and probably it's no extra burden on implementations) to allow use of anyOf and non-mutually-exclusive schemata here, with a rule to suppress deprecation warnings unless all matching branches are deprecated.

In either case, once the deprecation period is over and the old syntax is dropped, it's a simple update to the schema of deleting the branches with "deprecated": true and removing any anyOf that may have been rendered superfluous.

Common changes, such as disallowing a previously-allowed field can be done very simply, by adding "deprecated": true to the relevant place, without any need for oneOf/anyOf.

Other deprecation-related fields deprecationDate, deprecationMessage, deprecationURL, etc if decided necessary can live at the same level as deprecated and the scope is obvious.

Unlike the #173 proposal, a #391 approach seems limited and ill-defined. For example what is the equivalent of:

{
  "anyOf": [ 
    { "type": "integer", "minimum": 0}, 
    { "deprecated": true, "type": "number"}
  ]
}

(Ie, a previously unconstrained numeric field, where being other than a non-negative integer is now deprecated)
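A minimal Python sketch of the suggested anyOf rule, where a deprecation warning is suppressed unless every matching branch is deprecated. The `matches` helper is a hypothetical toy that understands only `type` and `minimum`; a real validator would do the matching.

```python
def matches(schema, value):
    """Toy matcher for the 'type' and 'minimum' keywords only."""
    kind = schema.get("type")
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "integer" and not (is_number and float(value).is_integer()):
        return False
    if kind == "number" and not is_number:
        return False
    if "minimum" in schema and is_number and value < schema["minimum"]:
        return False
    return True

def any_of_deprecated(branches, value):
    """True only when the value matches and every matching branch is deprecated."""
    matched = [branch for branch in branches if matches(branch, value)]
    return bool(matched) and all(b.get("deprecated") is True for b in matched)

branches = [
    {"type": "integer", "minimum": 0},
    {"deprecated": True, "type": "number"},
]

print(any_of_deprecated(branches, 3))    # False
print(any_of_deprecated(branches, -1))   # True
print(any_of_deprecated(branches, 2.5))  # True
```

A non-negative integer matches both branches, so no warning is due; a negative or fractional number matches only the deprecated branch and would be flagged.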