json-schema-org / json-schema-spec

The JSON Schema specification
http://json-schema.org/

Creation only defined via collections and POST in Hyper-Schema #581

Closed davidarnold closed 5 years ago

davidarnold commented 6 years ago

The current draft spec describes creation of new resources via submissionSchema on collection resources, noting that in HTTP this would correspond to POST. This is in line with the discussions that occurred in #47 and #48 as well as the precedent that AtomPub set.

However, REST is not CRUD and more specifically, the circa-2007 bijective mapping of C, R, U, D to POST, GET, PUT, and DELETE is flawed.

HTTP/1.1 does define a method with creation semantics and it is PUT. POST is defined as processing according to the resource's own semantics, with a nod to the very common practice of using it for adding or appending to a collection.

So, although it is well within the right of specs like AtomPub to define resource-specific POST creation semantics, it is not the canonical method in the HTTP uniform interface to create new resources.

Use cases

PUT creation of resources under a namespace

In my specific use case, we have several URI namespaces that permit creation of new resources. For example, creation of users is accomplished by performing a PUT of an appropriate JSON object to a URI template of /user/{user_id} where user_id is a client-generated UUID. I have been struggling to find a standard which would communicate to API clients that this operation is permissible.
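As a rough illustration of the interaction described above, the following sketch builds (but does not send) a PUT to a client-chosen URI. The base URL `https://api.example.com` and the `name` field are assumptions made for the example; only the `/user/{user_id}` template and the client-generated UUID come from the description.

```python
import json
import uuid
from urllib.request import Request

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://api.example.com"


def build_create_user_request(profile: dict) -> Request:
    """Build (but do not send) a PUT that creates or replaces a user."""
    # The client, not the server, chooses the identifier.
    user_id = str(uuid.uuid4())
    body = json.dumps(profile).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/user/{user_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# The payload shape is an assumption; the thread does not specify it.
request = build_create_user_request({"name": "Ada"})
print(request.get_method(), request.full_url)
```

Because the client owns the URI, re-sending the same request targets the same resource, which is what makes PUT usable as an "upsert".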

Under the current Hyper-Schema draft, it seems I would have to convert /user from a hierarchical URI namespace into a first-class collection resource and change the interaction pattern to be a POST to that new resource instead of a direct PUT to the final URI of the user.

Given that /user is currently not a valid resource in our hierarchy, and for security reasons cannot list all users, it seems odd that I would have to create a permanently empty resource just to house a shell collection that can anchor the creation semantics. Also, switching to POST would destroy the ability of clients to use PUT as a hassle-free "upsert" mechanism, instead having to decide or remember whether they have already created the user and change their approach if so.

PUT creation of non-collection sub-resources

In a more hypothetical example (since the last one could trigger debate over the collection model), you can foresee examples where a resource may permit creation of a sub-resource that is not a collection at all. In other words, the sub-resource has 0..1 cardinality instead of 0..*

Suppose there is an API providing access to scholarly articles. Each article is available under /article/{article_id} and may contain an optional sub-resource for its bibliography under /article/{article_id}/bibliography. Also, it may provide free access to a curated summary under /article/{article_id}/summary.

It makes sense that a new bibliography or summary could be created on the site via direct PUT to the appropriate URI. It is also entirely unclear how this situation could be contorted to use collection patterns to achieve the goal.

Proposed solution

I found the use of targetSchema in the draft spec odd, considering that it is earmarked specifically for use in retrieval, replacement, and patching, all of which are more authoritatively described by the resource itself. In fact, the document admits that the targetSchema is advisory in nature and that client applications are free to completely ignore it.

However, targetSchema makes perfect sense to me if its presence is intended to indicate that the link, regardless of its rel type, admits creation using the designated schema object. Since the resource on the other end of the link may not exist yet, it cannot link to a schema document that describes how to create it. The burden has to be on the referring resource to describe how to create its related resources.

Another benefit is that this creation schema is free to diverge slightly from the resource's primary schema to serve the common scenario of creation data excluding certain properties (e.g. system timestamps).
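To make the idea concrete, here is a minimal sketch of a link whose targetSchema describes the creation payload and differs from the resource's primary schema. The `rel` URI, the property names, and the `missing_required` helper are all hypothetical, and the helper checks only the `required` keyword; it is not a JSON Schema validator.

```python
# Hypothetical Link Description Object published by the referring resource.
create_user_link = {
    "rel": "https://api.example.com/rels/user",  # made-up relation type
    "href": "/user/{user_id}",
    "targetSchema": {
        "type": "object",
        "required": ["name", "email"],
        "properties": {
            "name": {"type": "string"},
            "email": {"type": "string"},
        },
    },
}

# Hypothetical primary schema of an existing user, including a
# system-managed timestamp that a client cannot supply at creation time.
primary_user_schema = {
    "type": "object",
    "required": ["name", "email", "created_at"],
}


def missing_required(schema: dict, payload: dict) -> list:
    """Return required property names absent from the payload (only `required` is checked)."""
    return [key for key in schema.get("required", []) if key not in payload]


new_user = {"name": "Ada", "email": "ada@example.com"}
print(missing_required(create_user_link["targetSchema"], new_user))  # []
print(missing_required(primary_user_schema, new_user))  # ['created_at']
```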

Since I do not know why a duplicative and non-authoritative property would exist outside of creation, I would tentatively propose that the existence of targetSchema should convey an explicit creation semantic, and be discouraged for use in other scenarios.

Retrieval and replacement are better served by the authoritative schema linked by the live representation of the current state of the resource, patching would require its own schema (and possibly its own media type), and as the draft states, targetSchema is nonsensical with regard to deletion.

In conclusion, targetSchema as an enabler of arbitrary resource creation would allow Hyper-Schema to produce level 3 RMM REST applications in a non-intrusive and discoverable way, without relying on industry conventions or aging design patterns.

handrews commented 6 years ago

@davidarnold thanks for the feedback!

Hyper-Schema already works the way you want it to. Per §8.2, '"targetSchema" and HTTP', targetSchema is used for PUT. This includes PUTs that create (and, as targetSchema plus the patching media type describes the use of PATCH, it is used for PATCHes that create as well). Since RFCs 7231 and 5789 already cover the semantics of PUT and PATCH, we do not restate every aspect of them in Hyper-Schema.

The special semantics of POST with a link relation type of "collection" are necessary to allow code to recognize the indirect creation with server-assigned URI use case. POST is the only HTTP method capable of supporting this mechanism. I personally prefer natural, client-selectable keys with PUT whenever possible, but Hyper-Schema needs to support the most common use cases out there, including create-via-collection-with-POST.

But none of that makes create via PUT or PATCH any less supported with targetSchema. It is "non-authoritative" in the sense that the resource may reject a request that validates against targetSchema for whatever reason, including that its acceptable format changed between you fetching the schema and making the request. It's non-authoritative in that sense, not in a "you shouldn't use it" sense. Just "you should be prepared for dynamic hypermedia systems to be dynamic".

To get the most authoritative way to interact with a resource, you can always GET (or HEAD) it and get its schema, and use the "self" link which is how the resource provides its own schema as targetSchema.

As far as whether any particular resource supports create-via-PUT, that works the same way as ever in HTTP. Try it and see if you get a 201. In my view, any system that supports client-determinable URIs and supports PUT at all should support create via PUT (and not support indirect create via POST), but that's an API design best practice and outside the scope of this specification.
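A small sketch of how a client might read the outcome of such a PUT, based on the status codes RFC 7231 associates with the method (201 when a resource is created, 200 or 204 when an existing one is replaced). The function name and the returned labels are hypothetical.

```python
from http import HTTPStatus


def describe_put_outcome(status: int) -> str:
    """Classify a PUT response status; the labels are illustrative only."""
    if status == HTTPStatus.CREATED:
        return "created"
    if status in (HTTPStatus.OK, HTTPStatus.NO_CONTENT):
        return "replaced"
    if status == HTTPStatus.METHOD_NOT_ALLOWED:
        return "PUT not supported"
    return "refused or failed"


print(describe_put_outcome(201))  # created
print(describe_put_outcome(204))  # replaced
```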

So, aside from maybe a word or two of clarification, I don't think there's anything to do here.

davidarnold commented 6 years ago

Thanks for the thorough response! I am 95% in agreement with everything you stated.

> Since RFCs 7231 and 5789 already cover the semantics of PUT and PATCH, we do not restate every aspect of them in Hyper-Schema.

Understandable, but every mention of PUT in the document seems to specifically exclude create. It was consistent enough that it led me to believe this was intentional.

I separately searched for "PUT" and "create" and they never met in the document. The documentation on PUT always mentions only "replace" and any discussion of create is always in the context of collection + POST semantics.

All PUT references (emphasis mine):

6.6.4.2. submissionSchema

including for replacing the contents of the resource in a PUT request

8.2. "targetSchema" and HTTP

In particular, "targetSchema" suggests [...] what a client application should send if it replaces the resource in an HTTP PUT request.

All creation references:

Each one of those is in specific relation to collections.

> So, aside from maybe a word or two of clarification, I don't think there's anything to do here.

Maybe :) I think it would go a long way to say "create or replace" in the references to PUT above.

However, I'm not sure I agree with this:

> As far as whether any particular resource supports create-via-PUT, that works the same way as ever in HTTP. Try it and see if you get a 201.

I don't think the discovery process of REST is intended to include speculatively operating methods with side effects. Discoverability to me should be restricted to crawling links (GET), augmented with any other "safe" methods (like HEAD and OPTIONS).

As an example, if I am writing a client application for a merchandise API, I shouldn't have to try to place an order to discover whether a resource supports order creation.

I am looking for a stronger indicator in the schema that would lead an agent to reasonably believe that creation via PUT is permitted. Certainly, the operation could ultimately fail for other reasons (insufficient payment or credentials, etc), but the basic communication that creation attempts are expected should be manifest.

As a litmus test, I am imagining how one would design a generic user agent for browsing Hyper-Schema conforming APIs. Reading, updating, and deleting data are all very obvious from following links and examining the discovered resource's OPTIONS. What is still difficult is creation of new resources.

Displaying a creation form for any link that includes a targetSchema and allowing a user to fill it out, when there is no indication that the service supports that at all, seems incorrect.

There are two scenarios that might be sufficient for creation semantics:

  1. If a URI template for a link includes an unresolved expression that requires user input, and that link also includes a targetSchema
  2. If a URI template is fully resolved and includes a targetSchema, but following it results in a 404

These both feel a bit tenuous to me, though, particularly scenario 2.
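The two scenarios above can be sketched as a heuristic that a generic user agent might apply. Everything here is an assumption for illustration: the function names, the `get_status` callback (standing in for a safe GET or HEAD), and the template handling, which recognizes only plain `{name}` expressions and does no percent-encoding.

```python
import re
from typing import Callable, List, Mapping, Optional

# Matches only simple "{name}" expressions, not the full URI Template syntax.
TEMPLATE_VAR = re.compile(r"\{([^{}]+)\}")


def unresolved_variables(href: str, known: Mapping[str, str]) -> List[str]:
    """List template variables for which no value is available."""
    return [name for name in TEMPLATE_VAR.findall(href) if name not in known]


def creation_signal(
    ldo: Mapping,
    known: Mapping[str, str],
    get_status: Optional[Callable[[str], int]] = None,
) -> Optional[str]:
    """Return which scenario (if any) suggests the link admits creation."""
    if "targetSchema" not in ldo:
        return None
    if unresolved_variables(ldo["href"], known):
        return "scenario 1: template needs user input"
    if get_status is not None:
        # Every variable is known at this point, so substitution cannot fail.
        href = TEMPLATE_VAR.sub(lambda m: known[m.group(1)], ldo["href"])
        if get_status(href) == 404:
            return "scenario 2: resolved target returns 404"
    return None


link = {"href": "/user/{user_id}", "targetSchema": {"type": "object"}}
print(creation_signal(link, {}))
print(creation_signal(link, {"user_id": "42"}, get_status=lambda url: 404))
```

Neither branch proves that the server accepts creation; it only reproduces the inference described above, which is why it feels tenuous.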

May I propose an additional set of properties creationSchema and creationMediaType? This does have several benefits:

What do you think?

handrews commented 6 years ago

> I think it would go a long way to say "create or replace" in the references to PUT above.

Yeah, that's pretty reasonable. I'll take a look at it for the forthcoming draft and put some references in. I may not add it in every single place, depending on how the wording flows, but we can definitely make it explicit and I'll at-mention you on the PR when I get to it. Which might be this afternoon or might be next month, life's a little complicated right now :-)


Regarding providing an additional affordance for creation, I do not view this quite the same way, but it's more a consideration of layering and separation of responsibilities. There are some rather epic conversations going on across multiple slack workspaces on this topic right now. There's an invite to the json-schema slack at the bottom of the json-schema.org page, btw, and I encourage you to join.

TL;DR: Hyper-Schema, as of now, documents the existence of links, and the structure of three types of associated data:

Additionally, it provides two spaces (targetHints and headerSchema, possibly to be renamed metaDataHints and metaDataSchema in draft-08, see #566) for optimizing protocol use without directly tying Hyper-Schema to any particular protocol.

There is another layer to a fully-functional system based on REST, which is application-level semantic affordances. My belief is that these belong at a higher, separate level from what is currently in Hyper-Schema. I am working on some approaches to include such things in Hyper-Schema without breaking the layering. (I'm not being intentionally vague; like I said, there are epic slack conversations going on around this and summarizing them here is impossible: it's too much and moving too quickly.)

Some of what you're saying around create* keywords is kinda similar, but not entirely, and I'm still trying to come up with a way to explain what I mean that doesn't get shot down six different ways.

As for attempting a PUT, targetHints can already be used to inform you that PUT is supported. In the absence of a "collection" link, it (or PATCH, but very few people PATCH to create) is the only clear option for creation. A PUT to a link that 404s on GET is no less safe than a POST to a collection link. A PUT that refuses because it only allows update and not create should convey that via something like RFC 7807 HTTP Problem Details.
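A sketch of the two checks mentioned above. It assumes that targetHints carries an `allow` entry holding a list of method names (my reading of how HTTP header hints are serialized; treat the exact shape as an assumption), and it uses the `application/problem+json` media type defined by RFC 7807. The function names and the sample link are hypothetical.

```python
def put_is_hinted(ldo: dict) -> bool:
    """True if the link's targetHints suggest the target allows PUT."""
    allow = ldo.get("targetHints", {}).get("allow", [])
    return "PUT" in (method.upper() for method in allow)


def is_problem_details(content_type: str) -> bool:
    """True if a response Content-Type is RFC 7807 JSON problem details."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type == "application/problem+json"


# Hypothetical link; only the keyword names come from the discussion.
link = {
    "rel": "self",
    "href": "/article/{article_id}/summary",
    "targetHints": {"allow": ["GET", "PUT", "DELETE"]},
}
print(put_is_hinted(link))  # True
print(is_problem_details("application/problem+json; charset=utf-8"))  # True
```

A hint like this says that PUT is expected to be supported, not that creation specifically is; a refused create would still have to be read from the response.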

However, given a clear solution to the higher-level application affordances, that may all become moot so don't focus on it too much.


The POST-to-create stuff is an awkwardness due to the mismatch between the design conventions used in most REST-ish APIs (for better or worse- we're dealing with reality to the extent that it doesn't blatantly violate REST, and POST-to-create does not violate REST) and the uniform interface provided by HTTP.

As we hammer out the higher-level stuff hopefully that can become less of a weird special case. I feel like the whole submissionSchema thing kind of awkwardly crosses levels in a way that was inherited from HTML forms and is not useful for HTTP APIs, but (so far) I tend to be in the minority view there.


Anyway, let's keep this issue for just fixing the wording to be inclusive of create-via-PUT/PATCH. If you'd like to continue responding to the rest of this, please join the slack channel and I'd love to keep talking there.

awwright commented 6 years ago

I just wanted to say this is a really great thread, that covers everything I wanted to say.

handrews commented 6 years ago

@davidarnold @awwright I've been spending a lot of time thinking about the higher layer for application-level semantic affordances over the last month. I'm working up a proposal for addressing this as a new vocabulary for hypermedia application operations to supplement Hyper-Schema.

Hyper-Schema is primarily a templated serialization format for RFC 8288 Web Linking, with the templating provided by RFC 6570 URI Templates. It includes unusually comprehensive target hinting mechanisms due to the flexibility allowed by JSON and JSON Schema (even submissionSchema and submissionMediaType are a form of target hinting- a hint that the resource accepts such representations for processing, which it may decline to do at runtime for any number of reasons). It also provides features for resolving the URI Templates in various ways.

This means that it overlaps somewhat with the idea of Web Forms, particularly in the URI Template resolution aspect, which provides a superset of HTML "get" forms. And the submission* keywords more or less correspond to HTML "post" forms.

But really, Hyper-Schema is not a form system, it just provides a basis for one. Resolving a template is about choosing the concrete resource from a possible set, particularly a set that cannot be explicitly enumerated. But conceptually, you could have enormous hyper-schemas that provide a separate link with the same rel for every possible resolution of the resource. That would be a pure RFC 8288 serialization rather than a templated one.
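To illustrate how one templated link stands for a set of concrete links, here is a minimal sketch that expands plain `{name}` expressions. It covers only a simplified subset of RFC 6570 (no operators, and a missing value raises an error instead of expanding to an empty string); the function name and sample identifiers are hypothetical.

```python
import re
from urllib.parse import quote


def expand_simple(template: str, values: dict) -> str:
    """Expand plain {name} expressions, percent-encoding each value."""
    def substitute(match):
        name = match.group(1)
        # safe="" leaves only unreserved characters unencoded.
        return quote(str(values[name]), safe="")
    return re.sub(r"\{(\w+)\}", substitute, template)


template = "/article/{article_id}/summary"
concrete = [expand_simple(template, {"article_id": i}) for i in ("a1", "b 2")]
print(concrete)  # ['/article/a1/summary', '/article/b%202/summary']
```

Each element of `concrete` corresponds to one link in the hypothetical "enormous" hyper-schema that enumerates every resolution explicitly.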

As discussed above, you can figure out what protocol-level operations are (probably) supported by looking at what schemas are present in the Link Description Object (LDO), and/or by looking at the protocol-specific hints in targetHints. But going back to why @davidarnold filed this issue, there's nothing that tells you that for a given LDO, targetSchema and PUT can be used for create. Beyond just creating the resource, there are usually many things one can do by changing the resource representation or submitting data for processing. Those "things" are application-level operations (publish document, close issue, turn on feature, etc.), and target* and submission* offer no way to indicate how many such operations can be performed through those types of data and basic protocol operations.

So I'm going to propose a vocabulary for application-level operations, probably taking the form of adding an operations or ops keyword to the LDO, which would be an array of application objects layered on top of the basic information already present in the LDO. Sometimes the link is so straightforward that there's no real point to this- if you have a link with "rel": "next", you know you can get the next whatever it is by GET-ing that resource. But many other relation types imply complex behavior.

I want to do this as a separate vocabulary because I think RFC 8288 + RFC 6570 for JSON is a very good chunk of functionality that is useful for many people on its own. It is also complex enough already that I don't want to add to implementation burdens for reaching Hyper-Schema conformance. Finally, as noted above, this is a really active area of discussion across all hypermedia forums that I'm involved in, so I suspect that it will undergo some churn while getting off the ground. I'm hoping that Hyper-Schema has reached a point where it's seeing tweaks rather than further large-scale re-imaginings.

I'm looking at OpenAPI (for an operation-oriented approach, although one that is in many ways the opposite of Hyper-Schema) and ALPS (for another view on application semantics independent of protocol or media type) as inputs, although this won't be much like either of them.

Anyway, this is a heads up in case anyone has immediate thoughts, I'll make a full proposal soon (although maybe not until I'm back from out of town after next weekend). It will probably be an issue in the json-schema-vocabularies repo, but I'll make a comment here linking to it.

handrews commented 5 years ago

For anyone who has wondered what happened with the operations vocabulary, I've been floating ideas in various slack workspaces (including ours), and while there is a lot of interest, there's also a great deal of debate and other interesting related work. We'll eventually come back to it but not all that soon, I think.

In the meantime, as previously noted I'm going to clarify the bit about PUT and create and close this once that's done.

davidarnold commented 5 years ago

> In the meantime, as previously noted I'm going to clarify the bit about PUT and create and close this once that's done.

Sounds great to me.

If I wanted to contribute to the ongoing discussion, what would be the best way? Wondering if there is some place where I can catch up on the front-runner proposals before trying to add my own ideas.

handrews commented 5 years ago

@davidarnold the slack workspace is where things get kicked around. Go to json-schema.org and click on the "Discussion" tab in the upper right for the public invite link.

handrews commented 5 years ago

Right now you can basically just keep an eye on the PR list, as I'm trying to wrap up the about-to-be-published draft by the end of the year.

handrews commented 5 years ago

PR #674 merged, as previously discussed I'm closing this as the rest of it is part of a larger conversation currently best suited to the slack channels. Or, @davidarnold if you think there is a reasonably concise proposal here, please feel free to file it under its own summary so we have each single issue tracking one thing.