Guidance on implementing REST interfaces for state machine

interagent / http-api-design

HTTP API design guide extracted from work on the Heroku Platform API

https://geemus.gitbooks.io/http-api-design/content/en/

Guidance on implementing REST interfaces for state machine #58

Open schatekar opened 9 years ago

schatekar commented 9 years ago

I am about to implement a business process which can be more or less modeled as a state machine. In abstract terms, here is what the state machine would look like:

[Entity1] --process1--> [state1]--process2-->[state2]--process3-->[final state]

Every state transition is an atomic operation. The reason we want to model it as a state machine is that the complete operation can take a long time to run and we do not want the caller to be blocked. So the idea is that we would accept the request from the caller and return 202 Accepted if the request looks OK.

After that, a scheduled process would pick up the request from the database and trigger process1 by calling a REST endpoint. The same goes for process2 and process3.

There are two ways that this can be modeled. The first is to implement a single REST endpoint like the one below:

PUT http://api.com/entity/{id}/process

Because we know the current state of the entity, we can determine which process to execute next. But then I feel this API is not expressive and the following would look better:

PUT http://api.com/entity/{id}/process1
PUT http://api.com/entity/{id}/process2
PUT http://api.com/entity/{id}/process3

In the above, there is a distinct endpoint for every process that can be triggered on the entity. If you trigger process1 on an entity which is in the final state, you would get back an error.

I am not sure which one is right, or if both of these approaches are wrong. Does anyone have experience of doing something like this?

leandroferro commented 9 years ago

Hi!

I would model something like

POST /process
{
    "entity": {id},
    "transition": ...
}

This would return a 202 Accepted with an ID that you could GET /process/{ID} and check it later, as the transition will be done asynchronously.
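For illustration, the accept-then-poll flow described here could be sketched in Python as follows. This is a minimal in-memory sketch, not a definitive implementation: `ProcessStore`, its method names and the `pending`/`done` status values are assumptions made for the example, and a real service would persist the jobs in the database that the scheduled process reads.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ProcessStore:
    """Hypothetical in-memory stand-in for the table the scheduler would poll."""
    jobs: dict = field(default_factory=dict)

    def submit(self, entity_id: str, transition: str) -> tuple[int, dict]:
        # POST /process: record the request and return immediately with 202.
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {
            "entity": entity_id,
            "transition": transition,
            "status": "pending",
        }
        return 202, {"id": job_id, "status": "pending"}

    def get(self, job_id: str) -> tuple[int, dict]:
        # GET /process/{ID}: report how far the accepted request has got.
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown process id"}
        return 200, {"id": job_id, **job}

    def complete(self, job_id: str) -> None:
        # Called by the background worker once the transition has run.
        self.jobs[job_id]["status"] = "done"
```

The caller keeps only the returned ID and polls `get` until the status changes.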

I don't think this is better than yours, it's just a different way to model this, and you could consider things like: Does the client really need to specify what process has to be executed? Or the client just has to trigger the transition?

I think using PUT entity/{id}/processN sounds a little like RPC.

What do you think?

schatekar commented 9 years ago

I will think this over, but a quick note on PUT vs. POST: I used PUT because I am modifying an existing entity, not creating a new one.

geemus commented 9 years ago

I think I might use an actions pattern here, which we have had some luck with for other actions (though not necessarily the state machine pattern per se).

POST http://api.com/entity/{id}/actions/process1
POST http://api.com/entity/{id}/actions/process2
POST http://api.com/entity/{id}/actions/process3

This has the benefit of maintaining the same cadence (alternating resource/identifier in the path). I would probably also use 202 as you suggest to indicate it is accepted, rather than completed. You'll likely also want a way to query the status of the process (this might be as simple as looking up the entity and checking a status field, but depends on specifics of the use case).

As for PUT vs POST, from RFC 2616 I grabbed "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line". I think we can argue that the action/process is a subordinate of the entity. Whereas PUT reads as "The PUT method requests that the enclosed entity be stored under the supplied Request-URI.". I think POST then is a bit more accurate (and is more commonly used for actions in my experience).

Hope that helps, certainly happy to discuss further.

schatekar commented 9 years ago

I was originally thinking of the actions pattern. While it is expressive and clearly conveys the intent, I saw two issues with it:

1. In order to send request on the correct URL, client needs to know the status of entity. So client has to make an extra call to know the status of the entity first.
2. Because the client is free to form any URL using a combination of entity id and process id, the server needs to implement validation in case the client posts to the wrong URL, e.g. the entity was in state 2 but the client posted to process1, which has already happened.

I will look up RFC 2616. That is an interesting way of looking at POST and PUT. I have been using the old-school definition: POST is equivalent to creating a resource and PUT is equivalent to updating a resource.

schatekar commented 9 years ago

@leandroferro

POST /process looks too generic. I am not sure, but I feel the resource identifier should be part of the URL and not be embedded in the body. What do you think?

And I agree with your comment on RPC, but isn't that the nature of REST? I always find it difficult to implement sub-operations on resources because they end up looking like RPC. For instance, if you have a customer resource and you want to disable their account, I have been doing it like PUT /Customer/{id}/disable, which does look a bit awkward and RPC-ish.

geemus commented 9 years ago

Yeah, POST/PUT can be tricky. I end up having to refer back to the docs often to remember the specifics.

1. Not necessarily, or at least not in every case. You could simply try to make the status transition based on what you believe the current state is. If you are wrong, the server (given proper validations) could correct you and let you know which action you should be using instead.
2. I think that, regardless, the server will probably need to validate that the correct transitions are being called and used at the right time.

All that said, I'm not sure how easy it is to discuss this generically. It might be that the best approach would be something like:

POST /entity/{id}/actions/advance

You call it repeatedly and it keeps going through the steps (202 if it isn't in the final state yet, 200 if it is). Granted this loses a lot of the subtleties, but might be appropriate in some cases.
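A minimal Python sketch of what a handler behind such an `advance` action might do. The state names are taken from the diagram in the first post; the function name and the `(status_code, new_state)` return shape are assumptions made for this illustration.

```python
# Linear chain from the original post: Entity1 -> state1 -> state2 -> final state.
STATES = ["entity1", "state1", "state2", "final"]


def advance(current: str) -> tuple[int, str]:
    """Handle POST /entity/{id}/actions/advance for an entity in `current`.

    Returns (status_code, new_state): 202 while more steps remain,
    200 once the entity has reached the final state.
    """
    index = STATES.index(current)
    if index < len(STATES) - 1:
        index += 1  # move exactly one step along the chain
    new_state = STATES[index]
    status = 200 if new_state == STATES[-1] else 202
    return status, new_state
```

Calling it on an entity that is already in the final state leaves the state unchanged and returns 200.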

Anything that deviates from resource/id ends up feeling weird/awkward. I think in that case I would also use actions to make it seem less dissimilar at least, something like: POST /Customer/{id}/actions/disable.

schatekar commented 9 years ago

@geemus +1

I will probably have to go ahead with some of these ideas and see what results I get. As you said, there are always subtle differences in each case and it is difficult to generalize.

zdne commented 9 years ago

Dear @schatekar I really like your way of thinking! For what it's worth here are my few cents on this topic.

Modeling a REST API as a Finite State Machine (FSM) is indeed the way to go, especially when you want to create a decoupled, scalable and truly REST (read: hyperlink-driven) API.

The reason we want to model it as a state machine is that complete operation can take long time to run and we do not want the caller to be blocked.

Whether this should (or should not) imply an FSM I am not sure. To me an FSM comes in handy when modeling any REST API, regardless of whether it is synchronous or not.

In order to send request on the correct URL, client needs to know the status of entity. So client has to make an extra call to know the status of the entity first.

In REST, which URL and HTTP request method (endpoint / action) can be used is driven by the server; the client should make no assumptions about which actions are available. Instead, it should get the available actions from the server and decide which one to take.

I will try to demonstrate my thought process on the API FSM design using the concept of Resource Blueprint. Since I am not familiar with your API's domain I will use the concept of building a House.

If you are interested in more details you can check my slides on the Resource Blueprint.

Let's describe the Building resource:

Resource Building

Attributes

Attributes (data) of the resources, this may be really anything...

walls (array[wall]) - array of walls as built
roof (roof) - roof built on top of the walls
paint (string) - name of the wall's color or empty string

Actions / Relations / Affordances

A list of actions you can perform with the resource. Which actions are available depends on the current state, as offered by the server. The list here covers all actions the resource can possibly offer. The client MUST NOT remember which action is available in which state, as this leads to tight coupling with the server implementation.

buildWalls - builds walls
buildRoof - builds a roof on top of walls
paintWalls - paints the walls
scrapeBuilding - scrape everything

States

The top-level list are names of the states of this resource.

Each state lists actions available in the particular state. For example, you can perform "build walls" action in the "empty property" state but not "paint walls" action.

Every listed action shows to what state you get after exercising the action. So the syntax is:

- original state
    - action → destination state

Now let's look at the states.

- emptyProperty (resource entry point)
    - self → emptyProperty
      Takes you to this state
    - buildWalls → wallsStanding
      Takes you to the state where walls are standing
- wallsStanding
    - self → wallsStanding
    - buildRoof → wallsAndRoof
    - scrapeBuilding → emptyProperty
- wallsAndRoof
    - self → wallsAndRoof
    - paintWalls → buildingFinished
    - scrapeBuilding → emptyProperty
- buildingFinished
    - self → buildingFinished
    - scrapeBuilding → emptyProperty
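The state list above maps naturally onto a lookup table. Here is a minimal Python sketch, with the states and actions copied from the list; the names `BUILDING_FSM`, `available_actions` and `apply_action` are hypothetical helpers introduced for this illustration only.

```python
# state -> {action: destination state}, copied from the synchronous state list.
BUILDING_FSM = {
    "emptyProperty": {
        "self": "emptyProperty",
        "buildWalls": "wallsStanding",
    },
    "wallsStanding": {
        "self": "wallsStanding",
        "buildRoof": "wallsAndRoof",
        "scrapeBuilding": "emptyProperty",
    },
    "wallsAndRoof": {
        "self": "wallsAndRoof",
        "paintWalls": "buildingFinished",
        "scrapeBuilding": "emptyProperty",
    },
    "buildingFinished": {
        "self": "buildingFinished",
        "scrapeBuilding": "emptyProperty",
    },
}


def available_actions(state: str) -> list[str]:
    """Actions the server would offer for a resource in `state`."""
    return sorted(BUILDING_FSM[state])


def apply_action(state: str, action: str) -> str:
    """Return the destination state, or raise if the action is not offered."""
    try:
        return BUILDING_FSM[state][action]
    except KeyError:
        raise ValueError(f"{action!r} is not available in state {state!r}") from None
```

Keeping the table on the server is what lets the client stay ignorant of which action is valid in which state.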

This is obviously the case for a synchronous process, where all the construction happens immediately. Now, when you want to go async you can, for example, introduce the following states:

- wallsInProgress
    - self → wallsInProgress
    - cancel → emptyProperty
- roofInProgress
    - self → roofInProgress
    - cancel → wallsStanding
- paintInProgress
    - self → paintInProgress
    - cancel → wallsAndRoof

and modify the existing states' transitions like so:

- emptyProperty (resource entry point)
    - self → emptyProperty
    - buildWalls → wallsInProgress

etc.

Whether to model the construction processes as states of the one resource or to separate them under another resource, for example a "construction process" resource, I leave up to you. The point here is how to think about the FSM of this problem.

The beauty of this abstraction is that it leaves the questions of protocols / URLs / methods / JSON etc. for later, as they are mere technical details, and lets you focus on the way your API is designed.

Frankly, your API's client should only focus on understanding the resource attributes, the actions and their parameters. The absolute values of URLs and methods are somewhat irrelevant (though I suggest paying attention to them as well).

Not sure if these musings about API design as an FSM help you with your API question, but I would be happy if that is the case!

schatekar commented 8 years ago

@zdne Wow, this is very close to what I was thinking, though I could not put all the pieces together. I am glad you have a name for this. If I had to summarise "Resource Blueprint", what you are really saying is: do not model state transitions as API endpoints, only model states as API endpoints. Is that correct?

zdne commented 8 years ago

@schatekar I guess yes. Although I do not like to think about endpoints. If REST is "Representational state transfer" then it probably means that we are transferring a state representation when we hit an endpoint. Does that make sense?

In other words, what the client gets when accessing any endpoint is a representation of a resource in a certain state. The representation of the state consists of a representation of the data attributes and representations of the affordances (= available state transitions).
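To make the idea concrete, here is a small Python sketch of a server building such a representation. The `links` key, the `/buildings/...` URL layout and the two-state table are assumptions chosen for the example and are not prescribed anywhere in this thread.

```python
# Hypothetical excerpt of a state -> offered actions table.
OFFERED = {
    "wallsStanding": ["buildRoof", "scrapeBuilding"],
    "buildingFinished": ["scrapeBuilding"],
}


def represent(building_id: int, state: str, attributes: dict) -> dict:
    """Combine data attributes with the affordances valid in `state`."""
    links = {"self": f"/buildings/{building_id}"}
    for action in OFFERED.get(state, []):
        # Only transitions available right now are advertised to the client.
        links[action] = f"/buildings/{building_id}/actions/{action}"
    return {**attributes, "links": links}
```

Note that the raw state name never has to appear in the output; the set of links is what tells the client what it can do next.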

claytondaley commented 6 years ago

I'm working through a similar requirement. I seem to be deeper into the HTTP RFC and REST concepts, but am interested in reactions (especially potential pitfalls) from folks who have already trod this path.

Idempotence

It's important that your calls are idempotent. If all of the steps use the same "process" or "advance" keyword, what happens if you have to retry a call (but the server receives both)? How do you prevent the machine from advancing twice? There are at least two solutions (I prefer the first).

Make the name of state transitions deterministic. "Cancel" should only (edit: not always as some transitions are not permitted) take you to "cancelled". If the FSM is already in the cancelled state, it returns an error code.
Use something like optimistic concurrency (e.g. an ETag or resource version number). I don't think a library would send an ETag to a nested action out of the box so that may militate against this approach.
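Both options can be sketched together in a few lines of Python. This is only an illustration under stated assumptions: the state names, the `Conflict` exception and the integer `version` (standing in for an ETag) are all hypothetical.

```python
from dataclasses import dataclass


class Conflict(Exception):
    """Would surface as an error status, e.g. 409 (or 412 for a failed If-Match)."""


# (current state, action) -> destination: each named action has one outcome.
TRANSITIONS = {
    ("requested", "start"): "running",
    ("requested", "cancel"): "cancelled",
    ("running", "cancel"): "cancelled",
}


@dataclass
class Machine:
    state: str = "requested"
    version: int = 1  # stands in for an ETag / resource version number


def transition(machine: Machine, action: str, if_match=None) -> Machine:
    # Option 2: optimistic concurrency, reject callers holding a stale version.
    if if_match is not None and if_match != machine.version:
        raise Conflict("resource changed since it was read")
    # Option 1: deterministic names, so a retried call finds no valid transition.
    destination = TRANSITIONS.get((machine.state, action))
    if destination is None:
        raise Conflict(f"{action!r} is not permitted in state {machine.state!r}")
    machine.state = destination
    machine.version += 1
    return machine
```

With this shape, a duplicated `cancel` fails on the second delivery instead of advancing the machine twice.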

POST to Actions

PUT has specific semantics and targets a specific resource identified by a URI:

The PUT method requests that the state of the target resource be created or replaced with the state defined by the representation enclosed in the request message payload. A successful PUT of a given representation would suggest that a subsequent GET on that same target resource will result in an equivalent representation being sent in a 200 (OK) response.

This is clearly not what you'd expect if you could actually GET from an action endpoint so it doesn't have the right semantics. POST does.

HATEOAS

@schatekar expresses the concern:

In order to send request on the correct URL, client needs to know the status of entity. So client has to make an extra call to know the status of the entity first.

A true REST interface respects HATEOAS and really should work this way:

get a copy of the object
look at the available affordances (labeled URIs like "process2": "/entity/{id}/action/process2")
call the one you want (if it exists... if not, it's not valid for the current state of the object)
if the affordance is no longer valid (e.g. the state has changed since your GET) you should receive an exception (409 is probably "right", but 400 is more likely in the wild)

The client needs to know what action to use, but it's best if the client treats the URI as a black box. If you changed your server-side schema, you should be able to update the URI for process2 (e.g. to /fsm/{id}/action/process2) and your existing client would continue to function.
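A tiny Python sketch of the client side of this. The `affordances` key and the two sample representations are assumptions for illustration; the point is only that the client looks the URI up by name and never builds it.

```python
from typing import Optional


def find_affordance(representation: dict, name: str) -> Optional[str]:
    """Return the URI the server advertised for `name`, or None if not offered."""
    return representation.get("affordances", {}).get(name)


# The same entity as served before and after a hypothetical server-side URI change.
before_migration = {"id": 7, "affordances": {"process2": "/entity/7/action/process2"}}
after_migration = {"id": 7, "affordances": {"process2": "/fsm/7/action/process2"}}
```

The client code is identical in both cases; a missing name simply means the action is not valid for the current state.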

Data-Triggered Transitions

If transitions can occur in response to data (rather than user action), exposing the FSM may be unnecessary. Consider a business rule like "the object is approved after 3 users approve it". Instead of creating an "approve" action, create an approval resource and an observer (hopefully generic like Django's signals).

A user "approves" a change (POSTs to create an approval resource)
Observer condition not satisfied
A user "approves" a change (POSTs to create an approval resource)
Observer condition not satisfied
A user "approves" a change (POSTs to create an approval resource)
The observer's conditions are satisfied so it changes the state to approved

There's still an FSM under the covers, but you've refactored the interaction so:

The API is 100% RESTful CRUD (no actions)
FSM state transitions are triggered by the data and not (directly) by the user interaction

Here's the kicker.... the pattern is more "obvious" if three approvals are needed, but it works just as well if only one approval is needed.
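A minimal Python sketch of the approval flow, assuming hypothetical names (`Change`, `create_approval`, `approval_observer`). In a real framework the observer would be wired up through a signal or hook rather than called directly.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    required_approvals: int = 3
    state: str = "NeedsApproval"
    approvals: set = field(default_factory=set)


def approval_observer(change: Change) -> None:
    # Reacts to data; the user never asks for the state transition directly.
    if change.state == "NeedsApproval" and len(change.approvals) >= change.required_approvals:
        change.state = "Approved"


def create_approval(change: Change, user: str) -> str:
    """Plain CRUD create of an approval resource, followed by the observer."""
    change.approvals.add(user)
    approval_observer(change)
    return change.state
```

Setting `required_approvals=1` gives the single-approval case with no change to the API surface.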

In a HATEOAS world, the API may not even expose the raw state. Instead, a resource in the "NeedsApproval" state includes an "approve" affordance. Once the resource is in the "Approved" state, the affordance goes away.

POTENTIAL PITFALLS:

If the observer fails (DB connection drops, powered down, internal error), the system can end up in an inconsistent state. Some options to mitigate the risk include:
1. run the observer in a transaction with the update (if the observer runs fast enough)
2. add startup scripts to restore database consistency (especially for long-running transitions)
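Option 1 could look roughly like the following, using Python's standard `sqlite3` module as a stand-in for whatever database is actually in use. The table layout and function names are assumptions made for this sketch.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with one change awaiting approval."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE changes (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
        CREATE TABLE approvals (
            change_id INTEGER NOT NULL,
            approver TEXT NOT NULL,
            PRIMARY KEY (change_id, approver)
        );
        INSERT INTO changes (id, state) VALUES (1, 'NeedsApproval');
        """
    )
    return conn


def create_approval(conn: sqlite3.Connection, change_id: int, approver: str, required: int = 3) -> str:
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the approval row and the observer's state change land together.
    with conn:
        conn.execute(
            "INSERT INTO approvals (change_id, approver) VALUES (?, ?)",
            (change_id, approver),
        )
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM approvals WHERE change_id = ?", (change_id,)
        ).fetchone()
        if count >= required:  # the "observer" step
            conn.execute("UPDATE changes SET state = 'Approved' WHERE id = ?", (change_id,))
    (state,) = conn.execute("SELECT state FROM changes WHERE id = ?", (change_id,)).fetchone()
    return state
```

If the observer step raised, the approval insert would be rolled back as well, which avoids the inconsistent state described above at the cost of a longer request.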

Separate Resources

@zdne mentions separating entity and state in passing and this deserves highlighting. A house is in a lot of states simultaneously. For example, a builder might be both building and selling a house at the same time. If we're building reusable components, we don't want to couple the House data model to the Construction FSM or the Sales FSM. Separating out the logic should provide smaller components that are easier to read and test.

This approach is even more attractive if you can combine it with Data-Triggered Transitions. Now your data models are completely decoupled (but provide generic observers). Your FSMs subscribe to the relevant events (i.e. observers) and maintain their own state based on changes in the target data.

Your presentation layer still needs to be state-aware to display the right affordances. Ideally it's assembled from a lot of standard parts. For example, your final renderer inherits from a generic "house" renderer and extends the model by adding the affordances relevant to the app.

geemus commented 6 years ago

@claytondaley thanks for the detailed writeup.

I agree that more explicit transitions provide better idempotence (and similarly that they should error if you then end up calling one at the wrong time for the FSM). Ditto that POST makes good sense for this use case.

Data triggered transitions and separating resources also make good sense in many cases, but I've also certainly seen some cases where a separate resource seems like it would complicate/confuse things. For instance, with the status of a server, it seems clearer to do something directly to the server for say a reboot, than to create a reboot object which then causes a reboot. Not that a distinct object is impossible or intractable, but I think in some simple cases it can seem confusing. It seems like a judgement call situation, though I tend to prefer to avoid those (as judgement is not equally distributed among individuals). Good food for thought throughout, certainly.

claytondaley commented 6 years ago

@geemus

I'd like to propose (for consideration/debate) that the confusion of separate resources arises mostly from an attempt to map OOP to other domains. Representing object.action() as object/action is intuitive, but imagine the parallel case in DB CRUD. I could probably create a stored procedure called "activate" and write a SQL query that uses it to change an object's state, but I think most would agree that it's an anti-pattern. I'm not sure what's different about REST except that most of us have a weaker understanding of how it "should" work.

===

I definitely agree with your general intuition, but I'd like to find a different/better example than power management to really test the logic of the "separate resource" argument. I think power management is an especially weak case because the actual state of the machine is 100% data driven. It's a physical property that can only be determined from data flowing from the machine.

This creates odd cases like:

You call reboot and the request times out. Is there any way to know if it succeeded? Say you poll the system. Every response is state=on. Did the system reboot (and you missed the state=off)? Or did your command fail?
You call shutdown and poll the system. Every response is state=on. Did the shutdown fail? Or did someone else power it back on between your requests?

The best way to model power management is a sequence of requests. You care whether the request was fulfilled. The only way to know this (with certainty) is to expose it as a resource.

Do you disagree? If not, can you think of another strong case where resources feel especially wrong?

Cancelling a request might be a good example. It could get really ugly to have requests (to cancel) nested in requests (to reboot). I'd probably avoid a nested request by modeling the original request as a state machine. In the requested state, you can still cancel. Once it's in process, you can no longer cancel. Then you can just PUT the new state (and it will 409/400 if the request is invalid).

Pathing a Request

I'd probably go with:

POST /machine/{id}/power-task
{
    "state": "off"
}

201 CREATED /machine/{id}/power-task/{id}

I went with task over request for brevity and clarity, but also considered cmd and call.
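A rough Python sketch of such a task resource, including the "cancel only while still requested" rule mentioned earlier. The class, the status names (`requested`, `in_process`, `done`, `cancelled`) and the return shapes are assumptions for illustration.

```python
import itertools

# Which task statuses may follow which (hypothetical status names).
ALLOWED = {
    "requested": {"in_process", "cancelled"},
    "in_process": {"done"},
}


class PowerTasks:
    """In-memory collection behind /machine/{id}/power-task."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.tasks = {}

    def create(self, machine_id: str, body: dict):
        # POST /machine/{id}/power-task -> (status code, Location of the new task)
        if body.get("state") not in ("on", "off"):
            return 400, None
        task_id = next(self._next_id)
        self.tasks[task_id] = {
            "machine": machine_id,
            "state": body["state"],
            "status": "requested",
        }
        return 201, f"/machine/{machine_id}/power-task/{task_id}"

    def put_status(self, task_id: int, new_status: str) -> int:
        # PUT of a new status onto the task; invalid transitions get 409.
        task = self.tasks[task_id]
        if new_status not in ALLOWED.get(task["status"], set()):
            return 409
        task["status"] = new_status
        return 200
```

Because each request is its own resource, a caller can later check whether its particular request was fulfilled, independent of what the machine's power state currently reads.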

Pathing a separate FSM

Assume you:

Accept the argument that an FSM should be a separate entity (and)
Can't (for whatever reason) make the FSM 100% data driven

How do you path the resource? If you think of it as a one-to-one relationship, there's some discussion (e.g. here). A one-to-one resource has a lot in common with a property and I've occasionally seen properties exposed as sub-resources (similar to the 3rd item here). I keep coming back to something like:

house/{id}/construction-state

There's no terminal {id} (the only thing that gives me pause) because it's a 1-to-1 relationship. You change the state by PUTing the new state. Because the state change is not data-driven, it should be instantaneous (if valid), so you can immediately report a success or an error (409 or 400).
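As a sketch of how a PUT to `house/{id}/construction-state` might be validated, here is a short Python function. It borrows the building states from @zdne's example earlier in the thread; the function name and the choice of 400 for unknown state names versus 409 for unreachable ones are assumptions.

```python
# current state -> states that can be reached directly from it.
REACHABLE = {
    "emptyProperty": {"wallsStanding"},
    "wallsStanding": {"wallsAndRoof", "emptyProperty"},
    "wallsAndRoof": {"buildingFinished", "emptyProperty"},
    "buildingFinished": {"emptyProperty"},
}


def put_construction_state(current: str, requested: str) -> tuple[int, str]:
    """Return (status_code, resulting_state) for a PUT of `requested`."""
    if requested not in REACHABLE:
        return 400, current  # not a state this resource knows about
    if requested == current:
        return 200, current  # repeating the same PUT changes nothing
    if requested not in REACHABLE[current]:
        return 409, current  # valid state, but not reachable from here
    return 200, requested
```

A retried PUT of the same state returns 200 without side effects, which fits PUT's idempotent semantics.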

geemus commented 6 years ago

I think some of my struggle with more data-driven things is that it can make side effects less apparent. i.e. in the example you gave of approvers, how am I to know that the 3rd one will create a side effect of changing the status of a different object? I suspect at least for some that might be surprising, where it might be less surprising in the action case (where it is more apparent that the thing you are doing relates). Some of that may just be a matter of particular pathings/etc rather than a broader issue.

I agree that the action style doesn't deal with asynchrony, but I would argue that extends to all of REST. Having callbacks/webhooks or doing things based on events often seems smoother in these cases (though it may not be particularly tenable in all cases).

As for your example about power management, it definitely is true that you don't get great feedback on status, can't cancel, don't know what others are doing, etc. Depending on the use case, though, it may not matter. And if it doesn't matter, I suspect that something like actions is a simpler and easier to understand user experience. Though it is less semantically correct, it also limits the number of different objects the user might need to understand in order to be effective (which has its own value, which can be very subjective).

Which is all to say, I agree that modeling everything is probably more correct, but it may be at the cost of the user experience. By selectively choosing which things warrant the extra gravitas and complexity of full modeling, I suspect we can better balance the demands of correctness and experience (but it is very subjective and case-by-case). Does that make sense?

claytondaley commented 6 years ago

I would frame your argument as the pragmatic case against a (strict) RESTful interface... rather than the case that actions should be prominent (or even present) in a RESTful interface. Based on a quick search for REST and Actions, I think most would agree with this classification (for example, here).

If the OP (or folks in similar situations) need to manage a state machine over an HTTP API (the repo's general purpose), your arguments should probably carry considerable weight.

If the OP (or similar) actually wants to implement a state machine over a RESTful interface (the title of this issue), I don't think actions are an option.

I'm trying to explore this second path to discover worst case scenarios. Besides usability concerns, we haven't identified any major issues that are unique to REST:

Having to model the state machine as an endpoint (e.g. object/{id}/fsm, supporting PUT or PATCH of the state value) if the state changes are synchronous or guaranteed
Having to model the state transition as a task/request if the state changes are async (and not guaranteed) (edit: applies to HTTP API too)
Dealing with states that are not directly managed by the API (e.g. the power state of a remote server). {cancel} is difficult to model due to a race condition so the request probably needs a more intricate internal state (e.g. {cancel requested}) (edit: revised identification of problem based on discussion below, would apply to HTTP API too)

If anyone else has run into specific cases where state transition cannot be data-driven and are especially difficult to model as a resource, I'd be interested in hearing about them.

geemus commented 6 years ago

That is not an unreasonable framing. I certainly don't think actions need to be prominent, though there are some cases where something like that (or full representation) does seem necessary.

I'm afraid I don't know what OP means in this context? Could you fill me in?

Your distinction between managing and implementing is subtle, but I think important and likely correct. Thanks for talking through this and working toward that clarity.

I think a distinct endpoint helps in making things explicit (and gives a target for checking status or cancelling), which changing in place doesn't really provide. So this seems like a good approach.

I suspect the difficulties around cancel have more to do with the specific use case than anything else. ie cancel might be a valid transition from any given state (though this seems dangerous). If you had a history of transitions, maybe something like undo would be easier to think about or model? Though I suppose that presumes the transition has finished, rather than still being in progress. The in progress part is tricky, as cancel has a race condition against completion (potentially) and also it may be more complicated to undo partial transitions. Do you have a particular use case that needs cancels that you could explain in more detail? I think that might help me to discuss it in more detail. Thanks!

claytondaley commented 6 years ago

OP is a Stack Exchange (e.g. Stack Overflow) acronym for "Original Poster".

I suspect the difficulties around cancel have more to do with the specific use case

Great point. This case is hard because the state is not determined by the API (or its database) but by a 3rd party who's sending updates to the API. That's what introduces race conditions. Thus it may be no less complicated over a non-REST API.

geemus commented 6 years ago

Ah ha, thanks for that clarification.

The 3rd party nature does sound like it makes this a lot harder (regardless of the interface you provide). A too-many-cooks-in-the-kitchen kind of problem.

claytondaley commented 6 years ago

OK. Here's a real case where a clean, RESTful implementation is causing me headaches:

We have tasks. Tasks are self-contained resources -- each contains a JSON "questions" field and a JSON "answers" field. Each entry in the questions field (i.e. list) may be optional or required. The task is completed when the answers field includes answers for all of the required questions (but you can easily imagine a more complicated version with multi-answer validation rules).

The state transition rules are easy so the FSM itself can be data-driven. But what is the most RESTful way for a client to determine if their answers fulfill the completion criteria (and/or why not)? Here's what I've considered so far:

1. We could validate all PUTs against completion criteria, but this seems wrong. All we've done is hijack PUT and make it a poorly-labeled "complete" action. It's perfectly reasonable to save partially completed tasks and this would prevent it.
2. We could add a validation resource. Unless you reintroduce an action, I haven't found semantics that aren't terrible.
3. We could POST (to /tasks/{id}) an envelope that identifies the request as a "validate" request. This is RESTful as POST allows very generic usage, but looks awkward and non-obvious to me.
4. We could allow the client to include complete = True (or state = complete) in a PUT/PATCH request. This flag would not actually change the state in the database, but would guide validation.
    - If a "completion" request is invalid, you get back the reason
    - If a "completion" request is valid, answers is updated, the observer runs (ideally in a transaction), and state is (indirectly) updated.

From a RESTful perspective, I'm most comfortable with (4). From the caller's perspective, it follows the PUT semantics because we would expect a GET (following a success) to return a task with the same completion flag/state. One time (edit: see next comment) this might not happen is if the observer is async. In this case, the behavior is more like "eventual consistency" with PUT semantics.

The most concise way to ask "why isn't this task complete" would be:

PATCH /tasks/{id}
{
    "completed": true
}

The more explicit would be a full PUT so the client knows exactly what is being validated.
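A minimal Python sketch of option (4). The question shape (`id`, `required`), the response bodies and the use of 409 for a failed completion request are assumptions made for this example; the thread does not settle on a status code.

```python
def missing_required(questions: list, answers: dict) -> list:
    """IDs of required questions that have no answer yet."""
    return [
        q["id"]
        for q in questions
        if q["required"] and answers.get(q["id"]) in (None, "")
    ]


def patch_task(task: dict, patch: dict) -> tuple[int, dict]:
    """Apply a PATCH; the `completed` flag only guides validation."""
    answers = {**task["answers"], **patch.get("answers", {})}
    missing = missing_required(task["questions"], answers)
    if patch.get("completed") and missing:
        # The client asked "is this complete?" and the answer is no: explain why.
        return 409, {"error": "required questions are unanswered", "missing": missing}
    task["answers"] = answers
    # The observer, not the client's flag, derives the stored state.
    task["completed"] = not missing
    return 200, task
```

Partial saves without the flag still succeed, so the flag never turns PUT/PATCH into a mandatory "complete" action.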

How does (4) sit with you? Can you think of a more intuitive way that is RESTful?

claytondaley commented 6 years ago

... another case where the PUT is not perfect (besides async) is when you put completed=false with a complete answer because the observer will change it. However, this is allowed under the RFC:

there is no guarantee that such a state change will be observable, since the target resource might be acted upon by other user agents in parallel, or might be subject to dynamic processing by the origin server, before any subsequent GET is received.

geemus commented 6 years ago

Sorry for my response delay (travel last week and still playing catchup).

The PUT could perhaps use either 204 to indicate that it is not completed or 200 to indicate that it did in fact validate. I would hesitate to have the output body vary between these two, but perhaps the return would always show something about what is valid or not (and the 200 case would just show them all valid). Alternatively, the PUT could simply return this value and a separate GET could be used to ask for why (though the extra hop is undesirable, it may be semantically clearer). What do you think?