slipstream / SlipStreamClient

SlipStream Python client
Apache License 2.0

Node executor working in tandem with a caching proxy should act more resiliently to the service unavailabilities in Ready state #219

Open konstan opened 8 years ago

konstan commented 8 years ago

Here is a run that was aborted in the Ready state because of a 502.

ss:abort- Exception with detail: ('Failed calling method GET on url https://nuv.la/run/f81e00ff-a18a-43b4-afa3-4c67c9e80a47/ss:state?ignoreabort=true, with reason: 502: Bad Gateway',)

This can be a problem in case of long-running auto-scalable runs.

Solutions might be:

schaubl commented 8 years ago

> This can be a problem in case of long-running auto-scalable runs.

For a mutable (scalable) run there is no limit on the number of retries the node executor will make, so this will not be a problem.

> return another status code (e.g. 503 with Retry-After header) for ss:state RTP resource, so that node executor can act more wisely in case of Ready state.

Currently the server doesn't tell the client how long to wait; instead, the client implements an "exponential backoff" algorithm. Retry-After doesn't make sense in the case of a server issue (like a 502): if the server has crashed, it cannot determine how long the client should wait before retrying, and nginx cannot either, because it doesn't know when the server will come back.
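
For illustration only, here is a minimal sketch of client-side exponential backoff of the kind described above. It is not the actual SlipStreamClient code: `backoff_delays`, `get_with_backoff`, the `fetch` callable, the set of retryable status codes, and all default values are assumptions made for the example. Passing `max_retries=None` models the "no limit" behaviour mentioned for mutable runs.

```python
import random
import time

# Assumption: which gateway/server errors are worth retrying.
RETRYABLE_STATUSES = {502, 503, 504}


def backoff_delays(base=1.0, factor=2.0, cap=60.0, max_retries=None):
    """Yield the wait (in seconds) before each retry.

    The delay grows geometrically and is clamped to `cap`.
    max_retries=None means the generator never stops (unlimited retries).
    """
    delay = base
    attempt = 0
    while max_retries is None or attempt < max_retries:
        yield min(delay, cap)
        # Clamp as we go so the value cannot overflow on very long runs.
        delay = min(delay * factor, cap)
        attempt += 1


def get_with_backoff(fetch, max_retries=5, sleep=time.sleep, jitter=random.random):
    """Call fetch() until it returns a non-retryable status or retries run out.

    `fetch` is a hypothetical callable returning (status_code, body).
    Returns the last (status_code, body) seen, which may still be an error.
    """
    status, body = fetch()
    for delay in backoff_delays(max_retries=max_retries):
        if status not in RETRYABLE_STATUSES:
            break
        # Sleep between 50% and 100% of the delay to spread out retries.
        sleep(delay * (0.5 + 0.5 * jitter()))
        status, body = fetch()
    return status, body
```

With a limited `max_retries`, the caller still has to decide what to do when the last status is an error (for example, abort the run); with `max_retries=None`, the loop only ends once the server answers with a non-retryable status.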

To summarize: