roomorama / concierge

Roomorama supplier integrations - webhook providers and property synchronisation
https://concierge-web.roomorama.com
3 stars 0 forks source link

External API timeout #567

Closed keang closed 7 years ago

keang commented 7 years ago

Context

Supplier apis may time out after 10s (our default timeout duration). While we increase timeout for some suppliers during sync, we cannot increase it during quote, book, cancel. Or can we?

Problem

The requirement to give a quick response to user/distributor is Roomorama's, or the "front end". Concierge should continue to wait, even for 'real time' apis. (especially booking, because we would get the supplier booking reference for easier debugging/communicating with supplier).

Roomorama's distributor booking api has a workaround of "if something goes wrong at Supplier's side, just act like it was success, and customer service to inquire about it later".

This "feature" is implemented at Roomorama's side and should be independent of Concierge's internal timeout.

Proposal

keang commented 7 years ago

@kkolotyuk @rmascarenhas @sharipov-ru what do you think?

rmascarenhas commented 7 years ago

Roomorama currently waits 15s for the booking to complete. Do we have many cases where the booking would have completed if only we waited for 30s? I find that a bit of an unlikely case.

That said, we could experiment with the increase for the booking operation only. Every HTTP call should have timeout, otherwise the whole service can go down due to one buggy supplier. For more info on a similar situation that happened on Roomorama, see https://github.com/roomorama/roomorama/pull/3203.

keang commented 7 years ago

I only have evidence for 1 case where the booking went through after 10s. Since the timeout kicks in we don't know how long it actually took.

Good point:

Every HTTP call should have timeout, otherwise the whole service can go down due to one buggy supplier

Yeah I agree we can start with just /booking, setting a generous timeout.

sharipov-ru commented 7 years ago

My two cents while this issue is still opened 😄

Every HTTP call should have timeout, otherwise the whole service can go down due to one buggy supplier

As we grow, me may consider separating supplier workers on different servers, so that buggy supplier affects only itself.

@keang

Still create an external error(of code slow_response) to keep track of these slow responses on Concierge.

with having timeout increased to 60s for THH, do we need to track slow responses as you suggested in the first message?

Maybe slow response tracking should not be a part of concierge and should be tracked on the more highest level? I remember there were attached graphs with response times in kigo discussions, could guys share some information how you access / generate such kind of data, do you have configured alerts for slow responses and so on. Would be helpful to understand it!

keang commented 7 years ago

Yes now that we continue to put a hard timeout, we don't need to make a special external error just to track slow responses.

Ah yes, the graph of response times is done by parsing the logs of roomorama api calls. It just takes the previous weeks(?) logs and parse and tally the response time for each supplier. We don't have any alerts for this.. yet.

I agree though looking at this now, it can be the responsibility of something else at a higher level..

keang commented 7 years ago

If no one feels strongly about having this feature, i'll close this for now