Closed: jzyinq closed this issue 6 years ago
@italolelis could you refer to this?
Hi @jzyinq,
Thanks for reporting this. At the moment we don't have this configuration per proxy. But we have a task to add a plugin that you can configure the timeout for each endpoint. We should have this functionality soon.
Thanks for the reply @italolelis.
Sorry to bother you, but how soon is soon? ;)
Could you guide me on how to set this globally for now?
@jzyinq In fact, we are going to add this plugin to decrease the current timeout from 60s to 30s at most; only in very exceptional cases should a request take that long. If the downstream service does not respond quickly, the proxy should just kill the connection. What's your use case?
@rafaeljesus We have some requests that stream very big payloads of data, which takes about 60-120 seconds at the moment. So in our case the response is quick, but reading it takes more time, and it looks like Janus waits for the whole payload before letting it out.
Do you have any thoughts on how we can tackle this problem?
@jzyinq I see. Janus should have the capabilities for you to specify a timeout per endpoint. I'll try to implement this asap.
Great! Looking forward to hearing about this feature soon. Thanks for your insight @italolelis.
@italolelis is there any ETA? We have a pending feature that is blocked by this issue, and before the next product release we have to decide whether we go with Janus or Kong. That's the only blocker for us currently. Thanks! :)
@quba I will open a PR tonight with this feature. There is one integration test failing that I hope to fix today or tomorrow.
Wow, that's fast. 👍
@quba The new version with the timeouts is out, please check whether it meets your needs.
Great! We should be able to provide some feedback within the next 2 weeks.
Hi @italolelis
It looks like the unmarshaler has some problems with the way the settings are formatted. After setting {"forwarding_timeouts": {"dial_timeout": "30s", "response_header_timeout": "30s"},}
I receive the following error in the console:
[RPC] --> Couldn't unmarshal api configuration error="json: cannot unmarshal string into Go struct field ForwardingTimeouts.dial_timeout of type time.Duration"
After changing those values to integers, Janus starts, but responds to all requests with 504 and logs:
http: proxy error: dial tcp: i/o timeout
@glothriel thx for reporting! I'll take a look at this issue
@glothriel The issue should be fixed in https://github.com/hellofresh/janus/pull/327, please check. Integer values (actually int64) are nanoseconds, so 30ns is far too short to process any request. I also fixed unmarshalling of time.Duration from strings.
@vgarvardt Thanks for quick fix and sorry for the delay.
My configuration looks like this:
...
"proxy": {
    "forwarding_timeouts": {
        "dial_timeout": "30s",
        "response_header_timeout": "30s"
    },
}
...
Unless I made some configuration error, the feature does not work. Janus starts, but the setting above has no effect on the application. Despite the 30s timeout being set, it allows much longer requests (the longest I tried was 3 minutes; I did not test longer ones). It also looks like some global timeout was removed, because even after removing the whole forwarding_timeouts block from the configuration, I'm able to perform those long 3-minute requests. The version we previously used was 3.7.0 (we got timeouts after 50s there), so maybe something changed with the global timeout in the meantime.
@vgarvardt Can you respond to my previous comment please?
@glothriel I'll try to check this and fix during the next week
@glothriel do you use file or Mongo-based proxy storage? During investigation I found an issue with Duration unmarshalling from BSON; fixed in https://github.com/hellofresh/janus/pull/334.
We use file storage.
I created a simple endpoint:
r.GET("/sleep", func(w http.ResponseWriter, r *http.Request) {
    seconds, _ := strconv.Atoi(r.URL.Query().Get("seconds"))
    log.WithField("seconds", seconds).Info("Starting sleep handler")
    time.Sleep(time.Duration(seconds) * time.Second)
    w.Write([]byte("Hello sleep\n"))
})
Then I set up a proxy to it with the following parameters:
{
    "proxy": {
        "listen_path": "/sleep",
        "forwarding_timeouts": {
            "dial_timeout": "5s",
            "response_header_timeout": "17s"
        },
    },
}
then called the endpoint with curl (Janus runs on port 3000 in my local dev env):
$ time curl -v localhost:3000/sleep?seconds=30
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /sleep?seconds=30 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Vary: Origin
< X-Request-Id: 68679848-4399-492d-84e4-67205ccb11af
< Date: Fri, 13 Jul 2018 13:31:36 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact
real 0m17.028s
user 0m0.008s
sys 0m0.006s
In janus logs I got the following:
{"level":"debug","method":"GET","msg":"Started request","path":"/sleep","time":"2018-07-13T15:30:41+02:00"}
2018/07/13 15:30:58 http: proxy error: net/http: timeout awaiting response headers
{"code":502,"duration":17007,"duration-fmt":"17.007905357s","host":"localhost:3000","level":"info","method":"GET","msg":"Completed handling request","referer":"","remote-addr":"[::1]:64679","request":"/sleep?seconds=30","request-id":"7a5c8dee-931d-4d99-8f9d-dd71769c4d0c","time":"2018-07-13T15:30:58+02:00","user-agent":"curl/7.54.0"}
That means the transport broke the connection, so the client request finished after 17s with a 502 response code.
Then I increased timeouts to the following values:
"forwarding_timeouts" : {
"dial_timeout" : "5s",
"response_header_timeout" : "180s"
}
curl:
$ time curl -v localhost:3000/sleep?seconds=300
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
> GET /sleep?seconds=300 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.54.0
> Accept: */*
>
< HTTP/1.1 502 Bad Gateway
< Vary: Origin
< X-Request-Id: 2296f1ea-6103-43ff-bb1f-43c14be42dfd
< Date: Fri, 13 Jul 2018 13:47:53 GMT
< Content-Length: 0
<
* Connection #0 to host localhost left intact
real 3m0.028s
user 0m0.011s
sys 0m0.015s
logs:
{"level":"debug","method":"GET","msg":"Started request","path":"/sleep","time":"2018-07-13T15:44:53+02:00"}
2018/07/13 15:47:53 http: proxy error: net/http: timeout awaiting response headers
{"code":502,"duration":180006,"duration-fmt":"3m0.006664055s","host":"localhost:3000","level":"info","method":"GET","msg":"Completed handling request","referer":"","remote-addr":"[::1]:64743","request":"/sleep?seconds=300","request-id":"2296f1ea-6103-43ff-bb1f-43c14be42dfd","time":"2018-07-13T15:47:53+02:00","user-agent":"curl/7.54.0"}
So, the timeout feature works for these test cases. Can you provide more information about your test case so I can reproduce it, find the root cause of the issue you're having, and fix it?
Thanks @vgarvardt, it appears that the behavior I was facing was related to my testing endpoint. We still have an issue with randomly encountered NullPointerDereference exceptions from cobra on startup, but @jzyinq will follow up on that.
I'm closing the issue then. Feel free to open a new one if you find anything else. Thank you!
Is there a way to increase the request TTL?
Something like
upstream_[connect|send|read]_timeout
?

Reproduction Steps:

Expected behavior:
The request is handled properly in its own time.

Observed behavior:
The request is terminated with a Janus error:
http: proxy error: context canceled

Janus version: 3.7.0-rc.14 Docker image