Closed vimal-raz closed 6 years ago
It is deployed in Pivotal Cloud Foundry
Please learn how to properly format code and logs. What version of spring cloud are you using? See #1334
Updated the formatting. I'm using spring-cloud-services.version 1.5.0.RELEASE. Other version details are available in the ticket.
I found a related ticket https://github.com/spring-cloud/spring-cloud-netflix/issues/2079
Does this seem like a Cloud Foundry issue?
I think this is the same as #2079
I have opened a ticket with the Cloud Foundry team and will keep this ticket updated. Thanks.
@vimal-raz Did you get any updates from the cloud foundry team? We are experiencing the exact same issue - also on CF.
@dersteve, our infra team is working closely with PCF support to find a solution. Here are the findings so far: the reason for the 500 error is the NAT configuration in AWS. The AWS NAT Gateway is set to disconnect idle connections after 5 minutes.
@dersteve The PCF team suggested using Spring Retry in the Boot apps and in Zuul to handle this issue.
https://github.com/spring-projects/spring-retry https://docs.spring.io/spring-batch/trunk/reference/html/retry.html
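To make the retry suggestion concrete, here is a minimal plain-Java sketch of the idea. It is not the Spring Retry API: `SimpleRetry` and `IoCall` are hypothetical names invented for illustration, and the sketch only shows "try again when the call fails with an `IOException` such as a connection reset".

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical callback type: a call that may fail with an IOException. */
interface IoCall<T> {
    T run() throws IOException;
}

/** Hypothetical helper, not Spring Retry: retries a call up to maxAttempts times. */
final class SimpleRetry {
    private SimpleRetry() {}

    static <T> T call(IoCall<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                // e.g. java.net.SocketException: Connection reset
                last = e;
            }
        }
        // All attempts failed: surface the last failure to the caller.
        throw new UncheckedIOException(last);
    }
}
```

A retry like this only helps if the second attempt does not pick up the same dead connection, so it is worth checking how the underlying HTTP client handles a connection that has just failed.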
Hello,
I'm encountering the same issue with a "Connection Reset" exception. Is there any update, please? Is it related to Zuul?
Thanks
We have been investigating this issue on our Cloud Foundry architecture a bit more, and it seems to be an issue with the HTTP client and the way connections are kept alive. The issue appears when using either Apache HTTP Client (the default) or OkHttp. It does not appear, however, when using the deprecated RestClient, which was previously the default HTTP client. So this is the current fix for us (please note that the client is officially deprecated, and according to some of the developers it was deprecated due to some bugs).
The following short paragraph talks about the three clients: https://cloud.spring.io/spring-cloud-netflix/multi/multi__router_and_filter_zuul.html#_zuul_http_client
Another fix that could solve (or rather cover up) the issue would be to provide a custom HTTP client which disables connection keep-alive.
@spencergibb mentions here (https://github.com/spring-cloud/spring-cloud-netflix/issues/1125) that:
> RestClient has limitations like not supporting PATCH and other bugs that are fixed with Apache
Is anything being done about this bug? We have the same problem.
@tomaszglinski just out of curiosity, did you try to make use of the RestClient and did it fix the issue for you?
The RestClient installs a cleaner for the HTTP connection pool. This cleaner closes connections and removes pool entries that are older than 30 seconds, so they are closed cleanly in time and never fall victim to the AWS rule that disconnects idle connections after 5 minutes.
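The cleaner behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the RestClient implementation: `PooledConn` and `IdleEvictingPool` are hypothetical names, and the current time is passed in explicitly so the logic is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical pooled connection: only tracks when it was last used. */
final class PooledConn {
    final String id;
    final long lastUsedMillis;
    boolean closed;

    PooledConn(String id, long lastUsedMillis) {
        this.id = id;
        this.lastUsedMillis = lastUsedMillis;
    }

    void close() {
        closed = true;
    }
}

/** Hypothetical pool with a cleaner pass that drops entries idle for too long. */
final class IdleEvictingPool {
    private final Deque<PooledConn> idle = new ArrayDeque<>();
    private final long maxIdleMillis;

    IdleEvictingPool(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    synchronized void release(PooledConn conn) {
        idle.addLast(conn);
    }

    synchronized int size() {
        return idle.size();
    }

    /** Closes and removes every entry idle longer than maxIdleMillis; returns how many were evicted. */
    synchronized int evictIdle(long nowMillis) {
        int evicted = 0;
        for (Iterator<PooledConn> it = idle.iterator(); it.hasNext(); ) {
            PooledConn conn = it.next();
            if (nowMillis - conn.lastUsedMillis > maxIdleMillis) {
                conn.close();
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }
}
```

With `maxIdleMillis` set to 30 seconds and `evictIdle(System.currentTimeMillis())` called periodically (for example from a `ScheduledExecutorService`), no pooled connection would survive long enough to hit a 5 minute idle cutoff. In Apache HttpClient 4.x the connection manager's `closeIdleConnections(long, TimeUnit)` plays a comparable role; whether the client created by Zuul exposes its connection manager for this is not confirmed here.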
The exception we are seeing is the following:
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:282)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient$1.doWithRetry(RetryableRibbonLoadBalancingHttpClient.java:94)
at org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient$1.doWithRetry(RetryableRibbonLoadBalancingHttpClient.java:72)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:287)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:164)
at org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient.executeWithRetry(RetryableRibbonLoadBalancingHttpClient.java:107)
at org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient.execute(RetryableRibbonLoadBalancingHttpClient.java:72)
at org.springframework.cloud.netflix.ribbon.apache.RetryableRibbonLoadBalancingHttpClient.execute(RetryableRibbonLoadBalancingHttpClient.java:52)
at com.netflix.client.AbstractLoadBalancerAwareClient$1.call(AbstractLoadBalancerAwareClient.java:109)
Spring Retry is being used, but in this case I think it has no real effect, because:
- The entry does not seem to be removed from the pool when the exception is thrown.
- If another server gets retried, it will also get the same error.
There used to be an option to check the validity of the connection every time, but this no longer works even though the property still exists. It was deemed not to be performant. The PoolingHttpClientConnectionManager does not seem to be set up for the client that's created in RetryableRibbonLoadBalancingHttpClient, which means we cannot use the ValidateAfterInactivity property.
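For readers unfamiliar with the idea behind a validate-after-inactivity setting, here is a minimal plain-Java sketch. It is an illustration only, not Apache HttpClient code: `PoolEntry` and `ValidatingPool` are hypothetical names, and the `alive` flag stands in for a real stale-connection check on a socket.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical pool entry; `alive` stands in for the real socket state. */
final class PoolEntry {
    final String id;
    long lastUsedMillis;
    boolean alive = true;

    PoolEntry(String id, long lastUsedMillis) {
        this.id = id;
        this.lastUsedMillis = lastUsedMillis;
    }
}

/** Hypothetical pool that re-checks a connection only after it has been idle for a while. */
final class ValidatingPool {
    private final Deque<PoolEntry> idle = new ArrayDeque<>();
    private final long validateAfterMillis;
    private final Supplier<PoolEntry> factory; // opens a fresh connection

    ValidatingPool(long validateAfterMillis, Supplier<PoolEntry> factory) {
        this.validateAfterMillis = validateAfterMillis;
        this.factory = factory;
    }

    synchronized void release(PoolEntry entry) {
        idle.addLast(entry);
    }

    synchronized int idleCount() {
        return idle.size();
    }

    /** Returns a usable connection, discarding pooled ones that were idle too long and are dead. */
    synchronized PoolEntry lease(long nowMillis) {
        PoolEntry entry;
        while ((entry = idle.pollFirst()) != null) {
            boolean needsCheck = nowMillis - entry.lastUsedMillis >= validateAfterMillis;
            if (!needsCheck || entry.alive) {
                entry.lastUsedMillis = nowMillis;
                return entry;
            }
            // Stale entry: it is already out of the pool, so a retry cannot pick it up again.
        }
        return factory.get();
    }
}
```

The design point is that the check is skipped for recently used connections, which keeps the cost low, while a connection that sat idle long enough to have been dropped by a NAT gateway is tested and discarded before any request is sent on it.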
This is closed but there's no fix?
I encountered a similar error. The code is deployed in Cloud Foundry. I am using Zuul routes.
It was closed as a duplicate of #2079
If we run the Zuul proxy for about 15-30 minutes without making any calls, the first call fails with an HTTP 500 error (Connection reset when executed on the Zuul server). After that, all subsequent calls work properly.
Config:
Versions:
Full log: First call fails and second works https://gist.github.com/vimal-raz/9ddc8113e7513b5ab54d2533b1cad0cb