spring-cloud / spring-cloud-netflix

Integration with Netflix OSS components
http://cloud.spring.io/spring-cloud-netflix/
Apache License 2.0
4.87k stars 2.44k forks source link

Agent got "UT000128" until Ribbon.ReadTimeout #2285

Closed lhchingit closed 7 years ago

lhchingit commented 7 years ago

Hi, We have build an environment for microservices with the help of Zuul, Ribbon and Hystrix. Most of the time, it works normally without any exception. But sometimes, we encountered a situation that our service did received the request and well-handled in the web filter. However, while 'chian.doFilter' has been called, the REST interface didn't do anything and we got UT000128 after time-out (Ribbon.ReadTimeout). (we use Untertow for our agents) It seems that our RESTful service didn't get the request body so that it is just waiting. Does anyone see this exception before? Thanks a lot in advance.

The error message is shown as follows: [2017/09/20 12:17:24.476][ WARN][XNIO-2 task-5][o.s.w.s.m.s.DefaultHandlerExceptionResolver.handleHttpMessageNotReadable@384][22102eac-579c-44fe-9ec2-cab469361804] |@| Failed to read HTTP message: org.springframework.http.converter.HttpMessageNotReadableException: Could not read document: UT000128: Remote peer closed connection before all data could be read; nested exception is java.io.IOException: UT000128: Remote peer closed connection before all data could be read

ryanjbaxter commented 7 years ago

Why do you think that it is the Ribbon read timeout? Sounds like you already know why the problem is occurring.

lhchingit commented 7 years ago

Hi, thanks for your reply, Because after the Ribbon.ReadTimeout pasted, the web filter didn't receive anything but got exception and entered the finally block. And the request seems stuck after completing doFilter without entering the corresponding rest controller.

Part of my web filter is like this: WebFilter.class

public void doFilter(ServletRequest req, ServletResponse resp, FilterChain chain)
            throws IOException, ServletException {
    try {
        ...
        logger.info("Start doFilter.");
        chain.doFilter(req, resp);
    } catch (Exception e) {
        ...
    } finally {
        ...
        logger.info("End doFilter.");
    }
} 

RestService.class

@RequestMapping(path = "/do-something", method = RequestMethod.POST)
@ResponseBody
purlic String doSomething(@RequestBody String body) {
    logger.info("Entering doSomething...");
    ...
}

If the value of Ribbon.ReadTimeout was 1200000 (20 minutes), then the log will print following message:

[2017/09/20 16:08:09.733][ INFO][XNIO-2 task-3][c.t.s.a.a.AuthenticateFilter.doFilter@112][f0650d76-a509-4b35-a3fd-b88d098215b2] |@| Start doFilter.
[2017/09/20 16:28:09.834][ WARN][XNIO-2 task-3][o.s.w.s.m.s.DefaultHandlerExceptionResolver.handleHttpMessageNotReadable@384][f0650d76-a509-4b35-a3fd-b88d098215b2] |@| Failed to read HTTP message: org.springframework.http.converter.HttpMessageNotReadableException: Could not read document: UT000128: Remote peer closed connection before all data could be read; nested exception is java.io.IOException: UT000128: Remote peer closed connection before all data could be read
[2017/09/20 16:28:09.835][ INFO][XNIO-2 task-3][c.t.s.a.a.AuthenticateFilter.doFilter@148][f0650d76-a509-4b35-a3fd-b88d098215b2] |@| End doFilter.
ryanjbaxter commented 7 years ago

So Ribbon is just timing out before the client has finished reading the content of the request?

lhchingit commented 7 years ago

@ryanjbaxter YES! It seems the client has got the request header (the header has been checked in the web filter) but still wait for the request body until time out. This situation will appear several times in about 10,000 requests, most of them works perfectly.

ryanjbaxter commented 7 years ago

I am not sure what we can do to help with this unless you can narrow it down to a reproducible scenario.

lhchingit commented 7 years ago

Hi, We found a scenario which can be easily reproduced with following setting but the error message isn't UT000128. ribbon.ReadTimeout = 3000 ribbon.MaxAutoRetries = 0 ribbon.MaxAutoRetriesNextServer = 1 And we force the service to sleep 5000 (ms) to trigger the retry mechanism.

@RequestMapping(path = "/do-something", method = RequestMethod.POST)
@ResponseBody
purlic String doSomething(@RequestBody String body) {
    logger.info("Entering doSomething...");
    ...
    Thread.sleep(5000);
    ...
}

However, it leads to similar condition as UT000128 - the request body cannot be read. One of the agent does sleeps 5000 (ms) and digests the request body. After retrying to the other agent, we got an exception due to the request body has been read. It seems that the retry mechanism for Zuul won't reserve the original request, so for POST or other HTTP methods, the request body is going to be empty after retry.

ryanjbaxter commented 7 years ago

Is this the problem you are running into https://github.com/spring-cloud/spring-cloud-netflix/pull/2209?

andrehueppe commented 7 years ago

@ryanjbaxter, @lhchingit Exactly. Once the body was processed for the initial request, it is gone for any retry. #2209 fixes this.

lhchingit commented 7 years ago

@ryanjbaxter Yes, that what I am exactly running into. @doernbrackandre Thanks you for resolving this issue! And the only way I can think is to use Wrapper to generate a copy of request body before dispatching to any service, too.