micronaut-projects / micronaut-core

Micronaut Application Framework
http://micronaut.io
Apache License 2.0

Frequent random 408 request timeouts #7615

Open deepsandeepme opened 2 years ago

deepsandeepme commented 2 years ago

Expected Behavior

The expectation is that a call to a dependent API returns its response successfully, without frequent, random 408 request timeouts.

Unfortunately, after many failed attempts, I fell back to the traditional approach and used io.micronaut.http.client.HttpClient to create the client manually and call the dependent service. That resolved the issue.

I suspect something in the configuration itself is causing these timeouts.

Actual Behaviour

I am trying to connect to an external service using micronaut.security.oauth2.clients. However, I am seeing frequent "408 request timeout" errors. Initially, I thought there might be a configuration issue, so I went through the Micronaut documentation and tried setting micronaut.http.client.read-timeout: 30s. That did not help either; I still saw the request timeouts.

Steps To Reproduce

In any environment:

  1. Set up a security.oauth2 client with a token URL and auth method:
    micronaut:
      caches:
        devcaps:
          charset: 'UTF-8'
          expire-after-write: 60s
          expire-after-access: 60s
          maximum-size: 5
      security:
        oauth2:
          clients:
            auth0:
              client-id: ${app.client-id}
              client-secret: ${app.client-secret}
              grant-type: client_credentials
              client-credentials:
                service-id-regex: 'devcaps'
                scope: openid groups audience:server:client_id:${device-capabilities.client-id}
              token:
                url: ${app.token-gateway-url}
                auth-method: client_secret_post
  2. Use @Client to configure the client and call the API.
  3. Run the application in a Kubernetes cluster.
  4. The integration works fine when run on a local machine.

Environment Information

Example Application

No response

Version

3.5.2

yawkat commented 2 years ago

so this requires kubernetes to reproduce?

deepsandeepme commented 2 years ago

Yes. We tested it on a local machine and did not see any issues there. As soon as we deployed it to the cluster, we started observing these errors. It is not limited to one specific client integration; the behavior is consistent across every client integration. We have tested it with 3 clients so far.

The problem goes away as soon as we switch to using HttpClient directly and write the code to call the API manually, as shown below:

HttpClient httpClient;
try {
  httpClient = HttpClient.create(new URL(client.getServiceUrl()));
} catch (MalformedURLException e) {
  throw new ApplicationException("Malformed URL Exception", e);
}

DeviceResponseDTO deviceResponseDTO =
    httpClient
        .toBlocking()
        .retrieve(
            HttpRequest.GET("/v2/users" + vin)
                .bearerAuth(oAuthTokenGeneratorService.getToken(client))
                .accept(MediaType.APPLICATION_JSON)
                .contentType(MediaType.APPLICATION_JSON),
            DeviceResponseDTO.class);

graemerocher commented 2 years ago

You are probably blocking the event loop, and it surfaces in the cluster because your pods only have 1 CPU (locally you have more CPUs). If you are going to block the event loop, you need to configure the client to use a separate thread pool. See https://docs.micronaut.io/latest/guide/#clientConfiguration and the snippet "Altering the Event Loop Group used by Clients". With that configuration the client and the server use separate event loops, which prevents the deadlock if you plan to use blocking logic, at the cost of thread context switching.
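
The starvation described in this comment can be illustrated with plain JDK executors, independent of Micronaut. The sketch below is a toy model, not Micronaut or Netty code: a single-thread executor stands in for the event loop of a 1-CPU pod, and the class name, method names, and the "200"/"408" strings are hypothetical. A handler that blocks on a client call queued behind it on the same single thread can never see that call complete, so its wait times out. With a separate executor for the client, the call completes.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model only: executors stand in for event loops (an assumption, not Micronaut internals).
final class EventLoopStarvationDemo {

    // Runs a "handler" on serverLoop that blocks on a "client call" scheduled on clientLoop.
    // Returns "200" if the client call finished in time, otherwise "408".
    static String handleRequest(ExecutorService serverLoop, ExecutorService clientLoop) {
        Future<String> handler = serverLoop.submit(() -> {
            Future<String> clientCall = clientLoop.submit(() -> "200");
            try {
                // Blocking wait on the handler thread, like toBlocking() inside a request.
                return clientCall.get(500, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                clientCall.cancel(true);
                return "408";
            }
        });
        try {
            return handler.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("demo handler failed", e);
        }
    }

    // Server and client share one single-threaded loop: the client call is
    // queued behind the handler that is waiting for it.
    static String sharedLoop() {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            return handleRequest(loop, loop);
        } finally {
            loop.shutdownNow();
        }
    }

    // The client gets its own loop, so the blocked handler no longer starves it.
    static String separateLoops() {
        ExecutorService serverLoop = Executors.newSingleThreadExecutor();
        ExecutorService clientLoop = Executors.newSingleThreadExecutor();
        try {
            return handleRequest(serverLoop, clientLoop);
        } finally {
            serverLoop.shutdownNow();
            clientLoop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared loop:    " + sharedLoop());
        System.out.println("separate loops: " + separateLoops());
    }
}
```

In this model the shared-loop case reports 408 and the separate-loop case reports 200. The second case still blocks a thread; it only moves the blocked work so that it cannot starve the client, which is the trade-off the comment mentions.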

deepsandeepme commented 2 years ago

@graemerocher Thank you, it worked for me. However, could you please let me know whether creating a separate event loop is the right approach?

I have only 1 client call and 1 OAuth call in this flow. How should I approach this to avoid blocking the event loop?

You also mentioned thread context switching. Can it be used in my scenario? Could you point me to any documentation or code samples?
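
One general way to avoid blocking in a flow like this (one token call followed by one API call) is to chain the two asynchronous steps instead of waiting on each result. The sketch below uses only the JDK's CompletableFuture and is not Micronaut code: fetchToken and fetchDevice are hypothetical stand-ins for the OAuth call and the client call, and the returned strings are made up. It assumes both calls can be exposed as asynchronous operations. Even on a single-threaded executor the chain completes, because no task ever waits for another.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model only: all names here are hypothetical, not part of any Micronaut API.
final class NonBlockingChainDemo {

    // Hypothetical stand-in for the OAuth token request.
    static CompletableFuture<String> fetchToken(Executor loop) {
        return CompletableFuture.supplyAsync(() -> "token-123", loop);
    }

    // Hypothetical stand-in for the downstream API call that needs the token.
    static CompletableFuture<String> fetchDevice(String token, Executor loop) {
        return CompletableFuture.supplyAsync(() -> "device-for-" + token, loop);
    }

    // The "handler": composes both steps and returns immediately without blocking.
    static CompletableFuture<String> handle(Executor loop) {
        return fetchToken(loop).thenCompose(token -> fetchDevice(token, loop));
    }

    // Runs the chain on a single-threaded loop; only the caller of run() waits.
    static String run() {
        ExecutorService loop = Executors.newSingleThreadExecutor();
        try {
            return handle(loop).get(2, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("demo chain failed", e);
        } finally {
            loop.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running main prints device-for-token-123. Whether this shape fits the application depends on whether the token service and the client can return asynchronous types; that is an assumption of the sketch, not something confirmed in this thread.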