okta / okta-sdk-java

A Java SDK for interacting with the Okta management API, enabling server-side code to manage Okta users, groups, applications, and more.
Apache License 2.0

Consistently thrown NoHttpResponseException's in 13.0.3 #1158

Open anbangz opened 6 months ago

anbangz commented 6 months ago

Describe the bug?

After upgrading our Java SDK from 8.2.1 to 13.0.3, we are consistently seeing NoHttpResponseExceptions, particularly on the com.okta.sdk.resource.api.UserApi.createUser call. This appears to be a recurrence of the same underlying issue described in https://github.com/okta/okta-sdk-java/issues/24 and resolved in https://github.com/okta/okta-sdk-java/pull/23

org.apache.hc.core5.http.NoHttpResponseException: auth.mongodb.com:443 failed to respond
    at org.apache.hc.core5.http.impl.io.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:301)
    at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:175)
    at org.apache.hc.core5.http.impl.io.HttpRequestExecutor.execute(HttpRequestExecutor.java:218)
    at org.apache.hc.client5.http.impl.io.PoolingHttpClientConnectionManager$InternalConnectionEndpoint.execute(PoolingHttpClientConnectionManager.java:712)
    at org.apache.hc.client5.http.impl.classic.InternalExecRuntime.execute(InternalExecRuntime.java:216)
    at org.apache.hc.client5.http.impl.classic.MainClientExec.execute(MainClientExec.java:116)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ConnectExec.execute(ConnectExec.java:188)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ProtocolExec.execute(ProtocolExec.java:192)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.HttpRequestRetryExec.execute(HttpRequestRetryExec.java:96)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.ContentCompressionExec.execute(ContentCompressionExec.java:152)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.RedirectExec.execute(RedirectExec.java:115)
    at org.apache.hc.client5.http.impl.classic.ExecChainElement.execute(ExecChainElement.java:51)
    at org.apache.hc.client5.http.impl.classic.InternalHttpClient.doExecute(InternalHttpClient.java:170)
    at org.apache.hc.client5.http.impl.classic.CloseableHttpClient.execute(CloseableHttpClient.java:106)
    at com.okta.sdk.resource.client.ApiClient.invokeAPI(ApiClient.java:1108)
    ... 92 common frames omitted
Wrapped by: com.okta.sdk.resource.client.ApiException: org.apache.hc.core5.http.NoHttpResponseException: auth.mongodb.com:443 failed to respond
    at com.okta.sdk.resource.client.ApiClient.invokeAPI(ApiClient.java:1116)
    at com.okta.sdk.resource.api.UserApi.createUser(UserApi.java:394)
    at com.okta.sdk.resource.api.UserApi.createUser(UserApi.java:339)
    at com.okta.sdk.impl.resource.DefaultUserBuilder.buildAndCreate(DefaultUserBuilder.java:469)

What is expected to happen?

Okta SDK v13.0.3 issues calls to the Okta API without the caller needing to manage connections itself.

What is the actual behavior?

Outbound calls from the Okta SDK intermittently, but repeatedly, fail with a NoHttpResponseException thrown at com.okta.sdk.resource.client.ApiClient.invokeAPI(ApiClient.java:1116)

Reproduction Steps?

Consistently issue outbound calls to the Okta API via Java SDK v13.0.3

Additional Information?

No response

Java Version

openjdk 17.0.10

SDK Version

13.0.3

OS version

No response

arvindkrishnakumar-okta commented 6 months ago

Thanks for the post!

The stack trace (auth.mongodb.com:443 failed to respond) suggests an underlying connection issue, and the SDK is expected to throw such errors in those cases.

literallyjustroy commented 3 months ago

I'm not sure how long we've been seeing this, but we just noticed it as well. Occasionally a random POST call will fail with this exception.

It looks like this was a known issue, and it was resolved here via retry for any idempotent request: https://github.com/okta/okta-sdk-java/pull/23#discussion_r60785344

"Also, I don't think we have to use connectionManager.closeExpiredConnections() or connectionManager.closeIdleConnections(TTL, TimeUnit.SECONDS); retrying requests serves the purpose."

But for POST calls (like creating a user) this doesn't appear to be handled since the retry won't happen. https://github.com/okta/okta-sdk-java/issues/24#issuecomment-322894219

"You're correct that StandardHttpRequestRetryHandler doesn't cover that case, so the current fix won't work for the AuthApiClient (or for any methods that require POST)."
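To illustrate why a retry-based fix leaves POST calls uncovered, here is a minimal sketch of a retry rule that only re-sends idempotent requests. `IdempotentRetry`, `isIdempotent`, and `execute` are hypothetical names made up for this illustration; they are not part of the Okta SDK or Apache HttpClient, and the sketch does not claim to mirror how either library implements its retry logic. It only assumes that the failure surfaces as an `IOException`.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration only; not an Okta SDK or HttpClient class. */
final class IdempotentRetry {
    // HTTP methods defined as idempotent by RFC 9110. POST and PATCH are not.
    private static final Set<String> IDEMPOTENT =
            Set.of("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE");

    static boolean isIdempotent(String method) {
        return IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
    }

    /**
     * Runs the call and retries it up to maxRetries times on IOException,
     * but only when the HTTP method is idempotent. A failed POST is
     * rethrown immediately because re-sending it could create a duplicate.
     */
    static <T> T execute(String method, int maxRetries, Callable<T> call) throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return call.call();
            } catch (IOException e) {
                if (!isIdempotent(method) || attempt >= maxRetries) {
                    throw e;
                }
                attempt++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A GET that fails once is retried and succeeds on the second attempt.
        int[] calls = {0};
        String result = execute("GET", 1, () -> {
            if (calls[0]++ == 0) {
                throw new IOException("failed to respond");
            }
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Under a rule like this, a create-user POST that hits a dead connection fails on the first attempt, which matches the behavior described in this thread.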

Is it possible to look back into this and consider a connection eviction policy? https://stackoverflow.com/questions/10558791/apache-httpclient-interim-error-nohttpresponseexception

As an additional note, we are seeing this behavior with a custom domain for Okta as well. I am unable to reproduce it locally with loads of bogus create requests, but we see a couple of these each day in production (using Okta Java SDK 16).
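For readers unfamiliar with the connection eviction policy requested above: a common cause of this exception is a pooled keep-alive connection that the server or a load balancer has already closed, so the next request sent on it gets no response. The sketch below shows the general idea using only the JDK. `IdleEvictingPool` and its methods are hypothetical names for illustration, not the SDK's or HttpClient's implementation. Apache HttpClient 5 offers `evictIdleConnections` and `evictExpiredConnections` on `HttpClientBuilder` for this purpose; whether and how the Okta SDK exposes them is not established in this thread.

```java
import java.time.Duration;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical pool sketch for illustration; timestamps are passed in to keep it deterministic. */
final class IdleEvictingPool<C> {
    private static final class Entry<C> {
        final C connection;
        final long idleSinceNanos;

        Entry(C connection, long idleSinceNanos) {
            this.connection = connection;
            this.idleSinceNanos = idleSinceNanos;
        }
    }

    private final Deque<Entry<C>> idle = new ArrayDeque<>();
    private final long maxIdleNanos;

    IdleEvictingPool(Duration maxIdle) {
        this.maxIdleNanos = maxIdle.toNanos();
    }

    /** Returns a connection to the pool and records when it became idle. */
    synchronized void release(C connection, long nowNanos) {
        idle.addLast(new Entry<>(connection, nowNanos));
    }

    /**
     * Drops every connection that has been idle longer than maxIdle and
     * returns how many were dropped. A real pool would also close them.
     */
    synchronized int evictIdle(long nowNanos) {
        int evicted = 0;
        for (Iterator<Entry<C>> it = idle.iterator(); it.hasNext();) {
            if (nowNanos - it.next().idleSinceNanos > maxIdleNanos) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    /**
     * Hands out the most recently used connection that is still within
     * maxIdle, or null so that the caller opens a fresh connection.
     */
    synchronized C acquire(long nowNanos) {
        evictIdle(nowNanos);
        Entry<C> entry = idle.pollLast();
        return entry == null ? null : entry.connection;
    }
}
```

The design point is that eviction happens before reuse, so a non-idempotent request such as a create-user POST is less likely to be sent on a connection the remote side has already dropped. The idle limit would need to be shorter than the keep-alive timeout of the server or any intermediary, which this thread does not specify.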

literallyjustroy commented 3 months ago

@arvindkrishnakumar-okta Would it be possible to reopen this ticket given it appears to be a known issue which was partially fixed in 2016?

arvindkrishnakumar-okta commented 1 month ago

Can you try increasing the connection timeout values (Ref: https://github.com/okta/okta-sdk-java?tab=readme-ov-file#environment-variables)?

Are you able to do a curl to that endpoint and get a response without any unusual delay?

literallyjustroy commented 1 month ago

@arvindkrishnakumar-okta We currently set the connection timeout to 60 seconds via the DefaultClientBuilder setConnectionTimeout method. This was done a few months ago to prevent long-running connections (we were seeing Okta connections occasionally hang for excessive lengths of time, holding up SQL transactions).

Are you thinking this value is too low? Given that we only see this issue in production it's hard to experiment with different values here.

I can curl the same endpoints in a non-production tenant and do not see any unusual delay.

arvindkrishnakumar-okta commented 1 month ago

@literallyjustroy Though 60s is a reasonable timeout value, I suggest increasing it to, say, 90s and observing. Since you say it is not reproducible in non-prod environments, it is a bit tricky. Would you be able to capture a network-level trace or detailed logs from PROD when this issue happens? On a side note, you could try upgrading to the latest version of the SDK, 18.0.0.

literallyjustroy commented 1 week ago

(Waiting on bug-fixes/next release before upgrading to latest SDK)