smithy-lang / smithy-rs

Code generation for the AWS SDK for Rust, as well as server and generic smithy client generation.
Apache License 2.0
490 stars 187 forks source link

Support configuring credential timeouts separately from client timeouts #2086

Open jdisanti opened 1 year ago

jdisanti commented 1 year ago

See https://github.com/awslabs/aws-sdk-rust/discussions/681 for the use-case. It should be more obvious how to configure credential timeouts separately from client timeouts.

Oliboy50 commented 1 year ago

Problem 😢

To sum up, the biggest issue here is that setting operation_timeout (or operation_attempt_timeout) on a DynamoDB client also acts as a timeout for the credentials loading part of the SDK (i.e. the lazy_load_credentials span).

let sdk_config = aws_config::load_from_env().await;

let dynamodb_client = aws_sdk_dynamodb::Client::from_conf(
    aws_sdk_dynamodb::config::Builder::from(&sdk_config)
        .retry_config(aws_config::retry::RetryConfig::disabled())
        .timeout_config(
            aws_config::timeout::TimeoutConfig::builder()
                .operation_timeout(Duration::from_millis(50)) // <-- 50ms is enough for a DynamoDB query but not for the "lazy_load_credentials" span
                .build(),
        )
        .build(),
);

Random idea to fix this issue and improve the SDK 💡

Something that would be awesome for both the developers and the users of the application built using the AWS SDK, would be to load credentials in another thread (not the main one) every ~15min (i.e. the DEFAULT_CREDENTIAL_EXPIRATION value). So that developers, and main thread users, can both take advantage of operation_timeout and operation_attempt_timeout without being affected by credentials loading process. 🎉 It may lead to issues related to multithreading, such as deadlocks and stuff, but it seems possible to achieve IMO. 🤞

Oliboy50 commented 10 months ago

👋 @jdisanti Does this discussion about a new SDK feature resolve this issue?

Or is the operation_timeout still wrapping the identity cache loading process?

jdisanti commented 10 months ago

The operation timeout still wraps identity resolution in the latest release, and given the architecture, I don't think we're going to be able to separate this out. I'm thinking what we need to do to solve your problem is add an eager credentials cache that uses a background task to continually refresh credentials before they expire so that that refresh happens in a separate thread, and thus, doesn't impact your requests at all.

Unfortunately, we won't be able to get to this quickly, but you can unblock yourself by providing a custom caching implementation via the ResolveCachedIdentity trait, and passing your implementation into the identity_cache() config method. The existing lazy cache can be used as an example.

Oliboy50 commented 10 months ago

Thank you very much for this clear answer :)

Now what about connect_timeout and timeout functions instead of operation_timeout? Do they also impact the identity cache loading process or could we set them to a very low value (such as 5 ms) and still be able to load AWS credentials without timeout issues?

jdisanti commented 10 months ago

If I understand your problem correctly, I think you should be able to set both the connect and read timeouts without any issue. The identity resolution gets included in the operation timeout and operation attempt timeout, but not in the connect/read timeouts.

I think that if you set the connect/read timeouts as part of the SdkConfig, then they will also be applied to the HTTP client that the credentials provider uses, but it sounds like that won't be an issue for you.

Oliboy50 commented 10 months ago

then they will also be applied to the HTTP client that the credentials provider uses, but it sounds like that won't be an issue for you

Since the identity cache loading takes around ~200ms (for us) and a simple DynamoDB query operation takes around ~5ms (for us), we want to know if we can set the connect/read timeouts, as part of the SdkConfig, to something close to our DynamoDB operations (such as 50ms or 100ms) without compromising the identity cache loading.

And if I understood correctly what you said, it sounds like we can't do that.

jdisanti commented 10 months ago

The connect/read timeouts occur at the HTTP client/connector level, so they don't include the identity resolution time. You should be able to safely set them that low. Just note that they will apply to any HTTP operations made inside of that identity resolver if you're using the default credentials chain. But it's only for the HTTP connect/read time, and not for the entire operation.

bruceadams commented 9 months ago

I found this issue while trying to figure out how to increase the default five second timeout for invoking the credentials process.

At best, the credentials process I use, saml2aws, takes about ten seconds, at worst, saml2aws does a two factor handshake with a human and I want to allow my humans a long time to respond.

What is working for me is this:

use aws_config::{identity::IdentityCache, BehaviorVersion, SdkConfig};

    let base = aws_config::defaults(BehaviorVersion::latest()).identity_cache(
        IdentityCache::lazy()
            .load_timeout(Duration::from_secs(90))
            .build(),
    );

I am not aware of this having any unfortunate side effects, but I don't know what side effects to look for 🤷🏻.

jfmontanaro commented 1 month ago

Just wanted to chime in here that I was having a similar problem to @bruceadams, his solution is working for me as well (thanks!)

I wonder if it might be reasonable to treat credential_process resolvers differently from other credential providers? i.e. if the user has a credential_process is configured, default to a longer timeout (perhaps 60 seconds) when invoking the process? It seems pretty likely that process credential providers are often used for human-in-the-loop type flows, for which of course you would want a longer timeout.

Maybe another option would be to allow specifying the timeout in the same place that the process credential provider is initially configured, e.g. in ~/.aws/config or wherever. But that would be a much broader change, since all of the AWS SDKs would have to support it, so I don't know.