lancedb / lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector indexing, and data versioning. Compatible with Pandas, DuckDB, Polars, and PyArrow, with more integrations coming.
https://lancedb.github.io/lance/
Apache License 2.0
3.75k stars, 205 forks

AWS Credentials #1134

Open resonating-sirsh opened 1 year ago

resonating-sirsh commented 1 year ago

Greetings. Locally I can connect Lance to S3 storage using the AWS keys in my environment, but on Kubernetes (AWS EKS) I'm running into an unexpected issue. This is the error I get:

panicked at 'called `Result::unwrap()` on an `Err` value: CredentialsNotLoaded(CredentialsNotLoaded { source: "no providers in chain provided credentials" })', /home/runner/work/lance/lance/rust/src/io/object_store.rs:118:83

This is unexpected because on EKS/K8s I run many services that use the AWS credentials/service account on the nodes, so I don't normally need to specify additional keys when running pods on those nodes. My pods also load a default region from an environment variable.

Looking through the Lance code, I'm not 100% sure whether I need to do something extra to specify AWS credentials, and if so, what my options are for passing them. Is there anything I should load into the environment? Do I need to configure something in the connect constructor that I haven't thought of?

Any pointers appreciated.

Cheers

chebbyChefNEQ commented 1 year ago

Hi,

Could you try rerunning with the RUST_LOG=debug and LANCE_LOG=debug environment variables set? This should give us more information on why the credential fetching failed.

resonating-sirsh commented 1 year ago

Will do. To be clear, do you mean something like this?

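import os

LANCE_ROOT = "s3://somewhere"

# Debug logging for lance's Rust core; set before importing lancedb
os.environ["RUST_LOG"] = "debug"
os.environ["LANCE_LOG"] = "debug"

import lancedb


def lance_connect():
    # The region comes from the pod's environment (AWS_DEFAULT_REGION)
    return lancedb.connect(LANCE_ROOT, region=os.environ.get("AWS_DEFAULT_REGION"))

chebbyChefNEQ commented 1 year ago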

I think what you posted will work. If not, could you try running with RUST_LOG=debug LANCE_LOG=debug python ./my_script.py?
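The shell form above makes the variables exist before the Python process (and lance's Rust code) starts. If you want the same effect from Python, for example from a small launcher, a minimal sketch with the standard library's `subprocess` module could look like this. `run_with_debug_logging` is a hypothetical helper name, and `./my_script.py` stands in for whatever script opens the dataset.

```python
import os
import subprocess
import sys


def run_with_debug_logging(script, *args):
    # Copy the current environment and add the two log variables, so the
    # child interpreter sees them from the moment it starts.
    env = dict(os.environ, RUST_LOG="debug", LANCE_LOG="debug")
    # Reuse the current interpreter; check=False lets the caller inspect
    # the return code instead of getting an exception.
    return subprocess.run([sys.executable, script, *args], env=env, check=False)
```

Usage: `run_with_debug_logging("./my_script.py")`. Exporting the variables in the Dockerfile or the pod spec achieves the same thing without a launcher.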

resonating-sirsh commented 1 year ago

Cool, I'll let you know. I'll try the first approach first. It's on EKS, so there is a bit of setup and build time involved...

resonating-sirsh commented 1 year ago

OK @chebbyChefNEQ, for some reason setting os.environ did not work, but exporting the two environment variables in my Dockerfile did.

I can see what it's doing now, so many thanks for this info! It seems to assume an AWS config is in $HOME, which I would not have thought is the only way to set up the credentials (I realize I can mount some credentials there if I need to). Incidentally, I'm not sure the notion of profiles makes sense in this K8s context. Anyway, here are the logs:

[2023-08-13T00:26:31Z INFO  aws_config::meta::region] load_region; provider=DefaultRegionChain(RegionProviderChain { providers: [EnvironmentVariableRegionProvider { env: Env(Real) }, ProfileFileRegionProvider { provider_config: ProviderConfig { env: Env(Real), fs: Fs(Real), sleep: Some(TokioSleep), region: None } }, ImdsRegionProvider { client: LazyClient { client: OnceCell { value: None }, builder: Builder { max_attempts: None, endpoint: None, mode_override: None, token_ttl: None, connect_timeout: None, read_timeout: None, config: Some(ProviderConfig { env: Env(Real), fs: Fs(Real), sleep: Some(TokioSleep), region: None }) } }, env: Env(Real) }] })
[2023-08-13T00:26:31Z INFO  aws_config::meta::region] load_region; provider=EnvironmentVariableRegionProvider { env: Env(Real) }
[2023-08-13T00:26:31Z DEBUG tracing::span] build_profile_provider;
[2023-08-13T00:26:31Z DEBUG hyper_rustls::config] with_native_roots processed 129 valid and 0 invalid certs
[2023-08-13T00:26:31Z INFO  aws_config::meta::region] load_region; provider=DefaultRegionChain(RegionProviderChain { providers: [EnvironmentVariableRegionProvider { env: Env(Real) }, ProfileFileRegionProvider { provider_config: ProviderConfig { env: Env(Real), fs: Fs(Real), sleep: Some(TokioSleep), region: None } }, ImdsRegionProvider { client: LazyClient { client: OnceCell { value: None }, builder: Builder { max_attempts: None, endpoint: None, mode_override: None, token_ttl: None, connect_timeout: None, read_timeout: None, config: Some(ProviderConfig { env: Env(Real), fs: Fs(Real), sleep: Some(TokioSleep), region: None }) } }, env: Env(Real) }] })
[2023-08-13T00:26:31Z INFO  aws_config::meta::region] load_region; provider=EnvironmentVariableRegionProvider { env: Env(Real) }
[2023-08-13T00:26:31Z DEBUG aws_config::default_provider::credentials] provide_credentials; provider=default_chain
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] load_credentials; provider=Environment
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] provider in chain did not provide credentials provider=Environment context=the credential provider was not enabled: environment variable not set (CredentialsNotLoaded(CredentialsNotLoaded { source: "environment variable not set" }))
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] load_credentials; provider=Profile
[2023-08-13T00:26:31Z DEBUG aws_config::fs_util] loaded home directory src="HOME"
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] load_config_file; file=Default(Config)
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] performing home directory substitution home="/root" path="~/.aws/config"
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] home directory expanded before="~/.aws/config" after="/root/.aws/config"
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] config file not found path=~/.aws/config
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] config file loaded path=Some("/root/.aws/config") size=0
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] load_config_file; file=Default(Credentials)
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] performing home directory substitution home="/root" path="~/.aws/credentials"
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] home directory expanded before="~/.aws/credentials" after="/root/.aws/credentials"
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] config file not found path=~/.aws/credentials
[2023-08-13T00:26:31Z DEBUG aws_config::profile::parser::source] config file loaded path=Some("/root/.aws/credentials") size=0
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] provider in chain did not provide credentials provider=Profile context=the credential provider was not enabled: No profiles were defined (CredentialsNotLoaded(CredentialsNotLoaded { source: NoProfilesDefined }))
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] load_credentials; provider=WebIdentityToken
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] provider in chain did not provide credentials provider=WebIdentityToken context=the credential provider was not enabled: $AWS_WEB_IDENTITY_TOKEN_FILE was not set (CredentialsNotLoaded(CredentialsNotLoaded { source: "$AWS_WEB_IDENTITY_TOKEN_FILE was not set" }))
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] load_credentials; provider=EcsContainer
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] provider in chain did not provide credentials provider=EcsContainer context=the credential provider was not enabled: ECS provider not configured (CredentialsNotLoaded(CredentialsNotLoaded { source: "ECS provider not configured" }))
[2023-08-13T00:26:31Z DEBUG aws_config::meta::credentials::chain] load_credentials; provider=Ec2InstanceMetadata
[2023-08-13T00:26:31Z DEBUG aws_config::imds::credentials] loading credentials from IMDS
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation;
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation; operation="get"
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation; service="imds"
[2023-08-13T00:26:31Z DEBUG aws_smithy_http_tower::map_request] async_map_request; name="attach_imds_token"
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation;
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation; operation="get-token"
[2023-08-13T00:26:31Z DEBUG aws_smithy_client] send_operation; service="imds"
[2023-08-13T00:26:31Z DEBUG aws_smithy_http_tower::map_request] map_request; name="generate_user_agent"
[2023-08-13T00:26:31Z DEBUG tracing::span] dispatch;
[2023-08-13T00:26:31Z DEBUG hyper::client::connect::http] connecting to 169.254.169.254:80
[2023-08-13T00:26:31Z DEBUG hyper::client::connect::http] connected to 169.254.169.254:80
[2023-08-13T00:26:31Z DEBUG hyper::proto::h1::io] flushed 242 bytes
[2023-08-13T00:26:32Z DEBUG aws_smithy_client] send_operation; status="dispatch_failure"
[2023-08-13T00:26:32Z DEBUG aws_smithy_client] send_operation; message=dispatch failure: timeout: HTTP read timeout occurred after 1s: timed out (DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: HttpTimeoutError { kind: "HTTP read", duration: 1s } } }))
[2023-08-13T00:26:32Z DEBUG aws_smithy_client] send_operation; status="construction_failure"
[2023-08-13T00:26:32Z DEBUG aws_smithy_client] send_operation; message=failed to construct request: failed to load IMDS session token: dispatch failure: timeout: HTTP read timeout occurred after 1s: timed out (ConstructionFailure(ConstructionFailure { source: FailedToLoadToken(FailedToLoadToken { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: HttpTimeoutError { kind: "HTTP read", duration: 1s } } }) }) }))
[2023-08-13T00:26:32Z DEBUG aws_config::meta::credentials::chain] provider in chain did not provide credentials provider=Ec2InstanceMetadata context=the credential provider was not enabled: could not communicate with IMDS: dispatch failure: timeout: HTTP read timeout occurred after 1s: timed out (CredentialsNotLoaded(CredentialsNotLoaded { source: ImdsCommunicationError { source: DispatchFailure(DispatchFailure { source: ConnectorError { kind: Timeout, source: HttpTimeoutError { kind: "HTTP read", duration: 1s } } }) } }))
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: CredentialsNotLoaded(CredentialsNotLoaded { source: "no providers in chain provided credentials" })', /home/runner/work/lance/lance/rust/src/io/object_store.rs:118:83
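The debug log lists the providers the AWS SDK for Rust's default chain tried, in order: Environment, Profile, WebIdentityToken, EcsContainer, and Ec2InstanceMetadata. As a rough pre-flight check, the sketch below looks for the local inputs each provider depends on. It is an approximation, not the SDK's actual logic: `credential_sources` is a hypothetical helper, the environment-variable names and `~/.aws` paths follow the usual AWS SDK conventions, and overrides such as `AWS_CONFIG_FILE` are ignored.

```python
import os
from pathlib import Path


def credential_sources(env=None, home=None):
    """Approximate which default-chain providers have something to work with."""
    env = os.environ if env is None else env
    if home is None:
        home = env.get("HOME") or str(Path.home())
    aws_dir = Path(home) / ".aws"
    return [
        # Static keys from the environment
        ("Environment", "AWS_ACCESS_KEY_ID" in env and "AWS_SECRET_ACCESS_KEY" in env),
        # Shared config/credentials files under $HOME/.aws
        ("Profile", (aws_dir / "config").is_file() or (aws_dir / "credentials").is_file()),
        # Web identity token file (the log reports it was not set)
        ("WebIdentityToken", "AWS_WEB_IDENTITY_TOKEN_FILE" in env),
        # ECS container credentials endpoint
        ("EcsContainer", "AWS_CONTAINER_CREDENTIALS_RELATIVE_URI" in env
                         or "AWS_CONTAINER_CREDENTIALS_FULL_URI" in env),
        # IMDS has no local switch; it can only be checked over the network
        ("Ec2InstanceMetadata", None),
    ]


if __name__ == "__main__":
    for name, configured in credential_sources():
        status = "needs a network check" if configured is None else str(configured)
        print(f"{name:<20} {status}")
```

In the pod from this issue, every entry would presumably come out False, leaving IMDS as the only remaining source, which matches the last provider the log shows failing.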

chebbyChefNEQ commented 1 year ago

The credential provider chain could not talk to IMDS. This is most likely a configuration issue in the node or cluster networking.

The Instance Metadata Service (IMDS) is what provides AWS credentials inside EC2. For security reasons, AWS by default limits the IMDSv2 token response from the service at 169.254.169.254 to a single hop (TTL of 1). With container networking, this prevents containers from talking to IMDS.

[ IMDS ] -- [ EC2 ] -- [ Container ]

^^^ this is two hops.

While EKS defaults to two hops these days, many things can still block access, such as a Calico network policy or using IMDSv1 on the host.
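To check whether a given pod can reach IMDS, assuming Python is available in the image, you can replay the IMDSv2 session-token request that the log shows timing out (`get-token`, HTTP read timeout after 1s). The `/latest/api/token` path and the `X-aws-ec2-metadata-token-ttl-seconds` header are the documented IMDSv2 ones; the function name `imds_token_reachable` and the 1-second default timeout (mirroring the log) are illustrative choices.

```python
import http.client
import urllib.request


def imds_token_reachable(base_url="http://169.254.169.254", timeout=1.0):
    """Return True if an IMDSv2 session token can be fetched from base_url."""
    req = urllib.request.Request(
        base_url + "/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
    )
    # Bypass any HTTP(S)_PROXY settings: IMDS must be reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status == 200 and len(resp.read()) > 0
    except (OSError, http.client.HTTPException):
        # Refused connections, read timeouts and HTTP errors all count as unreachable
        return False
```

If this returns False inside the pod but True when run on the node itself, the hop limit or a network policy is the likely culprit. The hop limit is an EC2 instance metadata option (`--http-put-response-hop-limit` on `aws ec2 modify-instance-metadata-options`).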