treeverse / lakeFS

lakeFS - Data version control for your data lake | Git for data
https://docs.lakefs.io
Apache License 2.0

Support rolling database credentials #3931

Open johnnyaug opened 2 years ago

johnnyaug commented 2 years ago

Allow safely replacing the database credentials used by lakeFS to connect to Postgres. This is useful, for example, when connecting to AWS RDS using IAM roles: this method provides credentials that expire after 15 minutes. This was brought up in this Slack thread.

dacort commented 2 years ago

A few thoughts/clarifications here as the original thread author.

I found that IAM-based authentication is, in reality, a process that requires calling generate-db-auth-token to obtain a short-lived token, which is then used as the password when connecting to the database. Hence the "rolling" nature of the credential.
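For illustration, a minimal sketch of that token flow using the AWS SDK for Go v2 (the patch below uses the v1 rdsutils equivalent); the endpoint, region, and user here are placeholder values, not anything from this deployment:

```go
package main

import (
	"context"
	"fmt"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/rds/auth"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx, config.WithRegion("us-west-2"))
	if err != nil {
		panic(err)
	}
	// The token is only valid for 15 minutes, so it must be regenerated
	// before (re)connecting -- this is the "rolling" credential.
	token, err := auth.BuildAuthToken(ctx,
		"mydb.example.us-west-2.rds.amazonaws.com:5432", // placeholder host:port
		"us-west-2", "lakefsadmin", cfg.Credentials)
	if err != nil {
		panic(err)
	}
	fmt.Println("use as the Postgres password:", token)
}
```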

I modified db/connect.go in lakeFS to try to make use of this token (see code below). While that worked for the initial connection, it looks like MigrateUp in db/migration.go uses the original database parameters, and it failed with an error because the password field was missing.

db/connect.go modification:

```go
if config.ConnConfig.Password == "" {
	log.Warn("No password provided for database connection, attempting IAM authentication")
	dbEndpoint := fmt.Sprintf("%s:%d", config.ConnConfig.Host, config.ConnConfig.Port)
	awsRegion := "us-west-2"
	conf := aws.NewConfig().WithRegion(awsRegion)
	sess := session.Must(session.NewSession(conf))
	creds := sess.Config.Credentials
	authToken, err := rdsutils.BuildAuthToken(dbEndpoint, awsRegion, config.ConnConfig.User, creds)
	if err != nil {
		panic(err)
	}
	config.ConnConfig.Password = authToken
}
```
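A sketch of one way the token could be made genuinely rolling, so that later connections (including ones opened by migrations) never see an expired credential: regenerate it in a BeforeConnect hook. This assumes a pgx v5 pgxpool-based pool and is not lakeFS's actual pool wiring; endpoint, region, and user are caller-supplied placeholders:

```go
package main

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/rds/auth"
	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

// newPool builds a pool that mints a fresh IAM auth token for every new
// physical connection, so an expired 15-minute token is never reused.
func newPool(ctx context.Context, connString, endpoint, region, user string) (*pgxpool.Pool, error) {
	awsCfg, err := config.LoadDefaultConfig(ctx, config.WithRegion(region))
	if err != nil {
		return nil, err
	}
	poolCfg, err := pgxpool.ParseConfig(connString)
	if err != nil {
		return nil, err
	}
	// BeforeConnect runs for each new connection, including reconnects,
	// so the pool never reuses a token cached at startup.
	poolCfg.BeforeConnect = func(ctx context.Context, cc *pgx.ConnConfig) error {
		token, err := auth.BuildAuthToken(ctx, endpoint, region, user, awsCfg.Credentials)
		if err != nil {
			return err
		}
		cc.Password = token
		return nil
	}
	return pgxpool.NewWithConfig(ctx, poolCfg)
}
```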

What ended up working for me, at least for now, without exposing the password, was disabling IAM auth and using a normal password stored in Secrets Manager. ECS supports a "secrets" section that populates environment variables from values in Secrets Manager. I noticed that pgconn.ParseConfig merges libpq environment variables into the connection string. So in my ECS config I set LAKEFS_DATABASE_CONNECTION_STRING to postgresql://lakefsadmin@{hostname}:{port}/postgres (notice: no password), and I set PGPASSWORD in my secrets to the ARN of my database password secret in Secrets Manager. lakeFS/pgx then magically merges the password into my connection string. 🙌
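To illustrate that merge, a minimal sketch using github.com/jackc/pgconn directly; the hostname and password value are placeholders:

```go
package main

import (
	"fmt"
	"os"

	"github.com/jackc/pgconn"
)

func main() {
	// In ECS this variable is injected from Secrets Manager; it is set
	// directly here for illustration only.
	os.Setenv("PGPASSWORD", "password-from-secrets-manager")

	// The connection string carries no password; ParseConfig merges the
	// libpq environment variables (PGPASSWORD among them) into the config.
	cfg, err := pgconn.ParseConfig("postgresql://lakefsadmin@db.example.com:5432/postgres")
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Password) // prints "password-from-secrets-manager"
}
```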

For reference, here's the (python) CDK resource I ended up with for running this on Fargate in ECS:

```python
# Imports added for context; rds, cluster, lakefs_dbrole,
# lakefs_containerrole, secret_key, and database are defined elsewhere
# in the stack.
from aws_cdk import (
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_secretsmanager as secretsmanager,
)

connection_string = f"postgresql://lakefsadmin@{rds.db_instance_endpoint_address}:{rds.db_instance_endpoint_port}/postgres"

lakefs_service = ecs_patterns.ApplicationLoadBalancedFargateService(
    self,
    "lakefs-service",
    cluster=cluster,
    memory_limit_mib=1024,
    desired_count=1,
    cpu=512,
    task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
        image=ecs.ContainerImage.from_registry(
            "treeverse/lakefs:latest",
            credentials=secretsmanager.Secret.from_secret_name_v2(
                self, "DockerHubPAT", "dev/DockerHubSecret"
            ),
        ),
        container_port=8000,
        execution_role=lakefs_dbrole,   # Used when provisioning containers (needs access to the Docker Hub secret)
        task_role=lakefs_containerrole, # Used by the running container (needs access to database.secret)
        environment={
            "LAKEFS_BLOCKSTORE_TYPE": "s3",
            "LAKEFS_DATABASE_CONNECTION_STRING": connection_string,
        },
        secrets={
            "LAKEFS_AUTH_ENCRYPT_SECRET_KEY": ecs.Secret.from_secrets_manager(
                secret_key
            ),
            "PGPASSWORD": ecs.Secret.from_secrets_manager(database.secret, "password"),
        },
    ),
)
```

github-actions[bot] commented 1 year ago

This issue is now marked as stale after 90 days of inactivity, and will be closed soon. To keep it, mark it with the "no stale" label.

arielshaqed commented 1 year ago

Marking "not stale" until proven otherwise, this all sounds convincingly like an issue for some particular configuration on AWS.