Closed · westrachel closed this issue 10 months ago
Hi,
I haven't read the whole issue in great detail, but a couple of things stood out. You said: "Confirm that the RDS proxy and lambda functions have appropriate permissions through IAM that enable them to do what they need to do to communicate properly; I believe they do. For example, the lambda function that's trying to connect to the proxy does have the rds-db:connect permission allowed for it to connect to the RDS cluster."
Is this relevant as you're not using IAM auth to connect to the RDS cluster so your Lambda doesn't need these permissions anyway?
On your DB proxy can you try:-
```hcl
resource "aws_db_proxy" "yeti_proxy" {
  name                   = "yeti-proxy"
  debug_logging          = false
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  role_arn               = aws_iam_role.rds_proxy_role.arn
  vpc_security_group_ids = [aws_security_group.rds_cluster.id]
  vpc_subnet_ids = [
    aws_default_subnet.default_az1.id,
    aws_default_subnet.default_az2.id,
    aws_default_subnet.default_az3.id,
  ]

  auth {
    auth_scheme               = "SECRETS"
    client_password_auth_type = "POSTGRES_MD5"
    iam_auth                  = "DISABLED"
    secret_arn                = aws_secretsmanager_secret.rds_cluster_pw.arn
  }
}
```
Note - I have added client_password_auth_type = "POSTGRES_MD5" to the auth block. For Postgres it seems to default to POSTGRES_SCRAM_SHA_256 which has in the past shown similar behaviour as you've described here.
Try that and see if it makes any difference and/or compare your console created one versus the Terraform created one.
Thank you for the suggestion! You're right, the `rds-db:connect` permission doesn't matter for this. I had mixed up in my notes that it was required regardless of the authentication mode, but it's only needed for IAM authentication mode, which I'm not currently using.
I added the attribute assignment you suggested, `client_password_auth_type = "POSTGRES_MD5"`, and unfortunately it didn't make a difference. I re-invoked the lambda function and it still resulted in the following error message, viewable in CloudWatch:

Unknown error. SSL connection has been closed unexpectedly
I also have enhanced logging enabled for the proxy, but this auth change combined with enhanced logging didn't produce a more informative error in the CloudWatch logs; the proxy logs just show:
Proxy authentication with PostgreSQL native password authentication succeeded for user <var.db_username> with TLS on.
A TCP connection was established from the proxy at <IP>:<PORT> to the database at <IP>:5432.
The new database connection successfully authenticated with TLS on.
The database connection closed. Reason: An internal error occurred.
For additional reference, the proxy I had temporarily created in the console, which facilitated connections without error, was using SCRAM SHA 256 for the client authentication type instead of PostgreSQL MD5, which is what `client_password_auth_type = "POSTGRES_MD5"` configures.
Weird... have you opened an AWS support ticket to ask them if they can see what's going on?
Maybe try `client_password_auth_type = "POSTGRES_SCRAM_SHA_256"`, though I think that's the default anyway?
Also, are you using the exact same secret and IAM roles/policies with the manually created proxy? I think your aws_iam_policy resource retrieve_rds_secret_policy is missing KMS permissions per the docs at https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/rds-proxy-setup.html#rds-proxy-iam-setup, so possibly the proxy can't actually read the secret? But that doesn't make sense, as the logs seem to suggest it can.
In summary - weird problem. I don't know what's wrong. Sorry!
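For reference, the statements those setup docs describe could be sketched roughly as below; the resource names, key, and region here are placeholders I've made up for illustration, not taken from this issue's config:

```hcl
# Hypothetical policy combining the secret-read and KMS statements from the
# RDS Proxy setup docs; names and the region are placeholders.
resource "aws_iam_policy" "retrieve_rds_secret_policy" {
  name = "retrieve-rds-secret"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = ["secretsmanager:GetSecretValue"]
        Resource = [aws_secretsmanager_secret.rds_cluster_pw.arn]
      },
      {
        # Per the docs, the decrypt statement matters when the secret is
        # encrypted with a customer-managed KMS key.
        Effect   = "Allow"
        Action   = ["kms:Decrypt"]
        Resource = [aws_kms_key.rds_secret_key.arn] # placeholder key
        Condition = {
          StringEquals = {
            "kms:ViaService" = "secretsmanager.us-east-1.amazonaws.com"
          }
        }
      }
    ]
  })
}
```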
I wasn't aware of the AWS support ticket option. I will give that a shot! Thank you for the idea!
You're right, SCRAM SHA 256 is the default auth type. Both my Terraform proxy and the console proxy were initially using that authentication mode before your suggestion to toggle that configuration and see what happens.
I am assigning the two proxies the same role that I'm creating through Terraform, and I've studied the console configuration details across both; they have the same configurations for all the settings. An AWS example document suggested KMS permissions were necessary only if you're using a custom KMS key, which I'm currently not. Since the console proxy works with the Terraform-configured role that doesn't have the underlying KMS permission attached through a policy, I don't think adding it should make a difference. The Terraform proxy logs I've included above explicitly say `Proxy authentication with PostgreSQL native password authentication succeeded` for my db user, suggesting the proxy can read the secret okay; if it couldn't, I'd expect the logs to show an error message like the one in this forum about it not being able to retrieve a secret.
Okay, I never got a response to the AWS support ticket I opened. However, I realized that I could enable logs for the db instances in addition to all the other (enhanced) logs that I already had enabled for other components. Looking at the db instance logs, I can see errors related to the `init_query`'s `"SET x=1, y=2"`. I had that in the configuration of the `aws_db_proxy_default_target_group` resource because I never adjusted it from the example in the AWS Terraform docs. I technically don't need it, and that attribute is optional. After removing it, the lambda function is able to connect to the RDS db proxy created through Terraform and execute SQL statements successfully.
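In case it helps anyone else hitting this, the fix amounts to dropping the `init_query` argument from the default target group. A minimal sketch, assuming resource names that match the earlier proxy snippet (the author's exact target-group config isn't shown here):

```hcl
# Hypothetical target group; only the init_query removal is the actual fix.
resource "aws_db_proxy_default_target_group" "yeti_target_group" {
  db_proxy_name = aws_db_proxy.yeti_proxy.name

  connection_pool_config {
    connection_borrow_timeout = 120
    max_connections_percent   = 100
    # init_query = "SET x=1, y=2"  # removed: this placeholder query from the
    # docs example caused the proxy to drop connections with "SSL connection
    # has been closed unexpectedly"
  }
}
```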
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Terraform Core Version
1.6.4
AWS Provider Version
3.6.0, 5.25.0
Affected Resource(s)
aws_db_proxy, aws_db_proxy_default_target_group, aws_db_proxy_target
Expected Behavior
When I invoke a Lambda function that connects to an RDS Aurora PostgreSQL cluster provisioned through Terraform, I expect the Lambda function to maintain a connection to the database without errors so that it can execute SQL statements successfully.
Actual Behavior
When I invoke a Lambda function that connects to an RDS Aurora PostgreSQL cluster provisioned through Terraform, the database connection is dropped when the Lambda function tries to execute SQL statements, preventing it from doing meaningful work. The error is vague, but I believe there is a problem with the RDS proxy based on the steps I have taken to debug this.
Relevant Error/Panic Output Snippet
The proxy's Cloudwatch logs show the following messages:
The .env file needs to contain the following variables. Please replace the values with values appropriate for your AWS account. The port is the PostgreSQL port, 5432. Also note that the role I temporarily created for the Terraform provider to use was made overly permissive: specifically, it has an underlying policy that allows all actions across all resources.
ACCOUNT_ID=
PROVIDER_AWS_ROLE=
AWS_KEY_ID=
AWS_KEY_VALUE=
PORT=
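A hypothetical set of Terraform variable declarations matching those .env entries (the author's actual variables file isn't shown, so the names here are guesses):

```hcl
# Placeholder variable declarations mirroring the .env keys above.
variable "account_id" {
  type = string
}

variable "provider_aws_role" {
  type = string
}

variable "aws_key_id" {
  type      = string
  sensitive = true
}

variable "aws_key_value" {
  type      = string
  sensitive = true
}

variable "port" {
  type    = number
  default = 5432 # the PostgreSQL port, per the note above
}
```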
Steps to Reproduce
(1) `terraform plan` and `terraform apply` all the following resources in steps. Note that the code below references variables (`var` and `local`), whose values need to be configured (I show dummy versions of the files that have these contents under the Terraform Configuration Files section). Also, I say apply the following changes in steps because they cannot all be applied at once. For example, the `aws_iam_role_policy_attachment`s cannot be applied before some of the underlying policies being attached are created, because the `arn`s aren't available yet and that results in an error. I'm not currently showing the lambda function code for the rotating-secret lambda or the create-table lambda, but let me know if that's needed and I can provide a sample. The create-table lambda is the lambda that is trying to connect to the RDS proxy to execute SQL statements.

main.tf file contents:
(2) Within the Cloudwatch management console, navigate to the logs for the create table lambda function and the proxy that's created to see the errors.
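As a side note on the staged applies in step (1): if each `aws_iam_role_policy_attachment` references the policy resource's `arn` attribute directly rather than a hardcoded ARN, Terraform infers the creation order and a single apply usually works. A minimal sketch, with placeholder names not taken from this issue:

```hcl
# Hypothetical example: referencing the policy's arn attribute creates an
# implicit dependency, so Terraform creates the policy before the attachment.
resource "aws_iam_policy" "retrieve_rds_secret_policy" {
  name   = "retrieve-rds-secret"
  policy = data.aws_iam_policy_document.retrieve_rds_secret.json # placeholder
}

resource "aws_iam_role_policy_attachment" "proxy_secret_access" {
  role       = aws_iam_role.rds_proxy_role.name
  policy_arn = aws_iam_policy.retrieve_rds_secret_policy.arn # implicit dependency
}
```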
Debug Output
No response
Panic Output
No response
Important Factoids
Here are the things I have checked/done to try to debug and prevent this error from happening:
- Confirmed that the lambda function has the `rds-db:connect` permission allowed for it to connect to the RDS cluster
- Confirmed that the cluster is `Available` and running when trying to connect through the RDS proxy provisioned through Terraform

References
I referenced the following AWS docs while trying to debug this:
Would you like to implement a fix?
None