hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.58k stars 9.54k forks source link

AWS API doesn't include RDS DB Name, causing RDS cluster and instances to be destroyed/rebuilt #4671

Closed sheeley closed 8 years ago

sheeley commented 8 years ago

I've created an RDS cluster as well as instances, and when running plan or apply, i see this:

-/+ aws_rds_cluster.default
    apply_immediately:            "" => "<computed>"
    availability_zones.#:         "3" => "<computed>"
    backup_retention_period:      "31" => "31"
    cluster_identifier:           "lumen-rds-cluster" => "lumen-rds-cluster"
    cluster_members.#:            "2" => "<computed>"
    database_name:                "" => "lumen" (forces new resource)
    db_subnet_group_name:         "lumen-main" => "lumen-main"
    endpoint:                     "lumen-rds-cluster.cluster-ch8hux0zznqj.us-east-1.rds.amazonaws.com" => "<computed>"
    engine:                       "aurora" => "<computed>"
    master_password:              "password" => "password"
    master_username:              "lumen" => "lumen"
    port:                         "3306" => "<computed>"
    preferred_backup_window:      "00:00-02:00" => "00:00-02:00"
    preferred_maintenance_window: "tue:02:15-tue:02:45" => "tue:02:15-tue:02:45"
    vpc_security_group_ids.#:     "1" => "<computed>"

-/+ aws_rds_cluster_instance.cluster_instances.0
    cluster_identifier:   "lumen-rds-cluster" => "${aws_rds_cluster.default.id}" (forces new resource)
    db_subnet_group_name: "lumen-main" => "lumen-main"
    endpoint:             "lumen-0.ch8hux0zznqj.us-east-1.rds.amazonaws.com" => "<computed>"
    identifier:           "lumen-0" => "lumen-0"
    instance_class:       "db.r3.large" => "db.r3.large"
    port:                 "3306" => "<computed>"
    publicly_accessible:  "false" => "0"
    writer:               "false" => "<computed>"

-/+ aws_rds_cluster_instance.cluster_instances.1
    cluster_identifier:   "lumen-rds-cluster" => "${aws_rds_cluster.default.id}" (forces new resource)
    db_subnet_group_name: "lumen-main" => "lumen-main"
    endpoint:             "lumen-1.ch8hux0zznqj.us-east-1.rds.amazonaws.com" => "<computed>"
    identifier:           "lumen-1" => "lumen-1"
    instance_class:       "db.r3.large" => "db.r3.large"
    port:                 "3306" => "<computed>"
    publicly_accessible:  "false" => "0"
    writer:               "true" => "<computed>"

For clarity, the source issue seems to be this:

    database_name:                "" => "lumen" (forces new resource)

This cluster is specified in my .tf as

resource "aws_rds_cluster" "default" {
  cluster_identifier = "lumen-rds-cluster"
  database_name = "lumen"
  master_username = "${var.rds_username}"
  master_password = "${var.rds_password}"
  backup_retention_period = 31
  preferred_backup_window = "00:00-02:00" # UTC
  preferred_maintenance_window = "Tue:02:15-Tue:02:45"
  db_subnet_group_name = "${aws_db_subnet_group.default.name}"
}

Database name should never change. If I set it in the state file and cat:

✗ cat terraform.tfstate | grep database_name
                            "database_name": "lumen",
✗ terraform plan
... (lots of lines removed)
✗ cat terraform.tfstate | grep database_name
                            "database_name": "",

I see this defined in builtin/providers/aws/resource_aws_rds_cluster.go, but I'm not sure exactly why it isn't being stored in the .tfstate file. I'll continue to look and try to submit a PR. Would love to get any sort of direction on why it wouldn't be saving to the state file.

dudymas commented 8 years ago

More on this... if I go my cli and type:

$ aws rds describe-db-clusters

It appears as though DatabaseName ... a field that the help docs explicitly say -should-be-there- ... is not. The field is required to create the rds cluster, that much is for sure. But I think the problem here may be asymmetric api syndrome.

Perhaps the code needs to call a different describe to actually get the dbname?

Pretty annoyed with this.

pixeleet commented 8 years ago

However aws rds describe-db-instances seems to include the DBName parameter.

dudymas commented 8 years ago

So... if you create a cluster, you have to specify a DatabaseName, however if you don't specify any instances, you have no way to get that back. Thus, terraform would be unable to know during a refresh if the cluster even has the right DatabaseName

I thiiiiink this bug falls onto aws or boto. Maybe both. I'll see if I can make a pull request to at least fix this if you do indeed have instances ... the provider shouldn't have too much trouble getting a list of instances and finding one that is associated.

what a pain.

pixeleet commented 8 years ago

Even if I do define aws_rds_cluster_instances it's still not going to get it back.

See relevant code: https://github.com/hashicorp/terraform/blob/cb52e232269dee43d75b47a7151739fcc685aa4e/builtin/providers/aws/resource_aws_rds_cluster.go#L233

It's calling DescribeCluster and DescribeInstances is the only call that actually returns the DBName, not even DatabaseName.

I'm looking into it, will update you if made any progress, if in the meantime any terraform core member would be able to say something on this would be nice.

pixeleet commented 8 years ago

Though AWS Documentation says it should be there. It's just not.

To my current understanding, we have to do an extra trip with a DescribeDBInstances call to get the DBName in the Read func.

pixeleet commented 8 years ago

I'm thinking about something like:

    // DescribeDBClusters is missing the crucial database_name
    // Check with DescribeDBInstances to retrieve DBName from there if any
    insstances, err := conn.DescribeDBInstances(&rds.DescribeDBInstancesInput{
        DBClusterIdentifier: aws.String(d.Id()),
    })

    if err != nil {
        log.Prinf("No instances found for cluster: %s", d.Id())
    }

    for _, i := range instances.DBInstances {
        if *i.DBClusterIdentifier == d.Id() {
            dbc.DatabaseName = i.DBName
            log.Printf("[DEBUG] Set database_name from instance info")
        }
    }

pseudo, pretty sure there's an error in it somewhere. It might be totally stupid, I just started actually writing GO yesterday. @phinze, can you weigh in on this if you have a minute?

dudymas commented 8 years ago

Aye... that's what I'd do to fix it too.

Yes, this seems like an AWS failure... as the more I dig into the python cli... it seems like they simply forgot about this.

catsby commented 8 years ago

Hello friends –

I stumbled on this a few days back and have an issue open with AWS. It has been confirmed to be an RDS API bug. I'm still waiting on a resolution from them :(

I'll look and see if there is anything we can do here to temporarily ignore the (lack of) database name, but am afraid of releasing code there... I hope an API fix is impending, I'll bump my thread with AWS support.

I'll let you know what I find

dudymas commented 8 years ago

@catsby thanks for the update. I've already found a couple shell script tricks to keep my infrastructure scripts working in the meantime...

catsby commented 8 years ago

I've ping AWS Support again, I'll post here if I hear from them. This was their response regarding the bug:

unspecified

sheeley commented 8 years ago

You all rule! I'm so thrilled to wake up to all this research. Fingers crossed there's a temporary workaround, but I can understand that it is truly an AWS bug.

sheeley commented 8 years ago

@catsby I'm trying to work around this using

lifecycle {
  ignore_changes = ["database_name"]
}

but still seeing

* aws_rds_cluster.default: diffs didn't match during apply. This is a bug with Terraform and should be reported.

I may not be understanding the way ignore_changes works, is this the intended use case? Are there other workarounds I could try?

sheeley commented 8 years ago

@dudymas could you share your workaround?

radeksimko commented 8 years ago

This is also affecting aws_db_instance.name, I assume similar workarounds can be applied.

dudymas commented 8 years ago

@sheeley I was using jq to simply remove the database name and then set it in a variable that got output. Later, when this is fixed, I can change the variable back to the resource.

All you do for this in jq is add the objects, so:

$ echo '{"some": { "nested" : { "object": "value" } } }'  | jq ' .  as $parent | $parent["some"]["nested"] + {"hi": "thur"} as $child | $parent["some"]["nested"] = $child '

You can do a lot more, but that's the gist of how I did it. I admit, I stopped using RDS after it took so long for amazon to fix this... instead I just scrapped this and instead plan to use some vitess ( see: vitess.io ) containers to do something similar to RDS

sheeley commented 8 years ago

@dudymas ah, thanks - i wound up just swapping to database_name = "" after the db is created, basically the same thing.

catsby commented 8 years ago

This is also affecting aws_db_instance.name, I assume similar workarounds can be applied.

@radeksimko I just tried aws_db_instance out and had no problem with the name. So far I've only seen this omission from the DescribeClusters call. Have you seen it elsewhere?

radeksimko commented 8 years ago

@catsby I will come up with a repro case and attach it here or in a new issue. It's on my todo list. I've seen it and I should be able to walk through the git history to see the exact configuration/combination.

It's just going to take some time (as in RDS everything takes time :smiley: )

catsby commented 8 years ago

OK, thanks @radeksimko – I ran the db instance test and ran plan/apply locally and didn't hit it, so maybe it's resolved? Let me know what you find, if you can reproduce it please open a new issue. Thanks!

catsby commented 8 years ago

For now, I've merged the work around in #4838

sheeley commented 8 years ago

thanks!

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.