hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: aws_emr_cluster failed to start. bootstrap action 3 failed with non-zero exit code #31660

Closed · shqlz01 closed this issue 9 months ago

shqlz01 commented 1 year ago

Terraform Core Version

Terraform v1.4.6 on windows_amd64

AWS Provider Version

registry.terraform.io/hashicorp/aws v4.67.0

Affected Resource(s)

resource "aws_emr_cluster" "cluster-dev" {
    name                                = var.emr-name
    release_label                       = var.emr-release-label
    applications                        = var.emr-applications
    service_role                        = data.aws_iam_role.emr-role.arn
    autoscaling_role                    = data.aws_iam_role.emr-autoscal-role.arn
    log_uri                             = var.emr-log-uri 
    termination_protection              = false
    scale_down_behavior                 = "TERMINATE_AT_TASK_COMPLETION"
    keep_job_flow_alive_when_no_steps   = true

    ec2_attributes {
        subnet_id                           = data.aws_subnet.emr-subnet.id
        emr_managed_master_security_group   = data.aws_security_group.emr-mn-sg.id
        emr_managed_slave_security_group    = data.aws_security_group.emr-cn-sg.id
        service_access_security_group       = data.aws_security_group.emr-access-sg.id
        instance_profile                    = data.aws_iam_instance_profile.emr-instance-profile.arn
        key_name                            = var.ssh-key
    }

    master_instance_group {
        instance_type = var.emr-m-instance-type
        ebs_config {
            size                    = var.emr-ebs-size
            type                    = var.emr-ebs-type
            volumes_per_instance    = 1
        }
        ebs_config {
            size                    = var.emr-m-ebs-size
            type                    = var.emr-ebs-type
            volumes_per_instance    = 1
        }
    }

    core_instance_group {
        instance_count  = var.core-count
        instance_type   = var.emr-instance-type
        ebs_config {
            size                    = var.emr-ebs-size
            type                    = var.emr-ebs-type
            volumes_per_instance    = 1
        }
    }

    configurations_json = file("dev-conf.json")

    step {
        name = var.step-1-name
        action_on_failure = "CONTINUE"
        hadoop_jar_step {
            jar = var.step-1-jar
            args = var.step-1-args
        }
    }
    ebs_root_volume_size = 100

    bootstrap_action {
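        # Note: hashing dev-conf.json into this no-op action's args forces the cluster to be
        # replaced when that file changes, even though configurations_json itself is listed
        # in ignore_changes below.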
        path = "file:/bin/echo"
        name = "Dummy bootstrap action to prevent EMR cluster recreation when configuration_json has parameter javax.jdo.option.ConnectionPassword"
        args = [md5(jsonencode(file("dev-conf.json")))]
    }

    bootstrap_action {
        name = var.bootstrap-name
        #path = "file://C:/dev-emr/package/assign_private_ip.py"
        path = var.bootstrap-loaction
        args = var.bootstrap-argument
    }

    lifecycle {
        ignore_changes = [ step,configurations_json ]
    }

    tags = {
        Name = var.emr-name
    }
}

resource "aws_emr_instance_group" "task" {
    cluster_id     = aws_emr_cluster.cluster-dev.id
    instance_count = var.task-count
    instance_type  = var.emr-instance-type
    ebs_config {
        size                    = var.emr-ebs-size
        type                    = var.emr-ebs-type
        volumes_per_instance    = 1
    }
}

resource "aws_emr_managed_scaling_policy" "policy" {
    cluster_id = aws_emr_cluster.cluster-dev.id
    compute_limits {
        unit_type = "Instances"
        minimum_capacity_units = 1
        maximum_capacity_units = 1
        maximum_core_capacity_units = 1
        maximum_ondemand_capacity_units = 1
    }
}

Expected Behavior

After bootstrap action 2 completes, provisioning should finish; only two bootstrap actions are defined in my main.tf.

Actual Behavior

Instead, the cluster starts running a bootstrap action 3, which does not exist in my main.tf. That action fails with a non-zero exit code, so the cluster fails to start.
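
To see which bootstrap actions EMR actually registered on the cluster (versus the two defined in main.tf), they can be listed with the AWS CLI; the cluster ID below is a placeholder:

aws emr list-bootstrap-actions --cluster-id j-XXXXXXXXXXXXX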

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

my dev-conf.json:

[
    {
        "Classification": "hive-site",
        "Properties": {
            "javax.jdo.option.ConnectionURL": "jdbc:mysql://my-db-dev.cdx1mg1npfqg.rds.cn-north-1.amazonaws.com.cn:3306/metastore?createDatabaseIfNotExist=true",
            "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
            "javax.jdo.option.ConnectionUserName": "owhive",
            "javax.jdo.option.ConnectionPassword": "mypassword123",
            "hive.execution.engine": "tez",
            "hive.exec.pre.hooks": "org.apache.hadoop.hive.ql.hooks.ATSHook",
            "hive.exec.post.hooks": "org.apache.hadoop.hive.ql.hooks.ATSHook",
            "hive.exec.failure.hooks": "org.apache.hadoop.hive.ql.hooks.ATSHook",
            "dfs.permissions": "false",
            "hive.security.authorization.sqlstd.confwhitelist": "mapred.*|hive.*|mapreduce.*|spark.*|tez.*|io.*|parquet.*",
            "hive.security.authorization.sqlstd.confwhitelist.append": "mapred.*|hive.*|mapreduce.*|spark.*|tez*|io.*|parquet.*",
            "spark.yarn.jars": "hdfs://%{hiera('bigtop::hadoop_head_node')}:8020/spark-jars/*",
            "spark.executor.instances": "40",
            "spark.executor.cores": "2",
            "spark.default.parallelism": "300",
            "spark.executor.memory": "4096M",
            "hive.mapjoin.optimized.hashtable": "false",
            "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager",
            "hive.support.concurrency": "true"
        }
    },
    {
        "Classification": "hive",
        "Properties": {
            "hive.llap.percent-allocation": "0.8",
            "hive.llap.enabled": "true"
        },
        "configurations": []
    },
    {
        "classification": "yarn-site",
        "properties": {
            "yarn.node-labels.enabled": "true"
        }
    },
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.root.queues": "default,small",
            "yarn.scheduler.capacity.root.default.capacity": "90",
            "yarn.scheduler.capacity.root.small.capacity": "10",
            "yarn.scheduler.capacity.root.small.user-limit-factor": "1",
            "yarn.scheduler.capacity.root.default.maximum-capacity": "100",
            "yarn.scheduler.capacity.root.small.maximum-capacity": "50",
            "yarn.scheduler.capacity.root.default.state": "RUNNING",
            "yarn.scheduler.capacity.root.small.state": "RUNNING",
            "yarn.scheduler.capacity.root.default.acl_submit_applications": "*",
            "yarn.scheduler.capacity.root.small.acl_submit_applications": "*",
            "yarn.scheduler.capacity.root.default.acl_administer_queue": "*",
            "yarn.scheduler.capacity.root.small.acl_administer_queue": "*",
            "yarn.scheduler.capacity.root.default.maximum-application-lifetime": "-1",
            "yarn.scheduler.capacity.root.small.maximum-application-lifetime": "-1",
            "yarn.scheduler.capacity.root.default.default-application-lifetime": "-1",
            "yarn.scheduler.capacity.root.small.default-application-lifetime": "-1",
            "yarn.scheduler.capacity.root.default.ordering-policy": "fifo",
            "yarn.scheduler.capacity.queue-mappings": "u:tony.li:small"
        }
    },
    {
        "Classification": "spark-hive-site",
        "Properties": {
            "dfs.permissions": "false",
            "spark.yarn.jars": "hdfs://%{hiera('bigtop::hadoop_head_node')}:8020/spark-jars/*",
            "spark.executor.instances": "40",
            "spark.executor.cores": "2",
            "spark.default.parallelism": "300",
            "spark.executor.memory": "4096M"
        }
    },
    {
        "Classification": "hive-env",
        "Properties": {},
        "Configurations": [
            {
                "Classification": "export",
                "Properties": {
                    "HADOOP_HEAPSIZE": "12288",
                    "HADOOP_OPTS": "\"$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit\""
                }
            }
        ]
    },
    {
        "Classification": "pig-properties",
        "Properties": {
            "exectype": "tez"
        }
    },
    {
        "Classification": "tez-site",
        "Properties": {
            "tez.am.resource.memory.mb": "2048",
            "tez.container.max.java.heap.fraction": "0.9"
        }
    },
    {
        "Classification": "hdfs-site",
        "Properties": {
            "dfs.replication": "1",
            "fs.permissions.umask-mode": "0000"
        }
    },
    {
        "Classification": "emrfs-site",
        "Properties": {
            "fs.s3.maxRetries": "20"
        }
    },
    {
        "Classification": "spark",
        "Properties": {
            "maximizeResourceAllocation": "true"
        }
    },
    {
        "Classification": "hbase-site",
        "Properties": {
            "hbase.rootdir": "s3://my-dev-goldendata/hbase",
            "zookeeper.session.timeout": "30000",
            "hbase.regionserver.handler.count": "30",
            "hbase.hregion.max.filesize": "2147483648",
            "hbase.hregion.memstore.flush.size": "268435456",
            "hbase.bucketcache.size": "16384"
        }
    },
    {
        "Classification": "hbase",
        "Properties": {
            "hbase.emr.storageMode": "s3"
        }
    },
    {
        "classification": "hue-ini",
        "properties": {},
        "configurations": [
            {
                "classification": "desktop",
                "properties": {
                    "time_zone": "Asia/Shanghai"
                },
                "configurations": [
                    {
                        "classification": "database",
                        "properties": {
                            "password": "mypassword123",
                            "engine": "mysql",
                            "port": "3306",
                            "host": "my-db-dev.cdx1mg1npfqg.rds.cn-north-1.amazonaws.com.cn",
                            "name": "hue",
                            "user": "owhue"
                        },
                        "configurations": []
                    },
                    {
                        "classification": "ldap",
                        "properties": {},
                        "configurations": [
                            {
                                "classification": "ldap_servers",
                                "properties": {},
                                "configurations": [
                                    {
                                        "classification": "my",
                                        "properties": {
                                            "bind_dn": "administrator@my.cn",
                                            "search_bind_authentication": "true",
                                            "base_dn": "OU=dlp,DC=my,DC=cn",
                                            "bind_password": "o!TUk3(9Ef",
                                            "ldap_url": "ldap://172.31.4.226:389",
                                            "nt_domain": "my.cn"
                                        },
                                        "configurations": []
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "classification": "auth",
                        "properties": {
                            "backend": "desktop.auth.backend.AllowFirstUserDjangoBackend,desktop.auth.backend.LdapBackend"
                        },
                        "configurations": []
                    }
                ]
            },
            {
                "classification": "notebook",
                "properties": {
                    "interpreters_shown_on_wheel": "hive"
                },
                "configurations": [
                    {
                        "classification": "interpreters",
                        "properties": {},
                        "configurations": [
                            {
                                "classification": "hive",
                                "properties": {
                                    "name": "Hive",
                                    "interface": "hiveserver2"
                                }
                            },
                            {
                                "classification": "spark",
                                "properties": {
                                    "name": "Scala",
                                    "interface": "livy"
                                }
                            },
                            {
                                "classification": "pyspark",
                                "properties": {
                                    "name": "PySpark",
                                    "interface": "livy"
                                }
                            },
                            {
                                "classification": "java",
                                "properties": {
                                    "name": "Java",
                                    "interface": "oozie"
                                }
                            },
                            {
                                "classification": "mapreduce",
                                "properties": {
                                    "name": "MapReduce",
                                    "interface": "oozie"
                                }
                            },
                            {
                                "classification": "sqoop1",
                                "properties": {
                                    "name": "Sqoop1",
                                    "interface": "oozie"
                                }
                            },
                            {
                                "classification": "shell",
                                "properties": {
                                    "name": "Shell",
                                    "interface": "oozie"
                                }
                            }
                        ]
                    }
                ]
            },
            {
                "classification": "librdbms",
                "properties": {},
                "configurations": [
                    {
                        "classification": "databases",
                        "properties": {},
                        "configurations": [
                            {
                                "classification": "myhive",
                                "properties": {
                                    "nice_name": "My_Metastore",
                                    "name": "metastore",
                                    "engine": "mysql",
                                    "host": "my-db-dev.cdx1mg1npfqg.rds.cn-north-1.amazonaws.com.cn",
                                    "port": "3306",
                                    "user": "owhive",
                                    "password": "mypassword123"
                                },
                                "configurations": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
]

Steps to Reproduce

1. terraform init
2. terraform apply -var-file .\dev-vars.tfvars

Debug Output

2023-05-30 10:48:13,334 INFO i-006bb40232897291e: new instance started
2023-05-30 10:48:13,385 INFO i-006bb40232897291e: bootstrap action 1 completed
2023-05-30 10:48:19,835 INFO i-006bb40232897291e: bootstrap action 2 completed
2023-05-30 10:48:19,837 ERROR i-006bb40232897291e: failed to start. bootstrap action 3 failed with non-zero exit code.
2023-05-30 10:48:26,630 INFO i-0500f686a3da54b9d: new instance started
2023-05-30 10:48:33,030 INFO i-0500f686a3da54b9d: bootstrap action 1 completed
2023-05-30 10:48:33,030 INFO i-0500f686a3da54b9d: bootstrap action 2 completed
2023-05-30 10:48:33,030 INFO i-0500f686a3da54b9d: all bootstrap actions complete and instance ready
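
The stderr of the failing action on the affected instance should also be written under the cluster's log URI; the usual EMR log layout is roughly as follows (the log bucket/prefix and cluster ID are placeholders):

aws s3 ls s3://<log-bucket>/<prefix>/<cluster-id>/node/i-006bb40232897291e/bootstrap-actions/
aws s3 cp s3://<log-bucket>/<prefix>/<cluster-id>/node/i-006bb40232897291e/bootstrap-actions/3/stderr.gz .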

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

github-actions[bot] commented 1 year ago

Community Note

Voting for Prioritization

Volunteering to Work on This Issue

justinretzolk commented 1 year ago

Hey @shqlz01 👋 Thank you for taking the time to raise this! So that we have the information necessary to look into this, can you supply Terraform debug logs (redacted as needed) as well?
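
In case it helps, debug logs can be captured by setting TF_LOG before running the failing apply; on Windows PowerShell that would look roughly like:

$env:TF_LOG = "DEBUG"
$env:TF_LOG_PATH = "terraform-debug.log"
terraform apply -var-file .\dev-vars.tfvars

(terraform-debug.log is just an example file name; any writable path works.)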

justinretzolk commented 9 months ago

Since we haven't heard back, I'm going to close this issue. If you're still having trouble, please feel free to open a new issue, referencing this one for context as needed.

github-actions[bot] commented 8 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.