hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

How to trigger more than 1 job using TERRAFORM GLUE Trigger? #terraform #aws glue #31412

Open Sowmya-aws opened 1 year ago

Sowmya-aws commented 1 year ago

Description

How can I trigger more than one job using a Terraform Glue trigger? I am very new to AWS and Terraform and am trying to build a simple ETL job with the AWS Glue service. I have created Terraform scripts that create folders in the S3 bucket dev and upload the Python scripts and input files; the output folder is also created. I have also created a Terraform script for the Glue trigger. I want to know how to trigger more than one job. I need the Glue job to run once every 5 minutes, so I have scheduled the trigger for every 5 minutes, but the trigger is failing with a message saying it is not able to fetch the script.

Also, if possible, can anyone give me a simple AWS Glue job example? Please note that I built these scripts by referring only to AWS-provided links.

Attachments: terraform-aws-glue-trigger.txt, glue_Read_write.txt

1) I need to know how to trigger more than one job with a Terraform Glue trigger, using the schedule option.

2) A simple Glue ETL job example would also be great.

References

No response

Would you like to implement a fix?

None

devforbes commented 1 year ago

job_example.tf

locals {
  glue_version         = "3.0"
  glue_execution_class = "FLEX"
  glue_worker_type     = "G.1X"
  glue_worker_num      = 2    # minimum of 2 workers
  glue_timeout         = 2880 # minutes
}

resource "aws_glue_job" "job_one" {
  name = "ingest_job_one"
  # Use a data.aws_iam_role source instead if this Terraform configuration did not create the role
  role_arn          = aws_iam_role.role_that_can_run_this.arn
  execution_class   = local.glue_execution_class
  glue_version      = local.glue_version
  worker_type       = local.glue_worker_type
  number_of_workers = local.glue_worker_num
  timeout           = local.glue_timeout

  connections = [aws_glue_connection.connection_for_this_job_source.name]

  command {
    # Must resolve to the full S3 path of the uploaded script
    script_location = "s3://${aws_s3_bucket.script_location_bucket.id}/${aws_s3_object.glue_job_script.id}"
  }

  default_arguments = {
    "--job-language" = "python"
  }

  provider = aws.aws-no-defaults # Terraform bug workaround
}
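
The job above references aws_s3_bucket.script_location_bucket and aws_s3_object.glue_job_script without defining them. A minimal sketch of what those two resources might look like follows; the bucket name, object key, and local file path are hypothetical placeholders, not values from this thread. If a job fails because it cannot fetch its script, it may be worth checking that script_location resolves to this exact object and that the job's IAM role is allowed to read it (that is a guess at the cause, not something confirmed here).

```hcl
# Bucket that holds the Glue job scripts.
# The bucket name is a hypothetical placeholder; S3 bucket names must be globally unique.
resource "aws_s3_bucket" "script_location_bucket" {
  bucket = "example-glue-scripts-bucket"
}

# Upload the job script so that script_location in the job points at a real object.
# The key and the local path are hypothetical placeholders.
resource "aws_s3_object" "glue_job_script" {
  bucket = aws_s3_bucket.script_location_bucket.id
  key    = "scripts/ingest_job_one.py"
  source = "${path.module}/scripts/ingest_job_one.py"

  # Re-upload the object whenever the local file content changes.
  etag = filemd5("${path.module}/scripts/ingest_job_one.py")
}
```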

The following is an example of a workflow that crawls everything defined in local.crawlers. Once all crawlers have succeeded, it runs all jobs defined in local.jobs.

If you have no crawlers, you would keep only the "ingest" trigger, remove its predicate, and remove all crawler Terraform. Because a CONDITIONAL trigger requires a predicate, the trigger type would then need to change as well, for example to ON_DEMAND or SCHEDULED.

workflow.tf

provider "aws" {
  region = "your-region"
  alias  = "aws-no-defaults" # Required only if your default provider has tags_all defined; otherwise you can ignore this
}

locals {
  crawlers = [
    aws_glue_crawler.crawler_one.name, # define these
    aws_glue_crawler.crawler_two.name  # define these
  ]
  jobs = [
    aws_glue_job.job_one.name, # define these
    aws_glue_job.job_two.name  # define these
  ]
}

resource "aws_glue_workflow" "workflow" {
  name = "workflow-name-here"

  provider = aws.aws-no-defaults                                      #Terraform Bug Workaround
}

resource "aws_glue_trigger" "crawl" {
  name          = "start-crawl"
  type          = "ON_DEMAND"
  workflow_name = aws_glue_workflow.workflow.name

  dynamic "actions" {
    for_each = local.crawlers
    content {
      crawler_name = actions.value
    }
  }

  provider = aws.aws-no-defaults                                      #Terraform Bug Workaround
}

resource "aws_glue_trigger" "ingest" {
  name          = "start-ingest"
  type          = "CONDITIONAL"
  workflow_name = aws_glue_workflow.workflow.name

  dynamic "actions" {
    for_each = local.jobs
    content {
      job_name = actions.value
    }
  }

  predicate {
    logical = "AND"

    dynamic "conditions" {
      for_each = local.crawlers
      content {
        crawler_name     = conditions.value
        crawl_state      = "SUCCEEDED"
        logical_operator = "EQUALS"
      }
    }
  }

  provider = aws.aws-no-defaults                                      #Terraform Bug Workaround
}
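
For the no-crawler case, a scheduled variant of the "ingest" trigger might look like the sketch below. It assumes local.jobs and aws_glue_workflow.workflow are defined as in workflow.tf; the resource label, trigger name, and cron expression are illustrative choices, not part of the original example. The schedule shown is meant to fire every 5 minutes.

```hcl
# One scheduled trigger that starts every job listed in local.jobs.
# The label "ingest_scheduled" and the trigger name are hypothetical.
resource "aws_glue_trigger" "ingest_scheduled" {
  name     = "start-ingest-every-five-minutes"
  type     = "SCHEDULED"
  schedule = "cron(0/5 * * * ? *)" # every 5 minutes

  # workflow_name is optional; omit it to create a standalone trigger.
  workflow_name = aws_glue_workflow.workflow.name

  # One actions block is generated per job, so a single trigger starts them all.
  dynamic "actions" {
    for_each = local.jobs
    content {
      job_name = actions.value
    }
  }
}
```

No predicate block is needed here, because the schedule rather than a crawler state decides when the jobs start.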

Sowmya-aws commented 1 year ago

Hi @justinretzolk, I have a question about the above; this is the first Terraform and Glue script I am trying to write. I am not using any crawlers, so can the above workflow (for triggering multiple jobs) be used for all types of jobs? Also, if I want different jobs to run at different times, how do I specify that?

cloudhunter89 commented 1 year ago

@Sowmya-aws A scheduled trigger defines the time for all jobs that it triggers. If you want a different time for given jobs, each one must be part of a separate trigger.
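
As a rough sketch of that approach, each job could get its own scheduled trigger with its own cron expression. This assumes aws_glue_job.job_one and aws_glue_job.job_two exist as in the earlier example; the resource labels, trigger names, and schedules are hypothetical.

```hcl
# job_one runs every 5 minutes.
resource "aws_glue_trigger" "job_one_schedule" {
  name     = "job-one-every-five-minutes"
  type     = "SCHEDULED"
  schedule = "cron(0/5 * * * ? *)"

  actions {
    job_name = aws_glue_job.job_one.name
  }
}

# job_two runs once a day at 02:00 UTC.
resource "aws_glue_trigger" "job_two_schedule" {
  name     = "job-two-daily"
  type     = "SCHEDULED"
  schedule = "cron(0 2 * * ? *)"

  actions {
    job_name = aws_glue_job.job_two.name
  }
}
```

Jobs that share a schedule can stay together under one trigger with several actions blocks; only jobs that need a different time need a trigger of their own.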