Open Sowmya-aws opened 1 year ago
Description
How can I trigger more than one job using a Terraform Glue trigger? I am very new to AWS and Terraform and am trying to build one simple ETL job with the AWS Glue service. I have written Terraform scripts that create folders in the S3 bucket dev and upload the Python scripts and input files; the output folder is created as well. I also wrote one Terraform script for the Glue trigger. I want to know how to trigger more than one job. I need the Glue job to run once every 5 minutes, so I have scheduled the trigger at a 5-minute interval, but the trigger is failing with an error saying it is not able to fetch the script.
Also, if possible, could anyone share one simple AWS Glue job example? Please note that I built these scripts by referring only to the links AWS provides.
Attachments: terraform-aws-glue-trigger.txt, glue_Read_write.txt
- I need to know how to trigger more than one job using a Terraform Glue trigger with the scheduler option.
- A simple Glue ETL job example would be great.
job_example.tf
locals {
  glue_version         = "3.0"
  glue_execution_class = "FLEX"
  glue_worker_type     = "G.1X"
  glue_worker_num      = 2 # min 2
  glue_timeout         = 2880
}

resource "aws_glue_job" "job_one" {
  name              = "ingest_job_one"
  role_arn          = aws_iam_role.role_that_can_run_this.arn # Can also use data.aws_iam_role source if your terraform didn't make the role
  execution_class   = local.glue_execution_class
  glue_version      = local.glue_version
  worker_type       = local.glue_worker_type
  number_of_workers = local.glue_worker_num
  timeout           = local.glue_timeout
  connections       = [aws_glue_connection.connection_for_this_job_source.name]

  command {
    script_location = "s3://${aws_s3_bucket.script_location_bucket.id}/${aws_s3_object.glue_job_script.id}"
  }

  default_arguments = {
    "--job-language" = "python"
  }

  provider = aws.aws-no-defaults # Terraform Bug Workaround
}
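The job above refers to a script bucket, a script object, and an IAM role that are not shown in this thread. A minimal sketch of those supporting resources might look like the following. The bucket name, object key, local script path, role name, and policy names are all hypothetical placeholders, and the S3 read policy is only an assumption about what the role needs in order to read the script. The "not able to fetch script" error reported above has not been diagnosed here.

```hcl
# Hypothetical bucket that holds the Glue scripts.
resource "aws_s3_bucket" "script_location_bucket" {
  bucket = "example-glue-scripts-dev" # placeholder; bucket names must be globally unique
}

# Uploads the local script; its key is the path used in script_location.
resource "aws_s3_object" "glue_job_script" {
  bucket = aws_s3_bucket.script_location_bucket.id
  key    = "scripts/ingest_job_one.py"                         # placeholder key
  source = "${path.module}/scripts/ingest_job_one.py"          # placeholder local path
  etag   = filemd5("${path.module}/scripts/ingest_job_one.py") # re-upload when the file changes
}

# Role that the Glue service assumes when it runs the job.
resource "aws_iam_role" "role_that_can_run_this" {
  name = "example-glue-job-role" # placeholder

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "glue.amazonaws.com" }
    }]
  })
}

# AWS managed policy with the baseline Glue permissions.
resource "aws_iam_role_policy_attachment" "glue_service" {
  role       = aws_iam_role.role_that_can_run_this.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}

# Assumption: the role also needs read access to the script objects.
resource "aws_iam_role_policy" "read_scripts" {
  name = "read-glue-scripts" # placeholder
  role = aws_iam_role.role_that_can_run_this.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["s3:GetObject"]
      Resource = "${aws_s3_bucket.script_location_bucket.arn}/*"
    }]
  })
}
```

If the job does not use a Glue connection, the connections line in the job can be dropped. The job's script_location could also reference aws_s3_object.glue_job_script.key rather than .id, which states the intent more directly.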
The following is an example of a workflow that runs every crawler defined in local.crawlers and then, once all crawlers have succeeded, runs every job defined in local.jobs.
If you have no crawlers, you'd just use the "ingest" trigger without the predicate and remove all the crawler Terraform. In that case the trigger's type also has to change from CONDITIONAL to ON_DEMAND or SCHEDULED, because a CONDITIONAL trigger requires a predicate.
workflow.tf
provider "aws" {
  region = "your-region"
  alias  = "aws-no-defaults" # This is required if your default provider has tags_all defined. Otherwise you can ignore this
}

locals {
  crawlers = [
    aws_glue_crawler.crawler_one.name, # define these
    aws_glue_crawler.crawler_two.name  # define these
  ]
  jobs = [
    aws_glue_job.job_one.name, # define these
    aws_glue_job.job_two.name  # define these
  ]
}

resource "aws_glue_workflow" "workflow" {
  name     = "workflow-name-here"
  provider = aws.aws-no-defaults # Terraform Bug Workaround
}

resource "aws_glue_trigger" "crawl" {
  name          = "start-crawl"
  type          = "ON_DEMAND"
  workflow_name = aws_glue_workflow.workflow.name

  dynamic "actions" {
    for_each = local.crawlers
    content {
      crawler_name = actions.value
    }
  }

  provider = aws.aws-no-defaults # Terraform Bug Workaround
}

resource "aws_glue_trigger" "ingest" {
  name          = "start-ingest"
  type          = "CONDITIONAL"
  workflow_name = aws_glue_workflow.workflow.name

  dynamic "actions" {
    for_each = local.jobs
    content {
      job_name = actions.value
    }
  }

  predicate {
    logical = "AND"

    dynamic "conditions" {
      for_each = local.crawlers
      content {
        crawler_name     = conditions.value
        crawl_state      = "SUCCEEDED"
        logical_operator = "EQUALS"
      }
    }
  }

  provider = aws.aws-no-defaults # Terraform Bug Workaround
}
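For the no-crawler case with a schedule, a sketch of a single scheduled trigger that starts several jobs every 5 minutes could look like this. It assumes aws_glue_job.job_one and aws_glue_job.job_two are defined along the lines of the job example; the trigger name and the scheduled_jobs local are made up for illustration.

```hcl
locals {
  # Jobs started together by the one scheduled trigger below.
  scheduled_jobs = [
    aws_glue_job.job_one.name,
    aws_glue_job.job_two.name,
  ]
}

resource "aws_glue_trigger" "every_five_minutes" {
  name     = "run-jobs-every-5-minutes" # hypothetical name
  type     = "SCHEDULED"
  schedule = "cron(0/5 * * * ? *)" # fires every 5 minutes

  # One actions block is generated per job in the list.
  dynamic "actions" {
    for_each = local.scheduled_jobs
    content {
      job_name = actions.value
    }
  }
}
```

No predicate block is set here, since that only applies to CONDITIONAL triggers. Every job listed in scheduled_jobs shares the same schedule.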
Hi @justinretzolk, I have a question about the above. This is the first Terraform and Glue script I have tried to write. I am not using any crawlers, so can the workflow above (for triggering multiple jobs) be used for all types of jobs? Also, if I want different jobs to run at different times, how do I specify that?
@Sowmya-aws A scheduled trigger defines the time for all jobs that it triggers. If you want a different time for given jobs, each one must be part of a separate trigger.
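As a rough illustration of one trigger per job, the sketch below uses for_each to create a separate scheduled trigger for each entry in a map. The job_schedules local, the trigger names, and the cron expressions are assumptions chosen for the example, and aws_glue_job.job_one and aws_glue_job.job_two are assumed to exist as in the job example above.

```hcl
locals {
  # Hypothetical map: one entry per job, each with its own schedule.
  job_schedules = {
    job_one = {
      job_name = aws_glue_job.job_one.name
      schedule = "cron(0/5 * * * ? *)" # every 5 minutes
    }
    job_two = {
      job_name = aws_glue_job.job_two.name
      schedule = "cron(0 6 * * ? *)" # daily at 06:00 UTC
    }
  }
}

# Creates one SCHEDULED trigger per map entry.
resource "aws_glue_trigger" "per_job" {
  for_each = local.job_schedules

  name     = "schedule-${each.key}"
  type     = "SCHEDULED"
  schedule = each.value.schedule

  actions {
    job_name = each.value.job_name
  }
}
```

Adding another job with its own time is then a matter of adding one more entry to the map.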