Open n-oden opened 3 years ago
As a postscript here, I believe that this should actually be considered a bug. Consider the following resource definition:
```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"

  parameters = {
    inputSubscription = "messages"
    labels            = "billing_component=dataproc"
  }
}
```
The flex template launcher will pass the `labels` param to the job and duly apply those labels to it. But because labels are part of the job state, Terraform will on its next run see the job as differing from the resource state and try to update it: this in essence becomes a permanently tainted resource:
```
  # google_dataflow_flex_template_job.big_data_job will be updated in-place
  ~ resource "google_dataflow_flex_template_job" "big_data_job" {
        id     = "2021-08-22_14_10_06-17678439902224270650"
      ~ labels = {
          - "billing_component" = "dataproc" -> null
        }
        name   = "dataflow-flextemplates-job"
        # (7 unchanged attributes hidden)
    }
```
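If the immediate goal is only to silence this perpetual diff (rather than to manage the labels from Terraform), one possible mitigation is Terraform's generic `ignore_changes` lifecycle argument. This is a sketch, not a fix for the missing `environment` field: it hides the drift instead of resolving it.

```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"

  parameters = {
    inputSubscription = "messages"
    labels            = "billing_component=dataproc"
  }

  # Sketch: suppress the perpetual plan diff on labels applied by the
  # launcher. This silences drift detection rather than addressing the
  # underlying gap in the provider.
  lifecycle {
    ignore_changes = [labels]
  }
}
```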
@rileykarson is there any timeframe for getting this addressed that you can share? This effectively blocks us (and, I suspect, other people) from migrating to flex templates generally, since it is not possible to accurately track spend on flex jobs without using labels.
No timeline- I'm not certain that's entirely a bug either, just incorrectly specified. You could define the label in both places, right?
Hi, any updates regarding this?

From my perspective, without those options it's quite hard to use this resource from Terraform (the default network is an anti-pattern, so real projects need to define it along with other parameters like subnetwork, service account, IP configuration, etc.).

Right now I've checked, and the workaround suggested by @n-oden works, but it requires parameters like the following in the metadata.json file:
```json
"parameters": [
  {
    "name": "network",
    "label": "Network.",
    "helpText": "empty."
  },
  {
    "name": "subnetwork",
    "label": "Subnetwork",
    "helpText": "empty."
  },
  {
    "name": "service_account_email",
    "label": "Service account email.",
    "helpText": "empty."
  },
  {
    "name": "ip_configuration",
    "label": "IP Configuration.",
    "helpText": "empty."
  }
]
```
And later in the Terraform definition:

```hcl
parameters = {
  input_subscription = "projects/my-project/subscriptions/my-sub"
  output_table       = "my-project:my-dataset.my-table"

  # HACK for GCP Terraform setup, as these parameters are not supported at job level
  network               = "my-network"
  subnetwork            = "my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"
}
```
Based on the above, it would be nice to have options similar to those of google_dataflow_job; otherwise, from my perspective, each flex template needs to be polluted with these parameters. So I would expect:
```hcl
resource "google_dataflow_flex_template_job" "dataflow_demo_job" {
  parameters = {
    input_subscription = "projects/my-project/subscriptions/my-sub"
    output_table       = "my-project:my-dataset.my-table"
  }

  network               = "my-network"
  subnetwork            = "my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"
}
```
> No timeline- I'm not certain that's entirely a bug either, just incorrectly specified. You could define the label in both places, right?
@n-oden - I just understood what @rileykarson means here.

The workaround for having [FlexTemplateRuntimeEnvironment](https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.locations.flexTemplates/launch#FlexTemplateRuntimeEnvironment) parameters is to define them in the `parameters` section.

The workaround for labels is to both define them in the `parameters` section and use the deprecated `labels` param:
```hcl
locals {
  my_labels = { foo = "bar" }
}

resource "google_dataflow_flex_template_job" "my_pipeline" {
  provider = google-beta
  name     = "my-pipeline"
  ...

  # this param is 'deprecated', but negates the noop changes
  labels = local.my_labels

  parameters = {
    labels          = jsonencode(local.my_labels)
    maxNumWorkers   = 5
    zone            = "europe-west2-c"
    serviceAccount  = "foo@bar.com"
    stagingLocation = "gs://${local.dataflow_bucket_name}/stage/"
    tempLocation    = "gs://${local.dataflow_bucket_name}/temp/"
    network         = "https://www.googleapis.com/compute/v1/projects/my-proj/global/networks/my-net"
    subnetwork      = "https://www.googleapis.com/compute/v1/projects/my-proj/regions/europe-west2/subnetworks/my-subnet"
  }
}
```
Not sure if this is the exact same issue, but we are passing the service account in the parameters section in the same way as in the official Google module:
```hcl
java_pipeline_options = {
  serviceAccount        = var.service_account_email
  subnetwork            = var.subnetwork_self_link
  dataflowKmsKey        = var.kms_key_name
  tempLocation          = var.temp_location
  stagingLocation       = var.staging_location
  maxNumWorkers         = var.max_workers
  usePublicIps          = var.use_public_ips
  enableStreamingEngine = var.enable_streaming_engine
}

python_pipeline_options = {
  service_account_email   = var.service_account_email
  subnetwork              = var.subnetwork_self_link
  dataflow_kms_key        = var.kms_key_name
  temp_location           = var.temp_location
  staging_location        = var.staging_location
  max_num_workers         = var.max_workers
  no_use_public_ips       = !var.use_public_ips
  enable_streaming_engine = var.enable_streaming_engine
}

pipeline_options = var.job_language == "JAVA" ? local.java_pipeline_options : local.python_pipeline_options
```
This worked fine until now, when we started seeing this error:

```
serviceAccount: Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
Details:
[
  {
    "@type": "type.googleapis.com/google.dataflow.v1beta3.InvalidTemplateParameters",
    "parameterViolations": [
      {
        "description": "Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.",
        "parameter": "serviceAccount"
      }
    ]
  }
]
, badRequest
```
Since TF does not allow us to pass parameters in the `environment` section (it is not exposed), how can it produce this error?

> Runtime parameter serviceAccount should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
That issue is from https://github.com/hashicorp/terraform-provider-google/issues/14679
Using the aforementioned workaround for labels by @ben-marengo-msmg above, we get the following error with the google-beta provider v5.0.0:

```
Error: googleapi: Error 400: The template parameters are invalid. Details:
labels: Runtime parameter labels should not be specified in both parameters field and environment field. Specifying runtime parameters in environment field is recommended.
```

Specifying labels in only the environment field does not apply them to the dataflow flex template job, and specifying them in parameters still leaves the Terraform state not matching the labels configured on the job, as per @n-oden's earlier comment.

I think this should be considered a bug.
Description

The projects.locations.flexTemplates.launch API used by google_dataflow_flex_template takes a LaunchFlexTemplateParameter object as its payload. LaunchFlexTemplateParameter has an environment field that takes a FlexTemplateRuntimeEnvironment object, and this is where callers can specify standard dataflow job options such as the region/zone, the number of workers, the network/subnet, etc.

Unfortunately, the google_dataflow_flex_template_job resource in the google-beta provider does not currently expose the environment field, so it is not possible to pass any of these options to the job. As a workaround they can be specified in the parameters section, but this requires that the param names be listed in the job's metadata.json file, and any parameter not listed there cannot be passed.

New or Affected Resource(s)

Potential Terraform Configuration
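For illustration, one possible shape for such a configuration, assuming the provider exposed the launch API's environment field as top-level arguments; the argument names below are hypothetical, loosely mirroring fields of FlexTemplateRuntimeEnvironment:

```hcl
resource "google_dataflow_flex_template_job" "big_data_job" {
  provider                = google-beta
  name                    = "dataflow-flextemplates-job"
  container_spec_gcs_path = "gs://my-bucket/templates/template.json"

  parameters = {
    inputSubscription = "messages"
  }

  # Hypothetical arguments mapped from FlexTemplateRuntimeEnvironment:
  network               = "my-network"
  subnetwork            = "my-subnetwork"
  service_account_email = "my-service-account-fqn"
  ip_configuration      = "WORKER_IP_PRIVATE"

  labels = {
    billing_component = "dataproc"
  }
}
```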