Demacr opened this issue 6 months ago
The issue was confirmed after replication, with the error message: Error 400: Invalid value at 'launch_parameter.environment.autoscaling_algorithm' (type.googleapis.com/google.dataflow.v1beta3.AutoscalingAlgorithm), "THROUGHPUT_BASED"
Good day @Demacr. Thank you for raising this issue. I noticed that `autoscalingAlgorithm = "THROUGHPUT_BASED"` was placed within the `parameters` block of the `google_dataflow_flex_template_job` resource. The `parameters` argument should be used for template-specific parameters only. I've created #17612 to improve the documentation on this point.
We have an auto-generated example of the PubSub_to_BigQuery_Flex template referenced in the terraform code above. You may find it here: v2/googlecloud-to-googlecloud/terraform/PubSub_to_BigQuery_Flex/dataflow_job.tf.
Please let us know if you need further help.
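For illustration, the placement described above could look roughly like this. This is a hypothetical, anonymized sketch: the job name, subscription, and table are placeholders, and the top-level `autoscaling_algorithm` argument is assumed from the 5.x provider schema rather than confirmed here, so check the resource reference before relying on it:

```hcl
resource "google_dataflow_flex_template_job" "example" {
  name                    = "example-job" # placeholder
  region                  = "us-east1"
  container_spec_gcs_path = "gs://dataflow-templates-us-east1/latest/flex/PubSub_to_BigQuery_Flex"

  # "parameters" should hold template-specific options only,
  # not pipeline options such as autoscalingAlgorithm:
  parameters = {
    inputSubscription = "projects/example/subscriptions/example" # placeholder
    outputTableSpec   = "example:dataset.table"                  # placeholder
  }

  # Pipeline options are set as resource arguments instead
  # (assumed field name; verify against the provider docs):
  autoscaling_algorithm = "THROUGHPUT_BASED"
}
```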
Hi @damondouglas,
I tried that approach as well; you can see the autoscaling parameter commented out, outside the `parameters` block, in my original bug report message. It produces the same error:
│ Error: googleapi: Error 400: Invalid value at 'launch_parameter.environment.autoscaling_algorithm' (type.googleapis.com/google.dataflow.v1beta3.AutoscalingAlgorithm), "THROUGHPUT_BASED"
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.BadRequest",
│     "fieldViolations": [
│       {
│         "description": "Invalid value at 'launch_parameter.environment.autoscaling_algorithm' (type.googleapis.com/google.dataflow.v1beta3.AutoscalingAlgorithm), \"THROUGHPUT_BASED\"",
│         "field": "launch_parameter.environment.autoscaling_algorithm"
│       }
│     ]
│   }
│ ]
I've just tried this with version 5.20 of the provider.
Good day, @Demacr. The terraform resource follows projects.jobs#Job.AutoscalingAlgorithm. I believe this is the behavior of terraform resources for the Google provider. Generally, when I run into issues, I look at the API reference to troubleshoot. The other clue that this may have been an incorrect input is the 400 status code of the API error, which tells me that THROUGHPUT_BASED was an "Invalid value".
@Demacr is there documentation somewhere that indicates the API should support this value? Or would you like to request that the API add this as an additional algorithm?
@melinath From this documentation. I used these values for autoscaling when I created the job manually, and it worked.
@Demacr when you say "created manually", do you mean that you were able to manually make an API call that accepted THROUGHPUT_BASED as an argument, or that you were able to locally spin up a pipeline following that document?
@melinath I mean I originally created the flex Dataflow job with a gcloud CLI command and then "terraformed" that command.
For example, the real command with anonymized values:
gcloud dataflow flex-template run xxx \
--template-file-gcs-location gs://dataflow-templates-us-east1/latest/flex/PubSub_to_BigQuery_Flex \
--region us-east1 \
--worker-region us-east1 \
--subnetwork regions/us-east1/subnetworks/xxx \
--network xxx \
--additional-user-labels "" \
--parameters outputTableSpec=xxx,inputSubscription=xxx,useStorageWriteApiAtLeastOnce=false,javascriptTextTransformGcsPath=gs://xxx/transform_func.js,javascriptTextTransformFunctionName=transform,javascriptTextTransformReloadIntervalMinutes=15,serviceAccount=xxx,maxNumWorkers=3,numberOfWorkerHarnessThreads=2,diskSizeGb=30,workerMachineType=c2d-highmem-2,useStorageWriteApi=true,numStorageWriteApiStreams=8,storageWriteApiTriggeringFrequencySec=15,autoscalingAlgorithm=THROUGHPUT_BASED \
--project=xxx
@Demacr gcloud should send that to the API, so if it works in gcloud, it should be possible in Terraform as well. If you add --log-http to the gcloud command, can you see what API field it uses for THROUGHPUT_BASED in the API request?
@melinath
=======================
==== request start ====
uri: https://dataflow.googleapis.com/v1b3/projects/xxx/locations/us-east1/flexTemplates:launch?alt=json
method: POST
== headers start ==
b'accept': b'application/json'
b'accept-encoding': b'gzip, deflate'
b'authorization': --- Token Redacted ---
b'content-length': b'1113'
b'content-type': b'application/json'
b'user-agent': b'google-cloud-sdk gcloud/465.0.0 command/gcloud.dataflow.flex-template.run invocation-id/xxx environment/None environment-version/None client-os/MACOSX client-os-ver/23.4.0 client-pltf-arch/x86_64 interactive/True from-script/False python/3.12.2 term/xterm-256color (Macintosh; Intel Mac OS X 23.4.0)'
b'x-goog-api-client': b'cred-type/u'
== headers end ==
== body start ==
{"launchParameter": {"containerSpecGcsPath": "gs://dataflow-templates-us-east1/latest/flex/PubSub_to_BigQuery_Flex", "environment": {"enableStreamingEngine": false, "network": "xxx", "subnetwork": "regions/us-east1/subnetworks/xxx", "workerRegion": "us-east1"}, "jobName": "xxx", "parameters": {"autoscalingAlgorithm": "THROUGHPUT_BASED", "diskSizeGb": "30", "inputSubscription": "xxx", "javascriptTextTransformFunctionName": "transform", "javascriptTextTransformGcsPath": "gs://xxx/transform_func.js", "javascriptTextTransformReloadIntervalMinutes": "15", "maxNumWorkers": "3", "numStorageWriteApiStreams": "8", "numberOfWorkerHarnessThreads": "2", "outputTableSpec": "xxx", "serviceAccount": "xxx", "storageWriteApiTriggeringFrequencySec": "15", "useStorageWriteApi": "true", "useStorageWriteApiAtLeastOnce": "false", "workerMachineType": "c2d-highmem-2"}}}
== body end ==
==== request end ====
---- response start ----
status: 200
-- headers start --
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Cache-Control: private
Content-Encoding: gzip
Content-Type: application/json; charset=UTF-8
Date: Tue, 02 Apr 2024 20:47:53 GMT
Server: ESF
Transfer-Encoding: chunked
Vary: Origin, X-Origin, Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 0
-- headers end --
-- body start --
{
"job": {
"id": "2024-04-02_13_47_52-10863458629911449595",
"projectId": "xxx",
"name": "xxx",
"currentStateTime": "1970-01-01T00:00:00Z",
"createTime": "2024-04-02T20:47:53.134279Z",
"location": "us-east1",
"startTime": "2024-04-02T20:47:53.134279Z"
}
}
-- body end --
total round trip time (request+response): 1.714 secs
---- response end ----
----------------------
Thanks for the logs, that's super helpful!
This looks like a valid issue to me: you're able to use gcloud to get THROUGHPUT_BASED autoscaling, but can't do it with Terraform (even though both use the API).
Specifically, gcloud sets launchParameter.parameters.autoscalingAlgorithm, while terraform explicitly extracts autoscalingAlgorithm from parameters and sets it at launchParameter.environment.autoscalingAlgorithm. That's not the only parameter treated this way, but perhaps it's the most impactful because it's treated differently by the API?
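To make the divergence concrete, the two request bodies differ roughly like this (a simplified sketch based on the --log-http output above, trimmed to the relevant field):

```
# gcloud (works): the value stays a template parameter
{"launchParameter": {"parameters": {"autoscalingAlgorithm": "THROUGHPUT_BASED"}}}

# terraform (fails): the value is relocated into the environment block,
# which the API validates against the Job.AutoscalingAlgorithm enum
{"launchParameter": {"environment": {"autoscalingAlgorithm": "THROUGHPUT_BASED"}}}
```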
This behavior was introduced in 5.0.0 via https://github.com/GoogleCloudPlatform/magic-modules/pull/9031; it looks like we believed at the time that the environment and parameters fields should contain the same values?
In the long term, the "fix" would probably be to introduce a separate environment field on the resource so that users can explicitly set parameters and environment separately. However, I don't know what the API behavior is or what the long-term plans for the API are, so I don't know if that would make sense at this point. Separating the fields more explicitly would need to be behind a guard if introduced in a minor version, since it's a major behavioral change.
Terraform Version
Affected Resource(s)
google_dataflow_flex_template_job
Terraform Configuration
Debug Output
https://gist.github.com/Demacr/a6b30bb83f0105ba0764571d75b44ace
Expected Behavior
A new job is created with the autoscaling algorithm equal to THROUGHPUT_BASED.
Actual Behavior
It throws an invalid-value error:
Steps to reproduce
terraform apply
Important Factoids
I found that it accepts only the values listed here: https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.AutoscalingAlgorithm
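For reference, the enum at that link defines only these values (listed from memory of the v1b3 reference, so worth double-checking against the page):

```
AUTOSCALING_ALGORITHM_UNKNOWN  # algorithm unspecified
AUTOSCALING_ALGORITHM_NONE     # disable autoscaling
AUTOSCALING_ALGORITHM_BASIC    # autoscale up to the maximum number of workers
```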
I used the AUTOSCALING_ALGORITHM_BASIC value; it accepts that value and creates the job, but the autoscalingAlgorithm record does not appear in the job parameters.
References
No response
b/329834219