I have found a dirty work-around:
At the top, in `"options"`:

```json
"zone" : null,
"workerMachineType" : "n1-standard-1",
"gcpTempLocation" : "gs://dataflow-staging-us-central1-473832897378/temp/",
```

Again, at the bottom of `sdkPipelineOptions`:

```json
}, {
  "namespace" : "org.apache.beam.runners.dataflow.options.DataflowPipelineOptions",
  "key" : "templateLocation",
  "type" : "STRING",
  "value" : "gs://dataflow-templates-staging/2018-10-08-00_RC00/PubSub_to_BigQuery"
}, {
  "namespace" : "org.apache.beam.runners.dataflow.options.DataflowPipelineWorkerPoolOptions",
  "key" : "workerMachineType",
  "type" : "STRING",
  "value" : "n1-standard-1"
} ] },
```

And finally, in `"workerPools"`:

```json
"dataDisks" : [ { } ],
"machineType" : "n1-standard-1",
"numWorkers" : 0,
```
I realise it is a bit hacky, but it works. The pipeline gets successfully deployed on a n1-standard-1 Compute Engine instead of the default n1-standard-4.
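For anyone following the same route, a minimal sketch of consuming such a modified template copy from Terraform is below. The bucket, template path, and template parameter names are placeholders, not the actual staging locations from the snippets above.

```hcl
# Minimal sketch (assumed paths): launch a Dataflow job from a modified copy of the
# template in which workerMachineType has been forced to n1-standard-1, so no
# machine-type setting is needed on the Terraform side.
resource "google_dataflow_job" "pubsub_to_bq" {
  name              = "pubsub-to-bq"
  template_gcs_path = "gs://my-templates-bucket/PubSub_to_BigQuery_n1_standard_1" # edited copy
  temp_gcs_location = "gs://my-staging-bucket/temp"

  parameters = {
    # Parameter names are illustrative for the Pub/Sub-to-BigQuery template.
    inputTopic      = "projects/my-project/topics/my-topic"
    outputTableSpec = "my-project:my_dataset.my_table"
  }
}
```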
`machine_type` is now configurable. The others aren't yet.
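For anyone landing here later, a minimal example of using the new argument might look like the sketch below (bucket and template paths are placeholders; template parameters omitted):

```hcl
# machine_type is now accepted by google_dataflow_job; the disk-related
# settings discussed in this issue still have no corresponding arguments.
resource "google_dataflow_job" "example" {
  name              = "pubsub-to-bq"
  template_gcs_path = "gs://dataflow-templates/latest/PubSub_to_BigQuery"
  temp_gcs_location = "gs://my-staging-bucket/temp"

  machine_type = "n1-standard-1"
  max_workers  = 3 # upper bound on workers; a fixed initial worker count is not exposed
}
```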
Is there any planned timeline for making `diskSizeGb` configurable as well? As documented in Google Dataflow's common error guidance, we'd like to be able to manage the workers' disk size when managing Dataflow jobs with Terraform.
Another nice-to-have would be the ability to set the number of workers.
+1, would like to be able to set `disk_size_gb` and `worker_disk_type`.
This is a slightly weird API because we actually call a Launch endpoint that only seems to support https://cloud.google.com/dataflow/docs/reference/rest/v1b3/RuntimeEnvironment, as opposed to being able to configure the WorkerPool directly as described in https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#Job.WorkerPool.
It looks like the current state of the request is:

- `machine_type` is configurable
- `disk_size_gb` is available in the API but not configurable in TF (coverage gap)
- `disk_type` is not available in the API

I'm forwarding to the service team to weigh in, but IMO we will likely want to split those last two items into separate tickets, since `disk_size_gb` should be significantly easier to implement.
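If the `disk_size_gb` gap were closed, the resource-level shape could look something like the sketch below. To be clear, `disk_size_gb` is not an argument of `google_dataflow_job` today; it is shown purely to illustrate how `RuntimeEnvironment.diskSizeGb` from the templates Launch API could be surfaced, and `disk_type` would first need a corresponding field in that API.

```hcl
# HYPOTHETICAL sketch: disk_size_gb is NOT currently supported by google_dataflow_job.
# It is shown only to illustrate surfacing RuntimeEnvironment.diskSizeGb via Terraform.
resource "google_dataflow_job" "example" {
  name              = "pubsub-to-bq"
  template_gcs_path = "gs://dataflow-templates/latest/PubSub_to_BigQuery"
  temp_gcs_location = "gs://my-staging-bucket/temp"

  machine_type = "n1-standard-1"
  disk_size_gb = 50 # hypothetical argument; would map to RuntimeEnvironment.diskSizeGb
}
```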
Can we also get support for `max_cache_memory_usage_mb`?
This issue was originally opened by @karthik-papajohns as hashicorp/terraform#18073. It was migrated here as a result of the provider split. The original body of the issue is below.
Affected Resource(s)
Terraform Version
Terraform Configuration Files
Debug Output
Crash Output
Expected Behavior
Expected Terraform to make the Google Cloud Dataflow execution parameters (diskSizeGb, workerDiskType, workerMachineType) configurable.
https://cloud.google.com/dataflow/pipelines/specifying-exec-params
Actual Behavior
No references to execution parameters for Google Cloud Dataflow can be found in the official Terraform documentation.
https://www.terraform.io/docs/providers/google/r/dataflow_job.html
Additional Context
b/351028604