Open jiyongjung0 opened 4 years ago
@jiyongjung0 - what's the reason for requiring the public IP for each worker?
@paveldournov IIRC the reason is that Dataflow workers need to install external dependencies on-the-fly, which needs a public IP.
Once Dataflow supports a container image as the runtime environment, this can be resolved, I think.
It's not that Dataflow workers need public IPs; rather, they need outbound internet access so that dependency packages can be retrieved on-the-fly.
For this reason, Private IP + NAT could also work.
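For illustration, here is a minimal sketch of what the Private IP + NAT option might look like as Beam pipeline args. It assumes the subnetwork already has Cloud NAT (or some other outbound route) configured so workers can still fetch packages. The helper name `private_ip_dataflow_args` and all project, region, subnetwork, and bucket values are hypothetical placeholders, not taken from the example pipelines.

```python
def private_ip_dataflow_args(project, region, subnetwork, temp_location):
    """Build Beam pipeline args that run Dataflow workers with internal IPs only.

    Hypothetical helper for illustration; the caller is assumed to have set up
    outbound internet access (e.g. Cloud NAT) on the given subnetwork.
    """
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # Do not assign an external IP address to each worker.
        "--no_use_public_ips",
        # Place workers in the subnetwork that provides outbound access.
        f"--subnetwork=regions/{region}/subnetworks/{subnetwork}",
    ]


# Placeholder values; replace with your own project settings.
args = private_ip_dataflow_args(
    project="my-project",
    region="us-central1",
    subnetwork="my-subnet",
    temp_location="gs://my-bucket/tmp",
)
```

The resulting list could be passed wherever the pipeline currently supplies its Beam args, but whether it works depends on the network setup, so treat it as a starting point rather than a verified config.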
Example and template pipelines are suggesting use of
--experiments=shuffle_mode=auto
flag to change worker machine type to mitigate a quota issue. There is a quota limit on the use of external (static) IP addresses in GCP. Our example pipeline requires an external IP address for each Dataflow worker, which might lead to exhaustion of available IP addresses. Using a bigger machine type can lead to using a smaller number of workers, which will be helpful in this situation.
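To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes one external IP per worker (as described above) and that the job needs a fixed total number of vCPUs. The function name and the numbers are made up for illustration, and real Dataflow autoscaling will not follow this exactly.

```python
import math


def external_ips_needed(total_vcpus, vcpus_per_worker):
    """Estimate external IPs used, assuming one IP per Dataflow worker."""
    # Round up: a partially used worker still occupies a full IP address.
    return math.ceil(total_vcpus / vcpus_per_worker)


# Illustrative target of 64 vCPUs of total capacity.
small = external_ips_needed(total_vcpus=64, vcpus_per_worker=2)  # 32 workers
large = external_ips_needed(total_vcpus=64, vcpus_per_worker=8)  # 8 workers
print(small, large)
```

Under these assumptions, moving from 2-vCPU to 8-vCPU workers cuts the number of external IPs in use from 32 to 8.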