tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.org/tfx
Apache License 2.0
2.09k stars 694 forks

Remove `machine_type=n1-standard-8` flag work-around in Dataflow Beam pipeline argument #1823

Open jiyongjung0 opened 4 years ago

jiyongjung0 commented 4 years ago

The example and template pipelines suggest using the `--machine_type=n1-standard-8` flag to change the worker machine type as a work-around for a quota issue.

GCP has a quota limit on external (static) IP addresses. Our example pipeline requires an external IP address for each Dataflow worker, which can exhaust the available IP addresses. Using a bigger machine type lets the job run on fewer workers, so it needs fewer external IP addresses.
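For reference, a minimal sketch of how the work-around appears in the Beam pipeline arguments. The helper name `dataflow_beam_args` and the project and region values are hypothetical, and a real Dataflow job also needs settings such as a temp location that are omitted here.

```python
# Hypothetical values for illustration only; replace with your own.
GCP_PROJECT = "my-gcp-project"
GCP_REGION = "us-central1"


def dataflow_beam_args(project, region, machine_type=None):
    """Build a list of Beam pipeline arguments for the Dataflow runner.

    When machine_type is given, the worker machine type flag is appended;
    this is the work-around that this issue proposes to remove.
    """
    args = [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
    ]
    if machine_type:
        # Bigger workers mean fewer workers, hence fewer external IPs.
        args.append(f"--machine_type={machine_type}")
    return args


# With the work-around applied:
with_workaround = dataflow_beam_args(GCP_PROJECT, GCP_REGION, "n1-standard-8")
# After the work-around is removed:
without_workaround = dataflow_beam_args(GCP_PROJECT, GCP_REGION)
```

Keeping the flag optional makes it straightforward to drop once the quota issue no longer applies.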

paveldournov commented 4 years ago

@jiyongjung0 - what's the reason for requiring the public IP for each worker?

numerology commented 4 years ago

@paveldournov IIRC the reason is that Dataflow workers need to install external dependencies on-the-fly, which requires a public IP.

Once Dataflow supports container images as the runtime environment, this can be resolved, I think.

ucdmkt commented 4 years ago

Dataflow workers do not need public IPs as such. They need outbound internet access so that dependency packages can be retrieved on-the-fly.

For this reason, Private IP + NAT could also work.
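A rough sketch of what the Private IP + NAT option could look like in the Beam pipeline arguments, assuming the Beam Python SDK's `--no_use_public_ips` and `--subnetwork` options. The helper name `private_ip_beam_args` and the subnetwork path are hypothetical, and the NAT gateway itself has to be configured separately on the GCP side.

```python
def private_ip_beam_args(subnetwork):
    """Return Beam pipeline arguments that keep Dataflow workers on private IPs.

    Assumes outbound internet access is provided by a NAT gateway on the
    given subnetwork, so workers can still fetch dependency packages.
    """
    return [
        # Do not assign an external IP address to each worker.
        "--no_use_public_ips",
        # Subnetwork whose outbound traffic is routed through the NAT.
        f"--subnetwork={subnetwork}",
    ]


# Hypothetical subnetwork path; replace with your own.
nat_args = private_ip_beam_args("regions/us-central1/subnetworks/my-subnet")
```

These arguments would be appended to the existing Dataflow arguments, which would remove the per-worker external IP requirement without changing the machine type.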