The TensorFlow Cloud repository provides APIs that make it easy to go from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.
The `get_preprocessed_entry_point` step isn't run when the distribution strategy is `None` and the entry point is a Python file (ending in `.py`):
https://github.com/tensorflow/cloud/blob/master/src/python/tensorflow_cloud/core/run.py#L266-L282
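A minimal sketch of the skip condition described above, simplified from the linked lines. It assumes that preprocessing is skipped only when no distribution strategy is given and the entry point ends in `.py`. `should_preprocess` is a hypothetical helper for illustration, not part of the tensorflow_cloud API.

```python
from typing import Optional


def should_preprocess(entry_point: Optional[str],
                      distribution_strategy: Optional[str]) -> bool:
    """Hypothetical helper mirroring the skip condition described in this issue.

    Returns False exactly when no distribution strategy is set and the entry
    point is a Python file, i.e. when get_preprocessed_entry_point is skipped.
    """
    is_python_file = entry_point is not None and entry_point.endswith(".py")
    return not (distribution_strategy is None and is_python_file)


# A plain .py script with no distribution strategy skips preprocessing,
# so nothing injects TF_KERAS_RUNNING_REMOTELY for that script.
print(should_preprocess("train.py", None))     # False
print(should_preprocess("train.py", "auto"))   # True
print(should_preprocess("train.ipynb", None))  # True
```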
This is a problem because `get_preprocessed_entry_point` is where `TF_KERAS_RUNNING_REMOTELY` is injected into the entry point. Because this variable isn't set, calls to `tfc.remote()` in the user's script don't behave as expected (i.e., they always return `False`).

Proposed Solution:
Set the variable directly in the Dockerfile as an `ENV` instruction when building the image (in `ContainerBuilder._create_docker_file`), so it no longer depends on the entry point being preprocessed.
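A minimal sketch of the proposed fix, under two assumptions: `tfc.remote()` only checks whether `TF_KERAS_RUNNING_REMOTELY` holds a non-empty value in the environment (the `remote` function below is a stand-in for it), and `create_docker_file` is a hypothetical, simplified stand-in for `ContainerBuilder._create_docker_file`. The base image, `WORKDIR`, and `COPY` lines are illustrative only; the ENV parsing at the end just simulates what Docker does when the container starts.

```python
import os
from typing import Dict, List, Mapping


def create_docker_file(base_image: str, entry_point: str) -> str:
    """Hypothetical, simplified Dockerfile generator.

    The ENV line is emitted unconditionally, so the flag is present in the
    container whether or not the entry point was preprocessed.
    """
    lines: List[str] = [
        f"FROM {base_image}",
        "WORKDIR /app/",
        # Bake the flag into the image at build time.
        "ENV TF_KERAS_RUNNING_REMOTELY=1",
        "COPY /app/ /app/",
        f'ENTRYPOINT ["python", "{entry_point}"]',
    ]
    return "\n".join(lines) + "\n"


def remote(environ: Mapping[str, str] = os.environ) -> bool:
    """Stand-in for tfc.remote(), assuming it only checks this variable."""
    return bool(environ.get("TF_KERAS_RUNNING_REMOTELY"))


dockerfile = create_docker_file("tensorflow/tensorflow:latest", "train.py")

# Simulate the container environment: apply every `ENV KEY=value` line.
container_env: Dict[str, str] = {}
for line in dockerfile.splitlines():
    if line.startswith("ENV "):
        key, value = line[len("ENV "):].split("=", 1)
        container_env[key] = value

print(remote({}))             # False: local run, flag not set
print(remote(container_env))  # True: flag comes from the image itself
```

Under these assumptions, setting the variable at image build time decouples it from entry point preprocessing, so a plain `.py` script run without a distribution strategy would also see `tfc.remote()` return `True` in the cloud.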