tensorflow / cloud

The TensorFlow Cloud repository provides APIs that let you easily move from debugging and training your Keras and TensorFlow code in a local environment to distributed training in the cloud.
https://github.com/tensorflow/cloud
Apache License 2.0
370 stars, 84 forks

tfc.remote() returns False when the entry point is a Python file and distribution_strategy is None #391

Open: tc-wolf opened this issue 1 year ago

tc-wolf commented 1 year ago

The "get_preprocessed_entry_point" function isn't called when the distribution strategy is None and the entry point is a Python file (ends in ".py"):

https://github.com/tensorflow/cloud/blob/master/src/python/tensorflow_cloud/core/run.py#L266-L282
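The gating described above can be modeled in a few lines. This is a hypothetical simplification of the reported behavior, not the code at the linked lines: the function name preprocessing_runs is invented for illustration, and "auto" is used only as an assumed example of a non-None strategy value.

```python
from typing import Optional


def preprocessing_runs(entry_point: str, distribution_strategy: Optional[str]) -> bool:
    """Hypothetical model of the reported gating in run.py.

    Preprocessing (and with it the TF_KERAS_RUNNING_REMOTELY injection)
    is skipped when the entry point is a .py file and no distribution
    strategy is given.
    """
    if distribution_strategy is None and entry_point.endswith(".py"):
        return False
    return True


for ep, strategy in [("train.py", None), ("train.py", "auto")]:
    # Prints the entry point, the strategy, and whether preprocessing would run.
    print(ep, strategy, preprocessing_runs(ep, strategy))
```

Only the first combination skips preprocessing, which is exactly the case the issue reports.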

This is a problem because get_preprocessed_entry_point is where the TF_KERAS_RUNNING_REMOTELY environment variable is injected into the entry point. Because the variable is never set, calls to tfc.remote() in the user's script don't behave as expected: they always return False, even when the script is running remotely.
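A small simulation makes the failure mode concrete. It assumes that tfc.remote() simply checks whether TF_KERAS_RUNNING_REMOTELY is set, and it models preprocessing as prepending a line that sets that variable; remote_stub, inject_flag and run are hypothetical stand-ins, not TensorFlow Cloud APIs.

```python
import os

ENV_VAR = "TF_KERAS_RUNNING_REMOTELY"


def remote_stub() -> bool:
    # Stand-in for tfc.remote(); assumed to be True only when ENV_VAR is set.
    return bool(os.environ.get(ENV_VAR))


def inject_flag(script: str) -> str:
    # Hypothetical model of preprocessing: prepend a line that sets ENV_VAR.
    header = f'import os\nos.environ["{ENV_VAR}"] = "1"\n'
    return header + script


# The "user script" just records what remote_stub() reports.
user_script = "result = remote_stub()\n"


def run(script: str) -> bool:
    os.environ.pop(ENV_VAR, None)  # start from an environment without the flag
    namespace = {"remote_stub": remote_stub}
    exec(script, namespace)
    os.environ.pop(ENV_VAR, None)  # clean up so repeated runs are independent
    return namespace["result"]


print(run(user_script))               # script was not preprocessed
print(run(inject_flag(user_script)))  # script was preprocessed
```

The first call prints False and the second prints True: without the injected line nothing ever sets the variable, so the check can never succeed.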

Proposed solution: set TF_KERAS_RUNNING_REMOTELY directly in the Dockerfile as an ENV instruction when building the image (in ContainerBuilder._create_docker_file), so that it no longer depends on entry-point preprocessing.
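A rough sketch of the proposal: a Dockerfile generator that always emits an ENV instruction for the flag. The helper create_dockerfile_lines and the instructions other than the ENV line are hypothetical and much simpler than whatever ContainerBuilder._create_docker_file actually writes; only the idea of adding "ENV TF_KERAS_RUNNING_REMOTELY=1" comes from the issue.

```python
from typing import List

ENV_VAR = "TF_KERAS_RUNNING_REMOTELY"


def create_dockerfile_lines(base_image: str, entry_point: str) -> List[str]:
    """Hypothetical, simplified Dockerfile generator.

    The real builder writes more instructions (dependencies, working
    directory setup, etc.); this only shows where the flag would go.
    """
    return [
        f"FROM {base_image}",
        "WORKDIR /app/",
        "COPY /app/ /app/",
        # Proposed fix: set the flag at image level, so every container
        # built from this image sees it, whether or not the entry point
        # was preprocessed.
        f"ENV {ENV_VAR}=1",
        f'ENTRYPOINT ["python", "{entry_point}"]',
    ]


print("\n".join(create_dockerfile_lines("tensorflow/tensorflow:latest", "train.py")))
```

Because the variable lives in the image rather than in the script, tfc.remote() would still return False when the same script is run locally, outside the container.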