I recommend asking on Stack Overflow (http://stackoverflow.com/questions/tagged/tensorflow), which is a good starting point for debugging this issue. Feel free to post here again if you discover a specific problem.
Please go to Stack Overflow for help and support:
http://stackoverflow.com/questions/tagged/tensorflow
Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:
Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.
System information
You can collect some of this information using our environment capture script:
https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh
You can obtain the TensorFlow version with
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
Describe the problem
Based on cifar10_train.py and the TensorFlow distributed documentation, I wrote a distributed version, cifar10_train_distributed.py, and ran it on a Kubernetes (k8s) cluster with 2 workers and 1 parameter server (ps). However, it runs slower than the single-GPU version. On 1 GPU (Tesla P100), cifar10_train achieved 0.008 sec/batch, while cifar10_train_distributed achieved 0.027 sec/batch. My code is below. I do not understand why this happens. Please help!
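To put the two timings side by side, here is a small arithmetic sketch. It uses only the numbers quoted above. Whether the 0.027 sec/batch figure is logged per worker or for the whole cluster is not stated, so the "effective" figure below rests on the assumption that it is per worker and that both workers step concurrently.

```python
# Timings quoted in the report (seconds per batch).
single_gpu_sec_per_batch = 0.008
distributed_sec_per_batch = 0.027
num_workers = 2  # from the report: 2 workers, 1 ps

# How much slower one distributed step is than one single-GPU step.
slowdown = distributed_sec_per_batch / single_gpu_sec_per_batch

# ASSUMPTION: 0.027 s is measured on each worker and the workers run
# concurrently, so the cluster finishes num_workers batches per step time.
effective_sec_per_batch = distributed_sec_per_batch / num_workers

print(f"per-step slowdown: {slowdown:.3f}x")
print(f"effective sec/batch across workers: {effective_sec_per_batch:.4f}")
```

Under that assumption, the cluster as a whole would still take longer per batch than the single GPU, so the slowdown is not explained by per-worker logging alone.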
Source code / logs
def train():
    print("here")
    tf_config_json = FLAGS.tf_config
    tf_config = json.loads(tf_config_json)
    # Get cluster info and build the spec object used to initialize each node.
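The step of reading the cluster description out of the JSON config could look roughly like the sketch below. It assumes the JSON follows the common TF_CONFIG layout (a "cluster" mapping of job name to host list, plus a "task" entry with "type" and "index"). The actual layout of FLAGS.tf_config is not shown in the report, and the helper name parse_tf_config and the host names are hypothetical.

```python
import json


def parse_tf_config(tf_config_json):
    """Return (cluster, job_name, task_index) from a TF_CONFIG-style JSON string.

    ASSUMPTION: the JSON has "cluster" and "task" keys, as in the usual
    TF_CONFIG convention; the report does not show the real layout.
    """
    tf_config = json.loads(tf_config_json)
    cluster = tf_config.get("cluster", {})
    task = tf_config.get("task", {})
    job_name = task.get("type")
    task_index = int(task.get("index", 0))
    return cluster, job_name, task_index


# Hypothetical config matching the report's topology: 1 ps and 2 workers.
example = json.dumps({
    "cluster": {
        "ps": ["ps-0:2222"],
        "worker": ["worker-0:2222", "worker-1:2222"],
    },
    "task": {"type": "worker", "index": 1},
})

cluster, job_name, task_index = parse_tf_config(example)
print(cluster, job_name, task_index)

# In TensorFlow 1.x, these values would typically be passed on as:
#   cluster_spec = tf.train.ClusterSpec(cluster)
#   server = tf.train.Server(cluster_spec, job_name=job_name, task_index=task_index)
```

Keeping the parsing separate from the TensorFlow calls makes it easy to check that every node sees the same cluster and its own task index before any server is started.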