lambdaji / tf_repos

TensorFlow Script

tf distribute #16

Open zhangyingxia opened 6 years ago

zhangyingxia commented 6 years ago

When I run DeepFM in distributed mode, I get this error: No worker known as /job:chief/replica:0/task:0. Could you help me?

lambdaji commented 6 years ago

run_dist.sh?

zhangyingxia commented 6 years ago

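Hello, I am using the deepfm.py framework and have set TF_CONFIG for distributed training. When I then start distributed training, it fails with: No worker known as /job:chief/replica:0/task:0. The chief had already started successfully before that, though. Have you run into this kind of error?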
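For readers unfamiliar with TF_CONFIG, here is a minimal sketch of the JSON layout that `tf.estimator`-style distributed training reads from that environment variable. The host names, ports, and the helper `make_tf_config` are hypothetical and are not taken from deepfm.py or from the launch script discussed in this thread. Each process would normally get the same `cluster` dict and only a different `task` entry.

```python
import json
import os


def make_tf_config(cluster, task_type, task_index):
    """Build a TF_CONFIG JSON string for one process (hypothetical helper)."""
    # Guard against describing a role that the cluster dict does not contain.
    if task_type not in cluster:
        raise ValueError("task type %r is not in the cluster spec" % task_type)
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task index %d is out of range" % task_index)
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}}
    )


# Placeholder addresses; replace them with the real host:port of each process.
cluster = {
    "chief": ["host0:2222"],
    "worker": ["host1:2222", "host2:2222"],
    "ps": ["host3:2222"],
}

# This process is assumed to be the first worker.
os.environ["TF_CONFIG"] = make_tf_config(cluster, "worker", 0)
```

One thing that may be worth checking with an error like the one above is whether the `cluster` dict seen by the failing process actually lists the `chief` address, and whether it matches the one the chief was started with. That is only a guess without seeing the launch script.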

lambdaji commented 6 years ago

Please send over your launch script so I can take a look.