Closed · jacob1017 closed this 7 years ago
@jhseu maybe there is an ecosystem answer?
@rhaertel80 for CloudML.
We consider job setup not to be a responsibility of TensorFlow core. It's more suited to other things, like Kubernetes, Mesos, or Spark.
Take a look at TensorFlowOnSpark if you already have a Spark cluster: https://github.com/yahoo/TensorFlowOnSpark
Or if you're running Kubernetes, you can use a configuration from https://github.com/tensorflow/ecosystem
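To make the Kubernetes approach concrete, here is a rough sketch of the idea behind those templates: a controller renders one pod per task, and each pod learns its role from an environment variable (`TF_CONFIG`) instead of hand-typed flags. The helper below is hypothetical (it is not code from the ecosystem repo); the host names and port are made up for illustration.

```python
import json

# Hypothetical sketch of what a Kubernetes template for distributed
# TensorFlow effectively produces: one TF_CONFIG value per task, so the
# same container image can be started N times without editing flags.
def render_task_envs(num_workers, num_ps, port=2222):
    cluster = {
        "worker": ["worker-%d:%d" % (i, port) for i in range(num_workers)],
        "ps": ["ps-%d:%d" % (i, port) for i in range(num_ps)],
    }
    envs = []
    for job_name, hosts in cluster.items():
        for task_index in range(len(hosts)):
            envs.append({
                "TF_CONFIG": json.dumps({
                    "cluster": cluster,
                    "task": {"type": job_name, "index": task_index},
                })
            })
    return envs

# 2 workers + 1 parameter server -> 3 task environments.
envs = render_task_envs(num_workers=2, num_ps=1)
```

Each pod then reads its own `TF_CONFIG` to find out which task it is, which is what lets the orchestrator (rather than a human) do the per-node launches.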
OK, this sounds reasonable. Thanks, @jhseu.
If you're open to using cloud technologies, Google Cloud Machine Learning Engine is a managed service that automatically brings up and takes down nodes for running your TensorFlow jobs. For an example of how simple it can be to launch your distributed TensorFlow job, see the quickstart.
Deploying TensorFlow on our own cluster this way is really inefficient, in my opinion. As the official tutorial shows, for every task you launch, you have to run your program once more on one of the nodes.
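To spell out the overhead being described: in the stock distributed-TensorFlow recipe, the same script is started once per task, each time with different `--job_name` and `--task_index` flags. This small sketch (plain Python, not TensorFlow itself) just enumerates those launch commands for a hypothetical 2-ps/3-worker cluster; `trainer.py` and the host names are made up for illustration.

```python
# Cluster layout for the example: 2 parameter servers, 3 workers.
cluster = {
    "ps": ["ps0:2222", "ps1:2222"],
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
}

# One command per task: with 5 tasks you must perform 5 separate
# launches, one on each node, all running the same script.
commands = []
for job_name, hosts in cluster.items():
    for task_index, _host in enumerate(hosts):
        commands.append(
            "python trainer.py --job_name=%s --task_index=%d"
            % (job_name, task_index)
        )

for cmd in commands:
    print(cmd)
```

An orchestrator like Kubernetes or Spark automates exactly this loop, which is why the earlier replies point there rather than at TensorFlow core.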
Our developers hope TensorFlow can become the Hadoop of deep learning. Launching a Hadoop job is more convenient: you just execute the job command once.
Maybe I'm not using this framework correctly. If you have any good ideas about this, we can discuss a nice solution and make our world beautiful.