Closed by bhack 7 years ago
@bhack you should be able to run a TensorFlow job on a DC/OS cluster running on GCE nodes, and connect to a Cloud TPU using distributed TensorFlow.
Is it really an integration? Can I put TPU in https://github.com/tensorflow/ecosystem/blob/master/marathon/template.json.jinja#L9?
@bhack The initial release will work with vanilla VMs, so a cluster manager may still be useful.
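For readers following the exchange above: distributed TensorFlow describes a cluster as a mapping from job names to lists of `host:port` addresses, and each process is commonly told its own role through the `TF_CONFIG` environment variable. The following is only a minimal sketch of building that value for tasks launched by a cluster manager. The hostnames, ports, and the helper `build_tf_config` are hypothetical and not taken from this repository, and the Cloud TPU endpoint itself is not modeled here.

```python
import json


def build_tf_config(cluster, task_type, task_index):
    """Return a TF_CONFIG-style JSON string for one task in the cluster.

    `cluster` maps job names (e.g. "ps", "worker") to lists of "host:port".
    This helper is a hypothetical illustration, not part of any library.
    """
    if task_type not in cluster:
        raise ValueError(f"unknown job name: {task_type!r}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"task index {task_index} out of range for {task_type!r}")
    config = {
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    }
    return json.dumps(config, sort_keys=True)


# Hypothetical addresses; a cluster manager would assign the real ones.
cluster = {
    "ps": ["ps-0.example.internal:2222"],
    "worker": [
        "worker-0.example.internal:2222",
        "worker-1.example.internal:2222",
    ],
}

if __name__ == "__main__":
    # Value a launcher could export as TF_CONFIG for the second worker.
    print(build_tf_config(cluster, "worker", 1))
```

A launcher template could export the returned string as `TF_CONFIG` in each task's environment; how that maps onto a specific Marathon or DC/OS template is left open here.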
OK, I will close this in the next few days. It seems to me that there is no input from other stakeholders to improve the Mesos/DC/OS resources in this repository.
I played around with tfmesos quite a bit, but never quite got it working in a way that I liked on DC/OS.
We are now working on a distributed TensorFlow framework built using the DC/OS SDK: https://github.com/mesosphere/dcos-commons
Once it's ready, that will likely be the suggested way to run TensorFlow on DC/OS.
I will update this repo with instructions on how to use it once it's ready.
OK, I'll leave this open so we can discuss what information we want to put here once the DC/OS work is ready.
I know this thread is closed, but I wanted to point out the new release of distributed TensorFlow on DC/OS that we announced today. https://mesosphere.com/blog/tensorflow-gpu-support-deep-learning/
@klueska So what do you plan to do about the official Kubernetes effort, given that DC/OS announced full Kubernetes support just a few weeks ago?
Why is Docker still mandatory? See https://github.com/douban/tfmesos/issues/12