tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0

Marathon/Mesos and docker #25

Closed: bhack closed this 7 years ago

bhack commented 7 years ago

Why is Docker still mandatory? See https://github.com/douban/tfmesos/issues/12

saeta commented 7 years ago

@bhack you should be able to run a TensorFlow job on a DC/OS cluster running on GCE nodes, and connect to a Cloud TPU using distributed TensorFlow.

bhack commented 7 years ago

Is that really an integration? Can I specify a TPU in https://github.com/tensorflow/ecosystem/blob/master/marathon/template.json.jinja#L9?

jhseu commented 7 years ago

@bhack The initial release will work with vanilla VMs, so a cluster manager may still be useful.

bhack commented 7 years ago

OK, I will close this in the next few days. It seems to me that there is no input from other stakeholders on improving the Mesos/DC/OS resources in this repository.

klueska commented 7 years ago

I played around with tfmesos quite a bit, but never quite got it working on DC/OS in a way that I liked.

We are now working on a distributed TensorFlow framework built with the DC/OS SDK: https://github.com/mesosphere/dcos-commons

Once it's ready, that will likely be the recommended way to run TensorFlow on DC/OS.

I will update this repo with instructions on how to use it once it's ready.

bhack commented 7 years ago

OK, I'll leave this open so we can discuss what information we want to put here once the DC/OS work is ready.

klueska commented 6 years ago

I know this thread is closed, but I wanted to point out the new release of distributed TensorFlow on DC/OS that we announced today. https://mesosphere.com/blog/tensorflow-gpu-support-deep-learning/

bhack commented 6 years ago

@klueska, so what do you plan to do about the official Kubernetes (k8s) effort, given that DC/OS announced full Kubernetes support just a few weeks ago?