Open tslam75 opened 7 years ago
Thanks! That'd be really useful.
@tslam75 Have you looked at https://issues.apache.org/jira/browse/YARN-6043?
@zhe-thoughts Thanks for the reference! I looked over YARN-6043, and both approaches use a native application master for TensorFlow.
Attaching a design document here now. We also have an implementation based on this design, and will publish the code soon.
Sorry for the delay.
Created pull request #39 while waiting for the CLA to be signed.
Awesome job :O
In Hadoop 3.0, YARN native services can run TensorFlow services on YARN without adding any dependencies or implementing a new YARN application master.
Please see our blog post: https://hortonworks.com/blog/distributed-tensorflow-assembly-hadoop-yarn/ and let me know if you have any questions. Thanks!
focus ...
@tslam75 Does your 'TensorFlow on YARN' support fault tolerance? If yes, how?
mark
+1
We (LinkedIn Hadoop team) just open sourced TonY:
Repo: https://github.com/linkedin/TonY
Blog post: https://engineering.linkedin.com/blog/2018/09/open-sourcing-tony--native-support-of-tensorflow-on-hadoop
Comments / discussions very welcome!
Hadoop YARN is a commonly deployed cluster manager. Having the ability to run TensorFlow on YARN would be very useful in such environments.
Our team is currently working on a YARN application for this purpose, and would like to contribute our work here. We will provide more details of our contribution soon.
-Jason