logicalclocks / maggy

Distribution transparent Machine Learning experiments on Apache Spark
https://maggy.ai
Apache License 2.0

Torch distributed support #84

Closed amacati closed 3 years ago

moritzmeister commented 3 years ago

closes #77, closes #78

amacati commented 3 years ago

Fixed the port assignment so that it is handled by the OS, and fixed the heartbeat issue. I also added a notebook in the examples folder so that you can easily check the code. Note that the code I inserted into the distributed driver is only there to make the heartbeat work; the mechanism will hopefully change in the next PRs. Can you recheck the PR, @RiccardoGrigoletto @moritzmeister?
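For readers unfamiliar with OS-handled port assignment, here is a minimal sketch of the general technique, not the code from this PR. It assumes plain TCP sockets from the Python standard library; the helper name `find_free_port` is hypothetical and does not come from maggy.

```python
import socket


def find_free_port(host: str = "") -> int:
    """Return a TCP port that the OS reports as currently unused.

    Hypothetical helper for illustration only. Binding to port 0 asks
    the OS to choose any free ephemeral port, and getsockname() reports
    which port it chose.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "let the OS choose"
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = find_free_port()
    print(f"OS-assigned port: {port}")
```

One caveat with this approach: the socket is closed before the port is returned, so another process could in principle claim the port before it is reused. Where that matters, a common alternative is to keep the bound socket open and hand it directly to the server. In a torch.distributed setup using `env://` initialization, a port obtained this way would typically be exported as `MASTER_PORT` before the process group is created.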