tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0

Update the example for spark-tensorflow-distributor #165

Closed: liangz1 closed this 4 years ago

liangz1 commented 4 years ago

This PR fixes the data downloading issue in the example code.

Reproduce: On a cluster with multiple GPUs per worker node and spark.task.resource.gpu.amount set to 1, running the original example triggers an error related to data downloading.

Cause: Multiple tasks run on the same worker, and each task tries to write the downloaded data to the same path, so the concurrent writes corrupt the file.

Fix: Randomize the file path so that each task writes to its own file.
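
The idea behind the fix can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `unique_data_path` and its defaults are hypothetical, and it assumes a random UUID in the file name is enough to keep concurrent tasks on the same worker from colliding.

```python
import os
import tempfile
import uuid


def unique_data_path(base_name="mnist.npz", directory=None):
    """Return a download path that is unique to the calling task.

    Hypothetical helper: a random UUID is inserted before the file
    extension so that two tasks on the same worker never share a path.
    """
    directory = directory or tempfile.gettempdir()
    stem, ext = os.path.splitext(base_name)
    return os.path.join(directory, f"{stem}-{uuid.uuid4().hex}{ext}")


if __name__ == "__main__":
    # Each call yields a different path, so parallel tasks write separate files.
    print(unique_data_path())
    print(unique_data_path())
```

Inside the training function, a path like this could presumably be passed to the dataset loader, for example as the `path` argument of `tf.keras.datasets.mnist.load_data`, so that each task caches its own copy.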

googlebot commented 4 years ago

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

:memo: Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with "@googlebot I signed it!" and we'll verify it.


liangz1 commented 4 years ago

@googlebot I signed it!

googlebot commented 4 years ago

CLAs look good, thanks!
