tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0
1.37k stars, 391 forks

Update the example for spark-tensorflow-distributor #166

Closed. liangz1 closed this 4 years ago.

liangz1 commented 4 years ago

This PR fixes the data downloading issue in the example code.

Reproduce: On a cluster with multiple GPUs per worker node and spark.task.resource.gpu.amount set to 1, running the original example triggers a data-downloading error.

Cause: With one GPU per task, several tasks run concurrently on the same worker. Each task tries to write the downloaded data to the same path, and the concurrent writes corrupt the file.

Fix: Randomize the file path so that each task downloads the data to its own location.
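
A minimal sketch of the idea, not the exact change in this PR: the helper name unique_download_path, the default file name, and the use of the system temp directory are all assumptions made for illustration. Each call produces a path under a fresh, randomly named directory, so two tasks on the same worker never target the same file.

```python
import os
import tempfile
import uuid


def unique_download_path(filename="mnist.npz", base_dir=None):
    """Return a path that is unique per call (hypothetical helper).

    A random UUID names a fresh directory, so concurrent tasks on one
    worker each get their own download location.
    """
    # Assumption: the system temp directory is writable on every worker.
    base_dir = base_dir or tempfile.gettempdir()
    task_dir = os.path.join(base_dir, f"download-{uuid.uuid4().hex}")
    os.makedirs(task_dir, exist_ok=True)
    return os.path.join(task_dir, filename)


if __name__ == "__main__":
    # Two calls on the same machine yield two different paths.
    print(unique_download_path())
    print(unique_download_path())
```

Inside the training function, the returned path would be passed to whatever download call the example uses in place of the shared default location. The trade-off is that every task downloads its own copy of the data, which costs bandwidth and disk space but avoids any coordination between tasks.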