tensorflow / ecosystem

Integration of TensorFlow with other open-source frameworks
Apache License 2.0

Spark TensorFlow Distributor: Spark custom resource scheduling - when and how? #184

Open dgoldenberg-audiomack opened 3 years ago

dgoldenberg-audiomack commented 3 years ago

The documentation of the Spark TensorFlow Distributor says:

in order to use many features of this package, you must set up Spark custom resource scheduling for GPUs on your cluster. See the Spark docs for this.

Question 1: Which "many" features? When would I need custom resource scheduling, and when could I do without it?

Question 2: "See the Spark docs for this." The Spark docs are extremely tight-lipped about custom resource scheduling. For example, https://spark.apache.org/docs/latest/configuration.html describes "spark.driver.resource.{resourceName}.amount" only as the "Amount of a particular resource type to use on the driver." That says nothing about what the values may be; is it a percentage? It also wants a discovery script. What should be in it?
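For anyone landing here with the same question: as far as I can tell from the Spark docs, the driver/executor `amount` values are integer counts of discrete resources (e.g. number of GPUs), not percentages; only `spark.task.resource.{resourceName}.amount` may be fractional. The discovery script is any executable that prints a single JSON object telling Spark the resource name and its addresses. This is a hedged sketch, not an authoritative answer; the function name and the fixed address list are mine, and a real script would detect GPUs (e.g. by parsing `nvidia-smi` output) rather than hard-code them.

```python
#!/usr/bin/env python3
"""Sketch of a GPU discovery script for Spark resource scheduling.

Spark runs the configured discovery script and reads one JSON object
from stdout, of the form: {"name": "gpu", "addresses": ["0", "1", ...]}
"""
import json


def gpu_resource_json(addresses):
    # Build the one-line JSON object Spark expects from a discovery script.
    # `addresses` is a list of opaque string IDs, one per GPU.
    return json.dumps({"name": "gpu", "addresses": addresses})


if __name__ == "__main__":
    # Hard-coded two-GPU answer purely for illustration; a real script
    # would derive this list from the machine's actual GPUs.
    print(gpu_resource_json(["0", "1"]))
```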

Can someone provide a fully working example of how to do this? Clearly, the developers of this library have gotten it to work; a complete, functioning example would be much appreciated. Thanks.
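In case it helps others, a minimal GPU setup typically looks something like the spark-submit fragment below. The property names are the standard Spark 3.x resource-scheduling keys from the configuration page; the script path, amounts, and application name are placeholders, and the exact setup varies by cluster manager.

```shell
# Sketch of a spark-submit invocation with GPU resource scheduling enabled.
# Amounts are integer GPU counts per driver/executor; the task amount
# controls how many GPUs each task claims. Paths are placeholders.
spark-submit \
  --conf spark.driver.resource.gpu.amount=1 \
  --conf spark.driver.resource.gpu.discoveryScript=/path/to/getGpusResources.sh \
  --conf spark.executor.resource.gpu.amount=1 \
  --conf spark.executor.resource.gpu.discoveryScript=/path/to/getGpusResources.sh \
  --conf spark.task.resource.gpu.amount=1 \
  train_app.py
```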