The documentation of the Spark TensorFlow Distributor says:
"In order to use many features of this package, you must set up Spark custom resource scheduling for GPUs on your cluster. See the Spark docs for this."
Question 1: which features are the "many" features? When would I need custom resource scheduling, and when could I do without it?
Question 2: "See the Spark docs for this." The Spark docs are extremely tight-lipped about custom resource scheduling. For example, on https://spark.apache.org/docs/latest/configuration.html, `spark.driver.resource.{resourceName}.amount` is described only as the "Amount of a particular resource type to use on the driver." That says nothing about what the values may be; is it a count, a percentage, something else? The docs also call for a discovery script, but don't explain what should be in it.
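For what it's worth, my current understanding (pieced together from other Spark material, so please correct me if it's wrong) is that the discovery script is just any executable that prints a single JSON object naming the resource and listing the addresses of each device. Here is the kind of thing I have in mind; the helper name `discover_gpus` is mine, and I'm assuming `nvidia-smi` is available on the workers:

```python
#!/usr/bin/env python3
"""Sketch of a GPU discovery script: print one JSON object that
names the resource and lists one address string per device."""
import json
import subprocess


def discover_gpus():
    """Build the JSON string, e.g. {"name": "gpu", "addresses": ["0", "1"]}."""
    try:
        # Assumption: nvidia-smi prints one GPU index per line.
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
            text=True,
        )
        addresses = [line.strip() for line in out.splitlines() if line.strip()]
    except (OSError, subprocess.CalledProcessError):
        addresses = []  # nvidia-smi missing or no GPUs visible
    return json.dumps({"name": "gpu", "addresses": addresses})


if __name__ == "__main__":
    print(discover_gpus())
```

Is that roughly the contract Spark expects, or is there more to it?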
Can someone provide a fully working example of how to do this? Clearly, the developers of this library have gotten it to work; a complete, functioning example would be much appreciated.
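For reference, this is the configuration I've pieced together so far from the config page. I have no idea whether these values are right; the amounts look like they should be integer device counts rather than percentages, and the script path is a placeholder:

```
# spark-defaults.conf -- my guesses, unverified
spark.worker.resource.gpu.amount           2
spark.worker.resource.gpu.discoveryScript  /path/to/getGpusResources.sh
spark.executor.resource.gpu.amount         1
spark.task.resource.gpu.amount             1
```

If someone can confirm or correct these settings, that alone would help a lot.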