Thank you for providing the TensorFlow examples. I have two questions.
Is there a way to set the node counts for parameter servers (PS) and workers independently?
I am trying to use CPU-based training, so I don't want the parameter servers to share resources with a worker. Currently, Batch AI sizes both the PS and the workers from the single workerCount parameter in the job.json file: it assigns port :2223 to the PS and :2222 to the workers, but they end up on the same node. Is there a way to decouple this?
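For context, here is a minimal sketch of the relevant part of my job.json, assuming the tensorFlowSettings layout used in the recipes (the script path and counts are placeholders, not my exact values):

```json
{
  "properties": {
    "nodeCount": 2,
    "tensorFlowSettings": {
      "pythonScriptFilePath": "$AZ_BATCHAI_INPUT_SCRIPT/train.py",
      "workerCount": 2,
      "parameterServerCount": 1
    }
  }
}
```

With a configuration along these lines, each node runs a worker on :2222 and the PS process is placed on one of those same nodes on :2223, rather than getting a node of its own.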
Are the GPU recipes portable to CPU-based runs, or does the code need to be modified? The only change I found necessary was specifying the CPU TensorFlow image in job.json, and things seem to work fine.
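Concretely, the only edit I made was to the container image in job.json, roughly as below (the tag is just an example; I'm assuming the containerSettings shape from the recipes):

```json
"containerSettings": {
  "imageSourceRegistry": {
    "image": "tensorflow/tensorflow:1.8.0"
  }
}
```

whereas the GPU recipes use the corresponding -gpu tag, e.g. tensorflow/tensorflow:1.8.0-gpu.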