Closed shrivaths16 closed 5 months ago
The ZMQConfig is used here to set up the TrainingControllerZMQ and ProgressReporterZMQ:
Which is called from setup_output_callbacks here:
Which is called from Trainer._setup_outputs here:
It uses the TrainingJobConfig to specify the ZMQ address/port, which derives from the loaded config file.
When calling from the CLI, we already overwrite some parts of the config with the CLI-provided options, for example, to enable/disable ZMQ entirely:
Next to this block, we should also support specifying the ZMQ port explicitly and overwriting the appropriate config fields:
job_config.outputs.zmq.controller_address
job_config.outputs.zmq.publish_address
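A minimal sketch of what that override could look like, assuming the two fields hold full `tcp://host:port` strings as in the hardcoded defaults. The function name `apply_zmq_port_overrides`, its parameters, and the `SimpleNamespace` stand-in for the loaded config are hypothetical and only there to make the example self-contained; the real CLI option names would need to be decided.

```python
from types import SimpleNamespace
from typing import Optional


def apply_zmq_port_overrides(
    job_config,
    controller_port: Optional[int] = None,
    publish_port: Optional[int] = None,
    host: str = "127.0.0.1",
):
    """Overwrite the ZMQ addresses in job_config for any port that was given.

    Hypothetical helper: a port left as None keeps the value from the config file.
    """
    zmq_cfg = job_config.outputs.zmq
    if controller_port is not None:
        zmq_cfg.controller_address = f"tcp://{host}:{controller_port}"
    if publish_port is not None:
        zmq_cfg.publish_address = f"tcp://{host}:{publish_port}"
    return job_config


# Stand-in for the loaded TrainingJobConfig (illustrative only).
job_config = SimpleNamespace(
    outputs=SimpleNamespace(
        zmq=SimpleNamespace(
            controller_address="tcp://127.0.0.1:9000",
            publish_address="tcp://127.0.0.1:9001",
        )
    )
)

# Only the controller port is overridden; the publish address is untouched.
apply_zmq_port_overrides(job_config, controller_port=9100)
```

Leaving unspecified ports as `None` means the values from the config file stay in effect unless the user explicitly asks for something else.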
Another nice option could be to try to automatically detect a free port using Socket.bind_to_random_port().
We still need to know what the port is in order to pass it to the backend, so just calling this by itself wouldn't work, but we could use it to write a utility function to discover a free port, e.g.:
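import zmq

def find_free_port():
    """Bind a throwaway socket to a random free port, release it, and return the port."""
    ctx = zmq.Context.instance()
    socket = ctx.socket(zmq.REP)  # a socket type is required; any bindable type works
    port = socket.bind_to_random_port("tcp://127.0.0.1")
    socket.close()  # close() releases the port; disconnect() would need an endpoint
    return port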
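Since the controller and publish addresses each need their own port, two ports have to be found. The sketch below is an alternative that uses the standard library `socket` module rather than ZMQ (binding to port 0 lets the OS choose a free port). It keeps every probe socket open until all ports are collected, so the returned ports are distinct. The name `find_free_ports` is hypothetical. As with the ZMQ version, there is a small window between releasing a port and the backend binding to it in which another process could claim it.

```python
import socket
from contextlib import ExitStack


def find_free_ports(n=2, host="127.0.0.1"):
    """Return n distinct TCP ports that were free at the time of the call."""
    with ExitStack() as stack:
        socks = []
        for _ in range(n):
            s = stack.enter_context(
                socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            )
            s.bind((host, 0))  # port 0 asks the OS to pick any free port
            socks.append(s)
        # All probe sockets are still open here, so no port can repeat.
        return [s.getsockname()[1] for s in socks]


controller_port, publish_port = find_free_ports(2)
controller_address = f"tcp://127.0.0.1:{controller_port}"
publish_address = f"tcp://127.0.0.1:{publish_port}"
```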
[WIP]
As of now there is no option to choose the ZMQ ports via the GUI; they are hardcoded to tcp://127.0.0.1:9000 for the controller address and tcp://127.0.0.1:9001 for the publish address. This can cause a problem when multiple SLEAP applications are open and used for training at the same time, leading to "ZMQError: Address already in use" as mentioned in discussion #1751. In order to solve this, we need to make some changes as listed below: