ZMQ port hardcoded and not editable via the GUI

shrivaths16 commented 6 months ago

[WIP]

Currently there is no option to choose the ZMQ ports via the GUI; they are hardcoded to tcp://127.0.0.1:9000 for the controller address and tcp://127.0.0.1:9001 for the publish address. This causes problems when multiple SLEAP applications are open and training at the same time, leading to "ZMQError: Address already in use", as mentioned in discussion #1751.

In order to solve this, we need to make some changes as listed below:

[ ] Update frontend loss viewer GUI which has the ports hardcoded here
[ ] Update controller_address and publish_address in the ZMQ section of the training job config via the training editor GUI (just ask users to specify ports and assume that the base of the address is always tcp://127.0.0.1)
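
To make the second item concrete, here is a minimal sketch of how port numbers entered in the GUI could be turned into the two address strings. The helper name build_zmq_addresses and its validation rules are assumptions for illustration, not existing SLEAP code; only the base address and the default ports 9000/9001 come from the description above.

```python
# Base of the address is assumed to always be local, as proposed in the checklist.
DEFAULT_HOST = "tcp://127.0.0.1"


def build_zmq_addresses(controller_port: int = 9000, publish_port: int = 9001) -> dict:
    """Build both ZMQ addresses from user-supplied ports (hypothetical helper)."""
    for port in (controller_port, publish_port):
        if not 1 <= port <= 65535:
            raise ValueError(f"Port out of range: {port}")
    if controller_port == publish_port:
        # The controller and the publisher each bind their own socket.
        raise ValueError("Controller and publish ports must differ.")
    return {
        "controller_address": f"{DEFAULT_HOST}:{controller_port}",
        "publish_address": f"{DEFAULT_HOST}:{publish_port}",
    }


addresses = build_zmq_addresses(9100, 9101)
print(addresses["controller_address"], addresses["publish_address"])
```

The same strings would need to reach both the loss viewer and the training job config, so that the frontend and the backend agree on the ports.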

talmo commented 6 months ago

The ZMQConfig is used here to set up the TrainingControllerZMQ and ProgressReporterZMQ:

https://github.com/talmolab/sleap/blob/18aad91d10508127cc4f8a398432c43e661546a9/sleap/nn/training.py#L396-L412

Which is called from setup_output_callbacks here:

https://github.com/talmolab/sleap/blob/18aad91d10508127cc4f8a398432c43e661546a9/sleap/nn/training.py#L497

Which is called from Trainer._setup_outputs here:

https://github.com/talmolab/sleap/blob/18aad91d10508127cc4f8a398432c43e661546a9/sleap/nn/training.py#L851-L853

It uses the TrainingJobConfig to specify the ZMQ address/port, which derives from the loaded config file.

When calling from the CLI, we already overwrite some parts of the config with the CLI-provided options, for example, to enable/disable ZMQ entirely:

https://github.com/talmolab/sleap/blob/18aad91d10508127cc4f8a398432c43e661546a9/sleap/nn/training.py#L1924-L1928

Next to this block, we should also support specifying the ZMQ port explicitly and overwriting the appropriate config fields:

job_config.outputs.zmq.controller_address
job_config.outputs.zmq.publish_address
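
A rough sketch of what that override could look like follows. The flag names --controller_port and --publish_port and the function apply_zmq_port_overrides are hypothetical, and a SimpleNamespace stands in for the loaded TrainingJobConfig so the snippet runs on its own; only the two config field names are taken from above.

```python
import argparse
from types import SimpleNamespace


def apply_zmq_port_overrides(job_config, args):
    """Overwrite the ZMQ addresses only for ports given on the command line."""
    zmq_cfg = job_config.outputs.zmq
    if args.controller_port is not None:
        zmq_cfg.controller_address = f"tcp://127.0.0.1:{args.controller_port}"
    if args.publish_port is not None:
        zmq_cfg.publish_address = f"tcp://127.0.0.1:{args.publish_port}"
    return job_config


# Hypothetical CLI flags; these are not existing options.
parser = argparse.ArgumentParser()
parser.add_argument("--controller_port", type=int, default=None)
parser.add_argument("--publish_port", type=int, default=None)

# Stand-in for the config loaded from file, using the current hardcoded defaults.
job_config = SimpleNamespace(
    outputs=SimpleNamespace(
        zmq=SimpleNamespace(
            controller_address="tcp://127.0.0.1:9000",
            publish_address="tcp://127.0.0.1:9001",
        )
    )
)

args = parser.parse_args(["--controller_port", "9100"])
apply_zmq_port_overrides(job_config, args)
print(job_config.outputs.zmq.controller_address)
print(job_config.outputs.zmq.publish_address)
```

Leaving a flag unset keeps whatever the config file specified, which matches how the other CLI overrides only touch fields the user explicitly provides.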

talmo commented 6 months ago

Another nice option could be to try to automatically detect a free port using Socket.bind_to_random_port().

We still need to know what the port is in order to pass it to the backend, so just calling this by itself wouldn't work, but we could use it to write a utility function to discover a free port, e.g.:


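import zmq

def find_free_port():
    ctx = zmq.Context.instance()
    # ctx.socket() requires a socket type; any bindable type works for probing.
    socket = ctx.socket(zmq.REP)
    port = socket.bind_to_random_port("tcp://127.0.0.1")
    # Close the socket to release the port; disconnect() needs an endpoint argument.
    socket.close()
    return port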
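
Since we need two ports (controller and publish), a variant that returns several distinct ports at once might be handy. The sketch below swaps in Python's standard socket module instead of pyzmq, binding to port 0 so the OS picks a free port; the function name find_free_ports is an assumption for illustration. All probe sockets are held open until every port has been chosen, so the same port cannot be handed out twice.

```python
import socket


def find_free_ports(n=2, host="127.0.0.1"):
    """Return n distinct TCP ports that were free on host at call time."""
    socks = []
    try:
        for _ in range(n):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.bind((host, 0))  # port 0 asks the OS for any free port
            socks.append(s)
        return [s.getsockname()[1] for s in socks]
    finally:
        # Release the ports so the training backend can bind them.
        for s in socks:
            s.close()


controller_port, publish_port = find_free_ports(2)
print(f"tcp://127.0.0.1:{controller_port}", f"tcp://127.0.0.1:{publish_port}")
```

Either way there is a small window between releasing a port and the backend binding it, during which another process could take it, so the caller should still be prepared to handle "Address already in use".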

ZMQ port hardcoded and not editable via the GUI #1774