simonsobs / socs

Simons Observatory specific OCS agents.
BSD 2-Clause "Simplified" License

Agents running in docker not connecting to crossbar in host network mode #697

Closed kellyhd closed 3 months ago

kellyhd commented 3 months ago

Our host machine has a Wi-Fi card that connects to the internet and a second Ethernet interface (eno1) that connects to a network switch, which in turn connects to all of our devices. From the host, we can communicate with the devices and run the agents locally. We then set up a bridge network (ocs-net) following the instructions in the documentation. However, containers running on the bridge network can only reach the network behind the Wi-Fi card. We confirmed this by pinging Google from inside a container, which succeeds, while pinging anything on the local network (eno1) gets no response. When we run our agents we are able to connect to the client, but when we try to initialize a socket connection, it times out because it cannot reach the devices connected to eno1.
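The socket timeout described above can be reproduced outside of any agent with a small TCP probe, run once on the host and once inside a container on ocs-net. This is only a minimal sketch: the function name `can_connect` is made up for illustration, and the address and port in the usage comment are placeholders, not values from our setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example usage (placeholder address and port for a device behind eno1):
#   can_connect("192.168.1.50", 5025)
```

If the probe succeeds on the host but fails inside the container, the problem is in how the container's traffic is routed toward eno1 rather than in the agent code.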

One thing we found in the Docker networking documentation is that:

A container only sees a network interface with an IP address, a gateway, a routing table, DNS services, and other networking details.

and eno1 does not have all of these (routing table, DNS service, etc.), so we think this might be part of why the containers don't recognize it and can't connect to it.

We then tried to run the Docker images in host network mode, because we are trying to connect to hardware devices and the documentation says these need host network mode:

Containers that require communication with networked devices not running Docker (i.e. networked hardware devices such as the Lakeshore 372) will still need to be in the “host” network mode.

However, when I run the agents (as well as crossbar) on the host network, the agents still can't connect to crossbar, and I get this error:

ControlClientError: [0, 0, 0, 0, 'wamp.error.no_such_procedure', ['no callee registered for procedure <observatory.signalgenerator1>'], {}]

The logs when I try to run the agent show this:

2024-06-26T15:55:28+0000 Scheduling retry 1 to connect <twisted.internet.endpoints.TCP4ClientEndpoint object at 0x7f31c53dd970> in 2.0425696452288116 seconds.

This seems to indicate that the agent cannot connect to crossbar when running in host network mode. We set up the crossbar container following the instructions in the documentation (crossbar is started via its own docker-compose file, separate from the rest of the agents). I noticed that the documentation includes this note about putting containers in host network mode:

"Keep in mind that this disables the convenient name resolution provided by the Docker network, so your container will likely require additional configuration, particularly if it needs to also communicate with the crossbar server"

but it never explains how to configure the containers so that they can communicate with crossbar. How can I fix this issue?
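One way to narrow this down is to check whether the crossbar hostname in the agent's WAMP URL resolves at all from where the agent runs. The sketch below rests on assumptions: the URL `ws://crossbar:8001/ws` is a guess at what the site config contains, and the helper names `wamp_endpoint` and `resolves` are made up for illustration. A hostname supplied by a Docker network (such as a container name) would be expected to stop resolving in host network mode, which would match the quoted note.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def wamp_endpoint(url: str) -> Tuple[str, int]:
    """Split a ws:// or http:// URL into the (hostname, port) the client will dial."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    # Fall back to the scheme's default port when none is given.
    default_port = 443 if parts.scheme in ("wss", "https") else 80
    return parts.hostname, parts.port or default_port


def resolves(hostname: str) -> bool:
    """Return True if the local resolver can turn `hostname` into an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


# Example usage (assumed URL, not taken from our config):
#   host, port = wamp_endpoint("ws://crossbar:8001/ws")
#   resolves(host)  # likely False in host network mode if "crossbar" is a container name
```

If the name does not resolve, a possible fix (also an assumption, not something the documentation confirms) is to point the URL at an address the host itself can reach, such as localhost with the port crossbar publishes.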