openbmc / obmc-console

OpenBMC host console infrastructure
Apache License 2.0

obmc-console-ssh.socket fails to start #7

Closed legoater closed 7 years ago

legoater commented 8 years ago
systemd[1]: Stopping Serial Getty on ttyS0...
obmc-console-server[678]: obmc-console-server: Error reading from tty device: Success
obmc-console-server[678]: 3 handlers
obmc-console-server[678]:   log [active]
obmc-console-server[678]:   socket [active]
obmc-console-server[678]:   tty [inactive]
systemd[1]: Stopped Serial Getty on ttyS0.
systemd[1]: obmc-console.service: Main process exited, code=exited, status=1/FAILURE
systemd[1]: obmc-console.service: Unit entered failed state.
systemd[1]: obmc-console.service: Failed with result 'exit-code'.
systemd[1]: obmc-console.service: Service hold-off time over, scheduling restart.
systemd[1]: Stopped OpenBMC console daemon.
systemd[1]: Started OpenBMC console daemon.
systemd[1]: Closed OpenBMC console ssh server socket.
systemd[1]: Stopping OpenBMC console ssh server socket.
systemd[1]: obmc-console-ssh.socket: Failed to listen on sockets: Address already in use
systemd[1]: Failed to listen on OpenBMC console ssh server socket.
systemd[1]: obmc-console-ssh.socket: Unit entered failed state.
system_manager.py[683]: /usr/sbin/startup_hacks.sh: line 9: /sys/devices/platform/ahb/ahb:apb/1e787000.vuart/enabled: Permission denied

amboar commented 8 years ago

Related: openbmc/openbmc#585

legoater commented 8 years ago

Everything works fine after a restart. Is there an issue in the systemd startup sequence?

mdmillerii commented 8 years ago

I would guess missing SO_REUSEADDR setsockopt and lingering connections. netstat -a would be useful to partially confirm.
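
For illustration only, here is a minimal Python sketch of what setting SO_REUSEADDR before bind looks like. It is not the code of obmc-console or of the SSH server involved; the helper name `make_listener`, the loopback address, and the backlog value are assumptions made for the example.

```python
import socket


def make_listener(port, host="127.0.0.1", backlog=5):
    """Return a listening TCP socket with SO_REUSEADDR set before bind().

    Hypothetical helper for illustration; not taken from obmc-console.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before bind(). It lets a restarted
        # listener reclaim a port whose earlier connections are still
        # lingering (for example in TIME_WAIT).
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock
```

Passing port 0 lets the kernel pick a free port, which is handy for trying the helper out; the port discussed in this issue is 2200. On Linux the option does not let a second socket bind while another one is still listening on the same address and port, so it would only help if the conflict comes from lingering connections.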

williamspatrick commented 8 years ago

I suspect there are two problems here:

  1. Since we are using systemd to start both obmc-console-server and obmc-console-client, they end up starting at effectively the same time. The obmc-console-server takes long enough to start up that its UNIX socket is not ready by the time obmc-console-client tries to connect to it. Thus obmc-console-client exits because the socket does not exist.
  2. When obmc-console-client is subsequently restarted, port 2200 is already in use, so it exits again.

The first issue might be solved by systemd socket-activation. The second issue could be solved by either SO_REUSEADDR, as Milton mentioned, or maybe (also?) by reordering the bind(2200) call to be after the connection to obmc-console-client is done.
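
As a rough sketch of the socket-activation idea: systemd creates the listening socket itself and hands it to the daemon when it starts, so a client that connects early is queued instead of failing on a missing socket. The daemon finds the inherited descriptors through the `LISTEN_PID` and `LISTEN_FDS` environment variables, with descriptors starting at 3. The Python below only shows that lookup; the function name `inherited_listen_fds` is hypothetical, and nothing here claims obmc-console-server works this way.

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the systemd protocol


def inherited_listen_fds(environ=None, pid=None):
    """Return the descriptor numbers systemd passed to this process, if any.

    Hypothetical helper for illustration; `environ` and `pid` are
    parameters only so the lookup can be exercised without systemd.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # LISTEN_PID guards against a child process picking up variables
        # that were meant for its parent.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        # Variables missing or malformed: not socket-activated.
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + max(count, 0)))
```

A daemon started this way could wrap each returned descriptor with `socket.socket(fileno=fd)` and go straight to `accept()`, falling back to creating and binding its own socket when the list is empty.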

geissonator commented 8 years ago

Same issue or something different? Got this after a power cycle to my system (was seeing other boot issues prior to the power cycle). BarPVT

Sep 20 12:35:35 barreleye systemd[1]: dev-ttyVUART0.device: Job dev-ttyVUART0.device/start timed out.
Sep 20 12:35:35 barreleye systemd[1]: Timed out waiting for device dev-ttyVUART0.device.
Sep 20 12:35:35 barreleye systemd[1]: Dependency failed for Phosphor Console Muxer.
Sep 20 12:35:35 barreleye systemd[1]: obmc-console.service: Job obmc-console.service/start failed with result 'dependency'.
Sep 20 12:35:35 barreleye systemd[1]: dev-ttyVUART0.device: Job dev-ttyVUART0.device/start failed with result 'timeout'.

Got our most recent master on there (as of today - date on BMC is wrong) VERSION_ID="v1.99.0-131-ga81f31c-dirty"

adamliyi commented 7 years ago

The only way I can reproduce this error (on Palmetto) is by running:

systemctl kill obmc-console@ttyVUART0
Oct 25 06:48:34 palmetto systemd[1]: Closed Phosphor Host Console SSH Per-Connection socket.
Oct 25 06:48:34 palmetto systemd[1]: Stopping Phosphor Host Console SSH Per-Connection socket.
Oct 25 06:48:34 palmetto systemd[1]: obmc-console-ssh.socket: Failed to listen on sockets: Address already in use
Oct 25 06:48:34 palmetto systemd[1]: Failed to listen on Phosphor Host Console SSH Per-Connection socket.
Oct 25 06:48:34 palmetto systemd[1]: obmc-console-ssh.socket: Unit entered failed state.

It looks to me like obmc-console-ssh.socket does not need a Requires dependency on obmc-console@.service. The unit currently has:

[Unit]
Description=Phosphor Host Console SSH Per-Connection socket
Conflicts=obmc-console-ssh.service
Requires=obmc-console.service

The dependency chain should look like:

obmc-console-ssh.socket (implicit dependency) ---> obmc-console-ssh@.service (Wants) ---> obmc-console@.service

If I remove the Requires directive, or change Requires to Wants, restarting the obmc-console server no longer restarts obmc-console-ssh.socket, and the "Failed to listen on sockets: Address already in use" error does not occur.

adamliyi commented 7 years ago

Committed a fix here: https://gerrit.openbmc-project.xyz/#/c/893/