turing-machines / BMC-Firmware

Turing-pi BMC firmware
http://turingpi.com
GNU General Public License v2.0
239 stars 29 forks source link

bmcd logging to console causes strange behavior if controlling terminal vanished #185

Open j0ju opened 8 months ago

j0ju commented 8 months ago

Describe the bug If the controlling TTY of the bmcd process vanishes it behaves strangly on actions.

To Reproduce Steps to reproduce the behavior:

  1. Log into the bmc via ssh
  2. Stop bmcd via ssh
  3. Start bmcd via ssh
  4. Log out of the ssh session, ensure it is closes (No Control Master or similar features is in use)
  5. Use tpi advanced msd -n 2 NOTE: the node number does not matter. I started the process here by hand to demonstrate. The call to start-stop-daemon in /etc/init.d/S94bmcd backgrounds bmcd but keeps it attached to the local console.
    
    tpi: /media/images/rk1 # /usr/bin/bmcd --config /etc/bmcd/config.yaml & pid=$!
    2024-02-29T23:26:01.430Z INFO  [bmcd] Turing Pi 2 BMC Daemon v2.0.5

tpi: /media/images/rk1 # tpi advanced -n 2 msd 2024-02-29T23:26:23.406Z INFO [bmcd::app::bmc_application] Powering off node Node2... 2024-02-29T23:26:24.012Z INFO [bmcd::app::bmc_application] Prerequisite settings toggled, powering on... 2024-02-29T23:26:25.500Z INFO [bmcd::app::bmc_application] Checking for presence of a USB device... 2024-02-29T23:26:25.512Z INFO [bmcd::firmware_update::rockusb_fwudate] Maskrom mode detected. loading usb-plug.. ok

tpi: /media/images/rk1 # echo $pid 24004 tpi: /media/images/rk1 # ls /proc/$pid/fd/[012] -l lrwx------ 1 root root 64 Feb 29 23:28 /proc/24004/fd/0 -> /dev/pts/3 lrwx------ 1 root root 64 Feb 29 23:28 /proc/24004/fd/1 -> /dev/pts/3 lrwx------ 1 root root 64 Feb 29 23:28 /proc/24004/fd/2 -> /dev/pts/3

[ terminal closed ] [ fresh terminal ] tpi: /media/images/rk1 # pid=24004 tpi: /media/images/rk1 # echo $pid 24004 tpi: /media/images/rk1 # ls /proc/$pid/fd/[012] -l lrwx------ 1 root root 64 Feb 29 23:30 /proc/24004/fd/0 -> '/dev/pts/3 (deleted)' lrwx------ 1 root root 64 Feb 29 23:30 /proc/24004/fd/1 -> '/dev/pts/3 (deleted)' lrwx------ 1 root root 64 Feb 29 23:30 /proc/24004/fd/2 -> '/dev/pts/3 (deleted)'

tpi: /media/images/rk1 # tpi advanced -n 2 msd error sending request for url (https://127.0.0.1/api/bmc?opt=set&type=node_to_msd&node=1): connection closed before message completed

tipi > ~ > tpi advanced -n 2 normal ok

tpi: /media/images/rk1 # tpi advanced -n 2 msd error sending request for url (https://127.0.0.1/api/bmc?opt=set&type=node_to_msd&node=1): connection closed before message completed


**Expected behavior**
The operation of  `tpi advanced msd -n 2` should work as expected.

**Versions**
linux version= Linux tpi 5.4.61 #4 SMP PREEMPT Sun Jan 28 13:47:02 UTC 2024 armv7l GNU/Linux
bmc version= 2024-02-28T23:49:15.568Z INFO [bmcd] Turing Pi 2 BMC Daemon v2.0.5

**Additional context**
It seems to be related that the filedescriptors of STDOUT/ERR are closed/vanished when the pseudo terminal of the SSH session vanishes.
If the bmcd is doint output, eg. the state messages regarding the usb devices for msd mode, that it aborts the action without notice.

It would be nice if bmcd would log to syslog. So the messages can the be forwarded to a syslog server, promtail, loki, ...

Locally I fixed this by adding redirections to /dev/null and /dev/console:
start-stop-daemon --start --quiet --background --make-pidfile --pidfile "$PIDFILE" --no-close \
    --exec "$BIN" -- --config "/etc/bmcd/config.yaml" > /dev/console < /dev/null 2>&1

This circumvents the issue. 

When does this happen? everytime the bmcd dies upon error or when the OOMReaper goes around and you restart bmcd manually.
svenrademakers commented 3 months ago

having a nice syslog client inside the BMCD would be nice, Off the top of my head, I'm not sure if we have all the pieces in the firmware for this yet. However, the new firmware now logs to a file only by default.

People who want to have the old behavior back can set the log.stdout property to 'true' inside /etc/bmcd/config.yaml

@j0ju which terminal client are you using?

j0ju commented 3 months ago

iterm2, kitty, Konsole, rxvt depending on the workstation I am working. Why you are asking, the client providing the tty-interface is independent, if the ssh session dies after a restart without redirecting, STDIN/OUT/ERR are broken and the application has to deal with it. Either by closng it early or to ignore those errors.

Regarding a syslog client: in shell we have the util logger which opens and sends messages to /dev/log, which is opened by the syslog deamon. In the current 2.0.5 and later env. this is Busybox syslogd (a companied by klogd) I am not an export in the Rust ecosystem, I would guess there is a logging library available.

Also as a suggestion, I would enable the feature to log to a ring buffer in Busybox, than no file rotation is needed. If the BMC would also log that direction, everything is in one place, and for people that want remote logging to a central syslog, busybox syslogd also allows this (Although, I would use rsyslogd for this). This would need some rework, but might make the whole logging a bit more consistent.

j0ju commented 3 months ago

... set log.stdout property to 'true' inside ...

Works as expected. One wish: if the STDOUT is not a terminal, I like to have no terminal colour codes emitted.

I am redirecting to /dev/kmsg or alternativly via bmcd ... | logger -t bmcd. This does not look nice, and is without further filters not that pleasant to read.

Jul 30 00:02:26 tipi user.warn kernel: [14811.677513] ^[[32m INFO^[[0m ^[[2mbmcd^[[0m^[[2m:^[[0m Turing Pi 2 BMC Daemon v2.0.5
Jul 30 00:02:26 tipi user.warn kernel: [14811.752946] ^[[32m INFO^[[0m ^[[2mbmcd::app::bmc_application^[[0m^[[2m:^[[0m changing usb config to UsbA(Node1)
Jul 30 00:02:27 tipi user.warn kernel: [14812.175835] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 1] serial started
Jul 30 00:02:27 tipi user.warn kernel: [14812.185789] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 2] serial started
Jul 30 00:02:27 tipi user.warn kernel: [14812.195895] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 3] serial started
Jul 30 00:02:27 tipi user.warn kernel: [14812.205777] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 4] serial started
Jul 30 00:02:30 tipi kern.info kernel: [14815.426995] realtek-smi 0-005c node4: Link is Down
Jul 30 00:02:30 tipi kern.info kernel: [14815.431942] br0: port 4(node4) entered disabled state
Jul 30 00:05:33 tipi user.warn kernel: [14998.639363] ^[[31mERROR^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m error sending data to uart
Jul 30 00:05:33 tipi user.warn kernel: [14998.649303] ^[[33m WARN^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m exiting serial worker for node 1 
Jul 30 00:05:33 tipi user.warn kernel: [14998.659886] ^[[31mERROR^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m error sending data to uart
Jul 30 00:05:33 tipi user.warn kernel: [14998.669852] ^[[33m WARN^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m exiting serial worker for node 2 
Jul 30 00:05:33 tipi user.warn kernel: [14998.680454] ^[[31mERROR^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m error sending data to uart
Jul 30 00:05:33 tipi user.warn kernel: [14998.690431] ^[[33m WARN^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m exiting serial worker for node 3 
Jul 30 00:05:33 tipi user.warn kernel: [14998.701070] ^[[31mERROR^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m error sending data to uart
Jul 30 00:05:33 tipi user.warn kernel: [14998.711089] ^[[33m WARN^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m exiting serial worker for node 4 
Jul 30 00:05:33 tipi user.warn kernel: [14998.722079] ^[[32m INFO^[[0m ^[[2mbmcd^[[0m^[[2m:^[[0m exiting bmcd
Jul 30 00:05:33 tipi user.warn kernel: [14998.860703] ^[[32m INFO^[[0m ^[[2mbmcd::persistency::app_persistency^[[0m^[[2m:^[[0m persistency synced
Jul 30 00:05:34 tipi user.notice bmcd: Starting bmcd...
Jul 30 00:05:34 tipi user.warn kernel: [14999.435392] ^[[32m INFO^[[0m ^[[2mbmcd^[[0m^[[2m:^[[0m Turing Pi 2 BMC Daemon v2.0.5
Jul 30 00:05:34 tipi user.warn kernel: [14999.474276] ^[[32m INFO^[[0m ^[[2mbmcd::app::bmc_application^[[0m^[[2m:^[[0m changing usb config to UsbA(Node1)
Jul 30 00:05:34 tipi user.warn kernel: [14999.891398] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 1] serial started
Jul 30 00:05:34 tipi user.warn kernel: [14999.901260] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 2] serial started
Jul 30 00:05:34 tipi user.warn kernel: [14999.911010] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 3] serial started
Jul 30 00:05:34 tipi user.warn kernel: [14999.920767] ^[[32m INFO^[[0m ^[[2mbmcd::serial_service::serial_handler^[[0m^[[2m:^[[0m [node 4] serial started
svenrademakers commented 3 months ago

There are indeed some libs that can connect to a syslogd. Not much thought is put into designing a good logging system, but I'm planning to start this soon as part of #153.

however, I've added a config option as well to disable the coloring for now (on default disabled)