robotology / icub-tech-support

Virtual repository that provides support requests for individual robots
GNU General Public License v2.0
20 stars 2 forks source link

iCubGenova09's icub-head refuse ssh connection for two minutes when starting #1172

Closed traversaro closed 3 years ago

traversaro commented 3 years ago

Device name 🤖

iCubGenova09

Request/Failure description

I recently update the iCubGenova09 to iCubOS 8.0 (based on Ubuntu 20.04) following the docs in https://icub-tech-iit.github.io/documentation/icub_operating_systems/icubos/installation-from-image/ with some integration that will be documented in a separate issue.

The machine worked fine, but after some reboots and modifications to install the software, after a reboot now the machine can be pinged, but refuses any ssh connection with error:

icub@iiticublap158:~ $ ssh icub-head
ssh: connect to host icub-head port 22: Connection refused

for approximately two minutes. After two minutes, it is possible to connect to it normally.

Detailed context

I tried to connect a screen to the icub-head during the period in which ssh is rejecting connections, and the boot sequence is hanging on the message "A start job is running for Wait for network to be Configured" :
startjob_is_waiting

Additional context

No response

traversaro commented 3 years ago

As this is more an IT problem and @mbrunettini and @davidelasagna already helped me on this, I think it is safe to directly assign them, but let me know if you have some other idea @Uboldi80 @Fabrizio69 .

davidelasagna commented 3 years ago

@traversaro I think I know the issue, it is caused by the wifi interface, the system tries to bring it up at boot but since there is no wifi active with the configured SSID it can't connect to it so it waits till a certain timeout. The solution is just to add optional: true in the netplan configuration file .yaml as it follows below:

wifis:
    wlp2s0:
      dhcp4: no
      dhcp6: no
      access-points:
        "please-put-your-ESSID-here":
          password: "please-put-your-wifi-password-here"
      optional: true

This sound strange to me because all this fix were already implemented in the last version we release. Maybe the link from the documentation still points to an older version, I'll check it out tomorrow to be sure. Thanks all!

traversaro commented 3 years ago

The optional: true seems already to be there:

icub@icub-head:~$ cat /etc/netplan/50-icub.yaml 
network:
  version: 2
  ethernets:
    enp4s0:
      dhcp4: no
      dhcp6: no
      optional: true
    eno1:
      dhcp4: no
      dhcp6: no
      addresses: [10.0.1.104/24]
      optional: true
  wifis:
    wlp2s0:
      dhcp4: no
      dhcp6: no
      access-points:
        "iCub3_000-Wifi":
          password: "iCubWifi"
      optional: true
  bonds:
    main-bond:
      addresses: [10.0.0.2/24]
      gateway4: 10.0.0.1
      nameservers:
        search: [icub.local]
        addresses: 
          - 10.0.0.1
      interfaces:
        - enp4s0
        - wlp2s0
      parameters:
        mode: active-backup
        primary: enp4s0
        mii-monitor-interval: 1
        gratuitious-arp: 5

By the way, I checked against a backup of the same file in Ubuntu 18.04, and this file is exactly the same.

traversaro commented 3 years ago

Based on https://askubuntu.com/questions/972215/a-start-job-is-running-for-wait-for-network-to-be-configured-ubuntu-server-17-1 I tried to add also link-local: [ ], but according to my limited test this does not solve the problem.

traversaro commented 3 years ago

By running journalctl --since "8min ago" right after a reboot I was able to get a more detailed log:

-- Logs begin at Wed 2021-07-21 19:00:21 UTC, end at Thu 2021-08-05 09:57:45 UTC. --
Aug 05 09:55:29 icub-head systemd-timesyncd[642]: Initial synchronization to time server 91.189.91.157:123 (ntp.ubuntu.com).
Aug 05 09:55:30 icub-head systemd-networkd[427]: main-bond: Gained IPv6LL
Aug 05 09:55:30 icub-head avahi-daemon[670]: Joining mDNS multicast group on interface main-bond.IPv6 with address fe80::3800:3aff:fea1:e4c6.
Aug 05 09:55:30 icub-head avahi-daemon[670]: New relevant interface main-bond.IPv6 for mDNS.
Aug 05 09:55:30 icub-head avahi-daemon[670]: Registering new address record for fe80::3800:3aff:fea1:e4c6 on main-bond.*.
Aug 05 09:55:30 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:30 icub-head systemd[1]: systemd-rfkill.service: Succeeded.
Aug 05 09:55:32 icub-head set-cpufreq[681]: Setting powersave scheduler for all CPUs
Aug 05 09:55:32 icub-head systemd[1]: ondemand.service: Succeeded.
Aug 05 09:55:35 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:36 icub-head NetworkManager[673]: <warn>  [1628157336.9848] device (wlp2s0): re-acquiring supplicant interface (#1).
Aug 05 09:55:36 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:36 icub-head systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Aug 05 09:55:37 icub-head wpa_supplicant[692]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 09:55:37 icub-head wpa_supplicant[692]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 09:55:37 icub-head wpa_supplicant[692]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 09:55:37 icub-head wpa_supplicant[692]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 09:55:37 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:37 icub-head NetworkManager[673]: <error> [1628157337.0235] sup-iface[0x55e6857e69f0,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.
Aug 05 09:55:37 icub-head NetworkManager[673]: <info>  [1628157337.0236] device (wlp2s0): supplicant interface state: starting -> down
Aug 05 09:55:38 icub-head systemd[1]: dmesg.service: Succeeded.
Aug 05 09:55:42 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:46 icub-head NetworkManager[673]: <warn>  [1628157346.9877] device (wlp2s0): re-acquiring supplicant interface (#2).
Aug 05 09:55:46 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:47 icub-head wpa_supplicant[692]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 09:55:47 icub-head wpa_supplicant[692]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 09:55:47 icub-head wpa_supplicant[692]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 09:55:47 icub-head wpa_supplicant[692]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 09:55:47 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:47 icub-head NetworkManager[673]: <error> [1628157347.0468] sup-iface[0x55e6857e6ae0,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.
Aug 05 09:55:47 icub-head NetworkManager[673]: <info>  [1628157347.0469] device (wlp2s0): supplicant interface state: starting -> down
Aug 05 09:55:56 icub-head systemd[1]: systemd-hostnamed.service: Succeeded.
Aug 05 09:55:56 icub-head NetworkManager[673]: <warn>  [1628157356.9840] device (wlp2s0): re-acquiring supplicant interface (#3).
Aug 05 09:55:56 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:57 icub-head wpa_supplicant[692]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 09:55:57 icub-head wpa_supplicant[692]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 09:55:57 icub-head wpa_supplicant[692]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 09:55:57 icub-head wpa_supplicant[692]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 09:55:57 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:55:57 icub-head NetworkManager[673]: <error> [1628157357.0308] sup-iface[0x55e6857e6bd0,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.
Aug 05 09:55:57 icub-head NetworkManager[673]: <info>  [1628157357.0309] device (wlp2s0): supplicant interface state: starting -> down
Aug 05 09:56:06 icub-head NetworkManager[673]: <warn>  [1628157366.9858] device (wlp2s0): re-acquiring supplicant interface (#4).
Aug 05 09:56:06 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:56:07 icub-head wpa_supplicant[692]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 09:56:07 icub-head wpa_supplicant[692]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 09:56:07 icub-head wpa_supplicant[692]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 09:56:07 icub-head wpa_supplicant[692]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 09:56:07 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:56:07 icub-head NetworkManager[673]: <error> [1628157367.0378] sup-iface[0x55e6857e6cc0,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.
Aug 05 09:56:07 icub-head NetworkManager[673]: <info>  [1628157367.0379] device (wlp2s0): supplicant interface state: starting -> down
Aug 05 09:56:16 icub-head NetworkManager[673]: <warn>  [1628157376.9942] device (wlp2s0): re-acquiring supplicant interface (#5).
Aug 05 09:56:16 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:56:17 icub-head wpa_supplicant[692]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 09:56:17 icub-head wpa_supplicant[692]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 09:56:17 icub-head wpa_supplicant[692]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 09:56:17 icub-head wpa_supplicant[692]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 09:56:17 icub-head systemd-networkd-wait-online[450]: managing: main-bond
Aug 05 09:56:17 icub-head NetworkManager[673]: <error> [1628157377.0529] sup-iface[0x55e6857e6cc0,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.
Aug 05 09:56:17 icub-head NetworkManager[673]: <info>  [1628157377.0530] device (wlp2s0): supplicant interface state: starting -> down
Aug 05 09:56:17 icub-head NetworkManager[673]: <info>  [1628157377.0531] device (wlp2s0): supplicant interface keeps failing, giving up
Aug 05 09:57:24 icub-head systemd-networkd-wait-online[450]: Event loop failed: Connection timed out
Aug 05 09:57:24 icub-head systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Aug 05 09:57:24 icub-head systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Aug 05 09:57:24 icub-head systemd[1]: Failed to start Wait for Network to be Configured.
Aug 05 09:57:24 icub-head systemd[1]: Reached target Network is Online.
Aug 05 09:57:24 icub-head systemd[1]: Condition check resulted in Login to default iSCSI targets being skipped.
Aug 05 09:57:24 icub-head systemd[1]: Reached target Remote File Systems (Pre).
Aug 05 09:57:24 icub-head systemd[1]: Reached target Remote File Systems.
Aug 05 09:57:24 icub-head systemd[1]: Starting LSB: automatic crash report generation...
Aug 05 09:57:24 icub-head systemd[1]: Starting Deferred execution scheduler...
Aug 05 09:57:24 icub-head systemd[1]: Starting Availability of block devices...
Aug 05 09:57:24 icub-head systemd[1]: Started Regular background program processing daemon.
Aug 05 09:57:24 icub-head systemd[1]: Condition check resulted in Pollinate to seed the pseudo random number generator being skipped.
Aug 05 09:57:24 icub-head systemd[1]: Starting OpenBSD Secure Shell server...
Aug 05 09:57:24 icub-head cron[931]: (CRON) INFO (pidfile fd = 3)
Aug 05 09:57:24 icub-head cron[931]: (CRON) INFO (Running @reboot jobs)
Aug 05 09:57:24 icub-head systemd[1]: Starting Permit User Sessions...
Aug 05 09:57:24 icub-head systemd[1]: Started Deferred execution scheduler.
Aug 05 09:57:24 icub-head systemd[1]: Finished Availability of block devices.
Aug 05 09:57:24 icub-head systemd[1]: Finished Permit User Sessions.
Aug 05 09:57:24 icub-head systemd[1]: Starting Hold until boot process finishes up...
Aug 05 09:57:24 icub-head systemd[1]: Starting Terminate Plymouth Boot Screen...
Aug 05 09:57:24 icub-head systemd[1]: plymouth-quit-wait.service: Succeeded.
Aug 05 09:57:24 icub-head systemd[1]: Finished Hold until boot process finishes up.
Aug 05 09:57:24 icub-head systemd[1]: Starting Set console scheme...
Aug 05 09:57:24 icub-head apport[916]:  * Starting automatic crash report generation: apport
Aug 05 09:57:25 icub-head apport[916]:    ...done.
Aug 05 09:57:24 icub-head systemd[1]: plymouth-quit.service: Succeeded.
Aug 05 09:57:24 icub-head systemd[1]: Finished Terminate Plymouth Boot Screen.
Aug 05 09:57:25 icub-head systemd[1]: Finished Set console scheme.
Aug 05 09:57:25 icub-head systemd[1]: Started LSB: automatic crash report generation.
Aug 05 09:57:25 icub-head sshd[950]: Server listening on 0.0.0.0 port 22.
Aug 05 09:57:25 icub-head systemd[1]: Created slice system-getty.slice.
Aug 05 09:57:25 icub-head sshd[950]: Server listening on :: port 22.
Aug 05 09:57:25 icub-head systemd[1]: Started Getty on tty1.
Aug 05 09:57:25 icub-head systemd[1]: Reached target Login Prompts.
Aug 05 09:57:25 icub-head systemd[1]: Started OpenBSD Secure Shell server.
Aug 05 09:57:25 icub-head systemd[1]: Reached target Multi-User System.
Aug 05 09:57:25 icub-head systemd[1]: Starting Update UTMP about System Runlevel Changes...
Aug 05 09:57:25 icub-head systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Aug 05 09:57:25 icub-head systemd[1]: Finished Update UTMP about System Runlevel Changes.
Aug 05 09:57:25 icub-head systemd[1]: Startup finished in 5.406s (firmware) + 4.811s (loader) + 4.099s (kernel) + 2min 941ms (userspace) = 2min 15.257s.
Aug 05 09:57:45 icub-head sshd[961]: Accepted publickey for icub from 10.0.0.158 port 44452 ssh2: RSA SHA256:244DnHrjv5ANMkIEWyHNzIadPfraaf6wbGMyBWozskQ
Aug 05 09:57:45 icub-head sshd[961]: pam_unix(sshd:session): session opened for user icub by (uid=0)
Aug 05 09:57:45 icub-head systemd[1]: Created slice User Slice of UID 1000.
Aug 05 09:57:45 icub-head systemd[1]: Starting User Runtime Directory /run/user/1000...
Aug 05 09:57:45 icub-head systemd-logind[688]: New session 1 of user icub.
Aug 05 09:57:45 icub-head systemd[1]: Finished User Runtime Directory /run/user/1000.
Aug 05 09:57:45 icub-head systemd[1]: Starting User Manager for UID 1000...
Aug 05 09:57:45 icub-head systemd[990]: pam_unix(systemd-user:session): session opened for user icub by (uid=0)
Aug 05 09:57:45 icub-head systemd[990]: Reached target Paths.
Aug 05 09:57:45 icub-head systemd[990]: Reached target Timers.
Aug 05 09:57:45 icub-head systemd[990]: Starting D-Bus User Message Bus Socket.
Aug 05 09:57:45 icub-head systemd[990]: Listening on GnuPG network certificate management daemon.
Aug 05 09:57:45 icub-head systemd[990]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Aug 05 09:57:45 icub-head systemd[990]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Aug 05 09:57:45 icub-head systemd[990]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Aug 05 09:57:45 icub-head systemd[990]: Listening on GnuPG cryptographic agent and passphrase cache.
Aug 05 09:57:45 icub-head systemd[990]: Listening on debconf communication socket.
Aug 05 09:57:45 icub-head systemd[990]: Listening on Sound System.
Aug 05 09:57:45 icub-head systemd[990]: Listening on REST API socket for snapd user session agent.
Aug 05 09:57:45 icub-head systemd[990]: Listening on D-Bus User Message Bus Socket.
Aug 05 09:57:45 icub-head systemd[990]: Reached target Sockets.
Aug 05 09:57:45 icub-head systemd[990]: Reached target Basic System.
Aug 05 09:57:45 icub-head systemd[1]: Started User Manager for UID 1000.
Aug 05 09:57:45 icub-head systemd[990]: Starting Sound Service...
Aug 05 09:57:45 icub-head systemd[1]: Started Session 1 of user icub.
Aug 05 09:57:45 icub-head dbus-daemon[672]: [system] Activating via systemd: service name='org.freedesktop.RealtimeKit1' unit='rtkit-daemon.service' requested by ':1.20' (uid=1000 pid=1000 comm="/usr/bin/pulseaudio --daemonize=no --log-target=jo" label="unconfined")
Aug 05 09:57:45 icub-head systemd[1]: Starting RealtimeKit Scheduling Policy Service...
Aug 05 09:57:45 icub-head dbus-daemon[672]: [system] Successfully activated service 'org.freedesktop.RealtimeKit1'
Aug 05 09:57:45 icub-head systemd[1]: Started RealtimeKit Scheduling Policy Service.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Successfully called chroot.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Successfully dropped privileges.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Successfully limited resources.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Running.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Watchdog thread running.
Aug 05 09:57:45 icub-head rtkit-daemon[1017]: Canary thread running.
Aug 05 09:57:45 icub-head systemd[990]: Started D-Bus User Message Bus.
Aug 05 09:57:45 icub-head dbus-daemon[1020]: [session uid=1000 pid=1020] AppArmor D-Bus mediation is enabled
Aug 05 09:57:45 icub-head systemd[990]: Started Sound Service.
Aug 05 09:57:45 icub-head systemd[990]: Reached target Main User Target.
Aug 05 09:57:45 icub-head systemd[990]: Startup finished in 455ms.
Aug 05 09:57:45 icub-head bluetoothd[671]: Endpoint registered: sender=:1.32 path=/MediaEndpoint/A2DPSink/sbc
Aug 05 09:57:45 icub-head bluetoothd[671]: Endpoint registered: sender=:1.32 path=/MediaEndpoint/A2DPSource/sbc
Aug 05 09:57:45 icub-head kernel: Bluetooth: RFCOMM TTY layer initialized
Aug 05 09:57:45 icub-head kernel: Bluetooth: RFCOMM socket layer initialized
Aug 05 09:57:45 icub-head kernel: Bluetooth: RFCOMM ver 1.11
traversaro commented 3 years ago

Brief analysis. The actual failure seem the following:

Aug 05 10:03:09 icub-head NetworkManager[671]: <warn>  [1628157789.0252] device (wlp2s0): re-acquiring supplicant interface (#1).
Aug 05 10:03:09 icub-head systemd-networkd-wait-online[445]: managing: main-bond
Aug 05 10:03:09 icub-head systemd[1]: NetworkManager-dispatcher.service: Succeeded.
Aug 05 10:03:09 icub-head wpa_supplicant[690]: ctrl_iface exists and seems to be in use - cannot override it
Aug 05 10:03:09 icub-head wpa_supplicant[690]: Delete '/run/wpa_supplicant/wlp2s0' manually if it is not used anymore
Aug 05 10:03:09 icub-head wpa_supplicant[690]: Failed to initialize control interface '/run/wpa_supplicant'.
                                               You may have another wpa_supplicant process already running or the file was
                                               left by an unclean termination of wpa_supplicant in which case you will need
                                               to manually remove this file before starting wpa_supplicant again.
Aug 05 10:03:09 icub-head wpa_supplicant[690]: nl80211: deinit ifname=wlp2s0 disabled_11b_rates=0
Aug 05 10:03:09 icub-head NetworkManager[671]: <error> [1628157789.0616] sup-iface[0x5586ea427200,wlp2s0]: error adding interface: wpa_supplicant couldn't grab this interface.

The system tries to get wpa_supplicant to work every 10 seconds, but after 2 minutes it gives up with the message:

Aug 05 10:03:49 icub-head NetworkManager[671]: <info>  [1628157829.0815] device (wlp2s0): supplicant interface keeps failing, giving up

And then the system is able to start. I tried to manually remove the /run/wpa_supplicant/wlp2s0 file and reboot, but this does not seem to fix the problem.

traversaro commented 3 years ago

For the curious, a complete log of a final steps of a halt and the first phases of boot (including the discussed problem) can be found in: iCubGenova09_icub_head_reboot_and_boot_2021_08_05.txt .

davidelasagna commented 3 years ago

Fixed by adding the optional: true also in the bond section as it follows: This could be caused because the bond want the wifi interface to be active in order to generate the bond properly.

 bonds:
    main-bond:
      optional: true
      addresses: [10.0.0.2/24]
      gateway4: 10.0.0.1
      nameservers:
        search: [icub.local]
        addresses: 
          - 10.0.0.1
      interfaces:
        - enp4s0
        - wlp2s0

First I checked if there were any misalignment in the file it seemed to that the optional: true was slightly misaligned in the wifi configuration, looking to the file previously pasted in this issue nothing to me seems misaligned therefore the issue was only on the bond configuration

traversaro commented 3 years ago

Thanks @davidelasagna ! I guess this is a change that will need to be integrated in the next image of iCubOS, but for the time being I think we can close this specific issue.