Open abitrolly opened 5 years ago
Finally got time to troubleshoot this. Switched to the rescue bootstrap script.
✗ podman run -it --rm --volume=$HOME/.scwrc:/.scwrc:Z scaleway/cli --region=ams1 inspect -f "{{.Bootscript.Title}}" server:xe3
arm64 rescue
Connected to the xe3 instance. Volumes are different from the article.
root@xe3:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 46.6G 0 disk
├─vda15 253:15 0 100M 0 part
└─vda1 253:1 0 46.5G 0 part
Mounted volumes.
mkdir -p /mnt/volume1
mount /dev/vda1 /mnt/volume1
mkdir -p /mnt/volume15
mount /dev/vda15 /mnt/volume15
/dev/vda15 is just a boot volume.
ls -lR volume15
volume15:
total 1
drwxr-xr-x 3 root root 512 Mar 5 10:14 EFI
volume15/EFI:
total 1
drwxr-xr-x 2 root root 512 Mar 5 10:16 BOOT
volume15/EFI/BOOT:
total 120
-rwxr-xr-x 1 root root 122880 Mar 5 10:16 BOOTAA64.EFI
Inspecting last logs.
/var/log# ls -lat
total 1832
-rw-rw-r-- 1 root utmp 19200 May 12 14:54 wtmp
-rw-rw-r-- 1 root utmp 296296 May 12 14:54 lastlog
-rw-rw---- 1 root utmp 1091600 May 12 14:54 btmp
-rw-r--r-- 1 root root 533976 May 12 14:18 dpkg.log
-rw-r--r-- 1 root root 16671 May 12 14:16 alternatives.log
drwxr-xr-x 2 root root 4096 May 12 14:10 apt
drwxr-x--- 2 root adm 4096 May 12 06:52 unattended-upgrades
drwxr-xr-x 2 syslog syslog 4096 May 12 01:42 landscape
-rw-r--r-- 1 root adm 93777 May 12 01:21 cloud-init.log
-rw-r--r-- 1 root root 4593 May 12 01:21 cloud-init-output.log
-rw------- 1 root root 64064 May 12 01:20 tallylog
-rw-r--r-- 1 root root 32032 May 12 01:20 faillog
drwxr-xr-x 8 root root 4096 May 12 01:20 .
drwxr-sr-x+ 3 root systemd-journal 4096 May 12 01:20 journal
drwxr-xr-x 13 root root 4096 Mar 5 09:50 ..
drwxr-xr-x 2 root root 4096 Jan 16 23:53 dist-upgrade
drwxr-xr-x 2 root root 4096 Nov 23 2018 lxd
No messages, no dmesg...
Parsing wtmp and btmp just in case.
# last -f btmp
root ssh:notty 218.92.0.207 Sun May 12 14:54 gone - no logout
root ssh:notty 218.92.0.207 Sun May 12 14:54 - 14:54 (00:00)
root ssh:notty 218.92.0.207 Sun May 12 14:54 - 14:54 (00:00)
root ssh:notty 218.92.0.207 Sun May 12 14:53 - 14:54 (00:00)
...
# last -f wtmp
root pts/0 x.x.x.127 Sun May 12 14:54 - down (00:00)
ubuntu pts/0 x.x.x.127 Sun May 12 14:54 - 14:54 (00:00)
root pts/0 x.x.x.127 Sun May 12 14:49 - 14:49 (00:00)
...
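The btmp listing above is long and repetitive, so a per-address count can be easier to read. This is a minimal sketch, not something that was run on xe3: the helper name count_sources is hypothetical, it assumes the source address is the third column of the `last` output (as in the lines shown above), and the path in the usage comment assumes the root volume is mounted at /mnt/volume1.

```shell
# Count `last`-style entries per source address (3rd column).
# Skips blank lines, the trailing "btmp begins ..." / "wtmp begins ..." line,
# and "reboot" pseudo-entries, whose 3rd column is not an address.
count_sources() {
    awk 'NF >= 4 && $1 != "btmp" && $1 != "wtmp" && $1 != "reboot" { print $3 }' \
        | sort | uniq -c | sort -rn
}

# Hypothetical usage (path is an assumption):
#   last -f /mnt/volume1/var/log/btmp | count_sources
```

A single address dominating the btmp count points at ordinary SSH brute-force noise rather than at the cause of the hang.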
/var/log/journal contains systemd logs, but the rescue image cannot read them.
# journalctl -D journal
Journal file journal/71e7aa5b46f048658dbfde2a92c24320/system.journal uses an unsupported feature, ignoring file.
-- No entries --
# cat /etc/os-release | grep VERSION=
VERSION="16.04.2 LTS (Xenial Xerus)"
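The "unsupported feature" message usually means the journal file was written by a newer systemd than the one reading it, which would fit a 16.04 rescue image reading logs from a newer install. That is an assumption here, not something confirmed on this server. A small sketch for comparing versions on both sides; the helper name systemd_version is hypothetical, and it only relies on the first line of `journalctl --version` having the form "systemd NNN".

```shell
# Print the systemd version number from `journalctl --version` output.
# The first line looks like "systemd 229"; the number is the 2nd field.
systemd_version() {
    awk 'NR == 1 { print $2 }'
}

# Hypothetical usage: run on the rescue image and on the machine that
# will read the journal, then compare the two numbers.
#   journalctl --version | systemd_version
```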
Sent logs through https://transfer.sh
tar -czf - /var/log/journal | curl --upload-file - https://transfer.sh/journal.tar.gz
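On the receiving machine the archive has to be unpacked before journalctl can read it. A rough sketch of that step, assuming the download was saved as journal.tar.gz; the helper name unpack_journal is hypothetical. Since tar strips the leading "/" when archiving, the helper searches for the journal directory instead of hard-coding its path.

```shell
# Unpack a journal archive into a fresh temporary directory and print
# the path of the first directory named "journal" found inside it.
unpack_journal() {
    archive=$1
    workdir=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$workdir" || return 1
    find "$workdir" -type d -name journal | head -n 1
}

# Hypothetical usage (file name is an assumption):
#   journalctl -D "$(unpack_journal journal.tar.gz)" --no-pager -n 20
```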
Server rebooted and never woke up. Last lines from journalctl:
$ journalctl -D journal
...
May 12 17:54:34 xe3 systemd[1]: Reached target Final Step.
May 12 17:54:34 xe3 systemd[1]: Starting Reboot...
May 12 17:54:34 xe3 systemd[1]: Stopped Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
May 12 17:54:34 xe3 systemd[1]: Stopping LVM2 metadata daemon...
May 12 17:54:34 xe3 systemd[1]: Stopped LVM2 metadata daemon.
May 12 17:54:34 xe3 systemd[1]: Shutting down.
May 12 17:54:34 xe3 systemd-shutdown[1]: Syncing filesystems and block devices.
May 12 17:54:34 xe3 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
May 12 17:54:34 xe3 systemd-journald[7690]: Journal stopped
Scaleway ARM64-2GB with Ubuntu 18.04 hangs after updating and rebooting (unattended, with Ansible, or just plain apt-get). This is repeatable. It could have been avoided with a continuous testing scenario for this case. This ticket is to confirm that Scaleway does this testing, and to implement it otherwise.