sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

/dev/sdaX: Can't open blockdev #9998

Closed alexrallen closed 2 years ago

alexrallen commented 2 years ago

Description

On the master branch, we have recently been seeing the following error during boot:

Feb  3 22:05:59.622294 r-lionfish-13 ERR kernel: [    4.201777] /dev/sda3: Can't open blockdev

This issue reproduces on all Mellanox platforms. It also seems to reproduce on some Broadcom platforms (see the logs from https://github.com/Azure/sonic-buildimage/issues/9028 and https://github.com/Azure/sonic-buildimage/issues/9898).

This is a recent regression (the exact commit hash is unknown), as we did not detect it during prior analysis of master branch syslog errors.

Steps to reproduce the issue:

  1. Boot the switch

Version

SONiC Software Version: SONiC.master.281-1223a7bab_Internal
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: 1223a7bab
Build date: Tue Feb 15 08:17:17 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic02-005

Platform: x86_64-mlnx_msn3420-r0
HwSKU: ACS-MSN3420
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2019X13878
Model Number: MSN3420-CB2FO
Hardware Revision: A1
Uptime: 03:45:31 up  7:40,  2 users,  load average: 0.93, 0.52, 0.45

Docker images:
REPOSITORY                                         TAG                             IMAGE ID       SIZE
docker-teamd                                       latest                          4899e4577326   441MB
docker-teamd                                       master.281-1223a7bab_Internal   4899e4577326   441MB
docker-dhcp-relay                                  latest                          19df64bea63d   448MB
docker-sonic-telemetry                             latest                          9d92ec8b0ff4   527MB
docker-sonic-telemetry                             master.281-1223a7bab_Internal   9d92ec8b0ff4   527MB
docker-syncd-mlnx                                  latest                          5c5f6744acf9   919MB
docker-syncd-mlnx                                  master.281-1223a7bab_Internal   5c5f6744acf9   919MB
docker-sonic-mgmt-framework                        latest                          a4f4d24539a3   581MB
docker-sonic-mgmt-framework                        master.281-1223a7bab_Internal   a4f4d24539a3   581MB
docker-snmp                                        latest                          781db5f588bf   469MB
docker-snmp                                        master.281-1223a7bab_Internal   781db5f588bf   469MB
docker-sflow                                       latest                          854d79699010   441MB
docker-sflow                                       master.281-1223a7bab_Internal   854d79699010   441MB
docker-router-advertiser                           latest                          2cf67e186fea   426MB
docker-router-advertiser                           master.281-1223a7bab_Internal   2cf67e186fea   426MB
docker-platform-monitor                            latest                          1c55c00e4fb1   659MB
docker-platform-monitor                            master.281-1223a7bab_Internal   1c55c00e4fb1   659MB
docker-orchagent                                   latest                          779e83a949df   461MB
docker-orchagent                                   master.281-1223a7bab_Internal   779e83a949df   461MB
docker-nat                                         latest                          2f923ddf3b54   443MB
docker-nat                                         master.281-1223a7bab_Internal   2f923ddf3b54   443MB
docker-mux                                         latest                          674c9ed2e2f4   478MB
docker-mux                                         master.281-1223a7bab_Internal   674c9ed2e2f4   478MB
docker-macsec                                      latest                          839197208212   443MB
docker-macsec                                      master.281-1223a7bab_Internal   839197208212   443MB
docker-lldp                                        latest                          29f0ad0807fd   466MB
docker-lldp                                        master.281-1223a7bab_Internal   29f0ad0807fd   466MB
docker-fpm-frr                                     latest                          5e33cb17c569   459MB
docker-fpm-frr                                     master.281-1223a7bab_Internal   5e33cb17c569   459MB
docker-database                                    latest                          1957efe93436   426MB
docker-database                                    master.281-1223a7bab_Internal   1957efe93436   426MB

zhangyanzhao commented 2 years ago

No functional impact, but we want to understand why the error message appears and how to clean it up.

saiarcot895 commented 2 years ago

I can confirm this is happening even in a KVM testbed, so it's not platform-specific.

lguohan commented 2 years ago

KVM uses vda, so sda is not available on KVM; it could still be a platform-specific issue.

dgsudharsan commented 2 years ago

KVM uses vda, so sda is not available on KVM; it could still be a platform-specific issue.

Guohan, as Alex pointed out, it is also seen in the tech supports from the Broadcom issues mentioned in the description, so I believe it's not platform specific.

saiarcot895 commented 2 years ago

The Can't open blockdev message is coming from this line: https://github.com/Azure/sonic-buildimage/blob/9fe128c8e8bb44fa9959c482116541ce64b87486/files/initramfs-tools/union-mount.j2#L135

During initramfs, /dev/sda3 (or, on the virtual switch, /dev/vda3) is mounted at /root/host. The mount binary that runs is not the one available during normal runtime (when the system has fully booted) but the busybox version, since we're in initramfs. As part of the mount sequence, if no filesystem type is given with the -t flag, the busybox mount code goes through /etc/filesystems and /proc/filesystems to see which filesystems the kernel supports, and then tries to mount the device using each filesystem listed there. The first one that succeeds is used. (See this and this for the code that they use.)
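
On a booted switch (or any Linux shell), the list that such a type-less mount would walk can be approximated by filtering /proc/filesystems. This is only a minimal sketch, assuming the usual /proc/filesystems layout in which pseudo-filesystems are prefixed with nodev. The helper name list_block_filesystems is made up for illustration, and this is not the busybox implementation.

```shell
# Hypothetical helper (not part of busybox or SONiC): print the filesystem
# types that are backed by a block device, in the order the kernel lists them.
# $1: file in /proc/filesystems format (defaults to /proc/filesystems itself).
list_block_filesystems() {
    file=${1:-/proc/filesystems}
    # Each line is either "nodev<TAB>name" or "<TAB>name"; read strips the
    # leading tab, so the first word is either "nodev" or the type name.
    while read -r first rest || [ -n "$first" ]; do
        [ -z "$first" ] && continue          # skip empty lines
        [ "$first" = "nodev" ] && continue   # skip pseudo-filesystems
        echo "$first"
    done < "$file"
}
```

Calling list_block_filesystems with no argument reads the live /proc/filesystems; the order of the output is the order in which the types would be tried.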

In our initramfs environment, /etc/filesystems doesn't exist, so only /proc/filesystems is read. The first filesystem tested is squashfs. For this filesystem, the mount code in the kernel goes through a code path that prints the Can't open blockdev message. This can be reproduced by running mount -t squashfs /dev/vda3 /root/host in initramfs (the destination directory doesn't matter, and the mount may fail, but the kernel message will still be printed). The next filesystems tested are vfat, ext3, and ext2, all of which fail with a "Device or resource busy" error from the mount command (there's no kernel error message for this). The last filesystem tested is ext4, which does succeed.
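
The fallback sequence described above can be mimicked by hand by trying each type explicitly with -t. The following is a rough sketch only: probe_mount is a hypothetical helper, it has to run as root, and it really mounts the device on the first type that succeeds.

```shell
# Hypothetical helper: try each filesystem type in turn, similar to what a
# type-less mount does, and print the first type that mounts successfully.
# Usage: probe_mount DEVICE TARGET TYPE...
probe_mount() {
    dev=$1
    target=$2
    shift 2
    for fstype in "$@"; do
        if mount -t "$fstype" "$dev" "$target"; then
            echo "$fstype"
            return 0
        fi
        # mount prints its own error; note which type was being tried.
        echo "mount -t $fstype $dev failed, trying next type" >&2
    done
    return 1
}
```

For example, probe_mount /dev/vda3 /root/host squashfs vfat ext3 ext2 ext4 walks the same order as reported above; checking the kernel log after each attempt would show which type triggers the message.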

This kernel message and the filesystem type probing can be avoided by passing the filesystem type to the mount command, i.e. by changing the command to mount -t ext4 ${ROOT} ${rootmnt}/host.
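
One way to check whether a build still emits the message is to count its occurrences in the kernel log after boot. A small sketch, assuming the log text is available on standard input or in a file; count_blockdev_errors is a hypothetical name, not an existing SONiC tool.

```shell
# Hypothetical helper: count "Can't open blockdev" lines in kernel log text.
# Reads the files given as arguments, or standard input if none are given.
count_blockdev_errors() {
    # grep -c prints the number of matching lines (0 if there are none).
    cat "$@" | grep -c "Can't open blockdev"
}
```

For example, dmesg | count_blockdev_errors should print 0 on an image where the mount command is given an explicit -t ext4.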