rancher / os

Tiny Linux distro that runs the entire OS as Docker containers
https://rancher.com/docs/os/v1.x/en/
Apache License 2.0

bnx2x firmware loading error - no network after boot #2877

Open pm73 opened 5 years ago

pm73 commented 5 years ago

RancherOS Version: (ros os version) v1.5.3

Where are you running RancherOS? (docker-machine, AWS, GCE, baremetal, etc.) baremetal: HP ProLiant BL460c G6 Server Blade

There is no network connection because the NIC firmware fails to load (NIC model: Embedded NC532i Dual Port Flex-10 10GbE):

bnx2x: [bnx2x_init_firmware:13564(eth1)]Can't load firmware file bnx2x/bnx2x-e1h-7.13.1.0.fw
bnx2x 0000:02:00.0: Direct firmware load for bnx2x/bnx2x-e1h-7.13.1.0.fw failed with error -2

Could it be that there is an issue with the firmware source and the initrd was built without it? Issue #1881 might be related.
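
One way to check this on a running system is to compare the firmware files the driver module declares against what is actually on disk. The following is only a minimal sketch and not something posted in this thread: `missing_firmware` is a hypothetical helper name, and it assumes `modinfo` is available and that firmware is looked up under `/lib/firmware`.

```shell
#!/bin/bash
# Hypothetical helper: read firmware names (one per line) from stdin and
# print every name that does not exist under the given firmware directory.
# The directory defaults to /lib/firmware (an assumption, adjust as needed).
missing_firmware() {
    local fw_dir="${1:-/lib/firmware}"
    local name
    while IFS= read -r name; do
        [ -z "$name" ] && continue
        [ -e "${fw_dir}/${name}" ] || echo "$name"
    done
}

# Example usage (requires the bnx2x module to be installed):
#   modinfo -F firmware bnx2x | missing_firmware /lib/firmware
```

Any name printed is a file the driver may request but that the image does not ship.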

Screenshots from boot process: https://imgur.com/a/EpscPh1

niusmallnan commented 5 years ago

The bnx2x/bnx2x-e2-7.13.1.0.fw firmware should be in kernel-extras in ROS v1.5.3.

The kernel-base in v1.5.1 has that firmware, but the one in v1.5.3 does not. ROS trims the kernel and firmware by default, so only the latest version of each driver's firmware is kept. By the time we built v1.5.3, the bnx2x firmware had been updated upstream.

[root@ip-172-31-47-29 bnx2x]# ls -ahl /proc/1/root/usr-v1.5.1/lib/firmware/bnx2x/
total 668
drwxr-xr-x    2 root     root        4.0K Feb 11 17:08 .
drwxr-xr-x   11 root     root        4.0K Feb 11 17:08 ..
-rw-r--r--    1 root     root      166.1K Feb 11 17:08 bnx2x-e1-7.13.1.0.fw
-rw-r--r--    1 root     root      174.8K Feb 11 17:08 bnx2x-e1h-7.13.1.0.fw
-rw-r--r--    1 root     root      313.4K Feb 11 17:08 bnx2x-e2-7.13.1.0.fw

[root@ip-172-31-47-29 bnx2x]# ls -ahl /proc/1/root/usr-v1.5.3/lib/firmware/bnx2x/
total 668
drwxr-xr-x    2 root     root        4.0K Jul 11 06:34 .
drwxr-xr-x   11 root     root        4.0K Jul 11 06:34 ..
-rw-r--r--    1 root     root      165.9K Jul 11 06:34 bnx2x-e1-7.13.11.0.fw
-rw-r--r--    1 root     root      174.1K Jul 11 06:34 bnx2x-e1h-7.13.11.0.fw
-rw-r--r--    1 root     root      314.7K Jul 11 06:34 bnx2x-e2-7.13.11.0.fw
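
The difference between two such firmware directories can also be listed mechanically. This is a small sketch under assumptions: `firmware_only_in` is a hypothetical helper name, and the two paths in the usage comment are simply the ones from the listings above.

```shell
#!/bin/bash
# Hypothetical helper: print the names of *.fw files that exist in the
# first directory but not in the second.
firmware_only_in() {
    local first="$1" second="$2" f
    for f in "$first"/*.fw; do
        [ -e "$f" ] || continue          # the glob matched nothing
        [ -e "$second/$(basename "$f")" ] || basename "$f"
    done
}

# Example usage with the directories shown above:
#   firmware_only_in /proc/1/root/usr-v1.5.1/lib/firmware/bnx2x \
#                    /proc/1/root/usr-v1.5.3/lib/firmware/bnx2x
```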

niusmallnan commented 5 years ago

You can copy those firmware files from here. Put them into /lib/firmware/bnx2x/, then try to enable the NICs.
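
As a rough sketch of that step (not an official procedure): `install_bnx2x_firmware` is a hypothetical helper name, `/tmp/drivers` in the usage comment is an assumed download location, and reloading the module is just one way to make the driver retry the firmware load.

```shell
#!/bin/bash
# Hypothetical helper: copy all *.fw files from a source directory into the
# bnx2x firmware directory (default: /lib/firmware/bnx2x).
install_bnx2x_firmware() {
    local src_dir="$1"
    local dest_dir="${2:-/lib/firmware/bnx2x}"
    mkdir -p "$dest_dir" || return 1
    cp "$src_dir"/*.fw "$dest_dir"/
}

# Example usage (as root); /tmp/drivers is an assumed location:
#   install_bnx2x_firmware /tmp/drivers
#   modprobe -r bnx2x && modprobe bnx2x   # reload so the firmware load is retried
```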

pm73 commented 5 years ago

This server doesn't have disks yet, I use iPXE. Can this be done with iPXE?

Did someone delete my previous comment?

niusmallnan commented 5 years ago

This is hard to do with iPXE alone. The firmware probably needs to be built in, which means customizing the initrd file.

You can refer to these code lines, which show how to modify the initrd file: https://github.com/rancher/os/blob/3fac5f7604e1e9d49ab44e246a067c5b168a82e2/scripts/tools/flush_crt_iso.sh#L47-L57

jhughes2112 commented 4 years ago

Uh oh, I have the same issue. Is there a way to resolve this without resorting to rebuilding the initrd? Several of my servers are fine (due to slightly different hardware configurations), but one is failing this way. Also iPXE booting.

jhughes2112 commented 4 years ago

Eh, not too bad. It took about 45 minutes to clone the firmware repo, unpack, copy one file, repack the initrd. Although the above instructions are technically accurate they aren't very helpful. Anyone interested in actually doing this will get more mileage out of this link instead. https://www.thegeekstuff.com/2009/07/how-to-view-modify-and-recreate-initrd-img/
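
For anyone repeating this, the unpack, copy, repack cycle can be sketched roughly as below. This is not the exact procedure used here: `add_firmware_to_initrd` is a hypothetical helper name, the initrd is assumed to be a gzip-compressed cpio archive, and `usr/lib/firmware/bnx2x` is an assumed location inside it. It relies on GNU `cpio` and GNU `readlink`, and a real initrd should be unpacked as root so that ownership and device nodes survive.

```shell
#!/bin/bash
# Hypothetical helper: add one firmware file to a gzip-compressed cpio initrd.
# Usage: add_firmware_to_initrd <initrd-in> <firmware-file> <initrd-out> [dir-inside-initrd]
add_firmware_to_initrd() {
    local initrd_in fw_file initrd_out fw_subdir work rc
    initrd_in=$(readlink -f "$1") || return 1
    fw_file=$(readlink -f "$2") || return 1
    initrd_out=$(readlink -f "$3") || return 1
    fw_subdir="${4:-usr/lib/firmware/bnx2x}"   # assumed path inside the initrd
    work=$(mktemp -d) || return 1
    (
        cd "$work" &&
        # unpack: -i extract, -d create directories, -m keep modification times
        gzip -dc "$initrd_in" | cpio -idm --quiet &&
        mkdir -p "$fw_subdir" &&
        cp "$fw_file" "$fw_subdir"/ &&
        # repack in the "newc" format used for initramfs archives
        find . | cpio -o -H newc --quiet | gzip -9 > "$initrd_out"
    )
    rc=$?
    rm -rf "$work"
    return "$rc"
}

# Example usage (file names are placeholders):
#   add_firmware_to_initrd initrd-orig bnx2x-e1h-7.13.1.0.fw initrd-patched
```

The patched initrd is written to a separate file, so the original stays available if the new one fails to boot.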

McSlow commented 4 years ago

We had the same issue with Dell R720 servers. I still don't understand why this happens, because the network cards have been updated to the latest available revision. Somehow firmware ... .1.0 is requested instead of .11.0. As I see it, there is a mismatch between the kernel driver and the firmware files shipped in RancherOS.

The issue still applies to rancheros-1.5.6.

Anyway, to make this reproducible for forthcoming releases, I put some work into it. Here's a patch script, heavily based on @niusmallnan's last answer:

#!/bin/bash
# adds missing firmware files into RancherOS ISO.
# needs the following programs ( install using apt etc. if missing)
# - xorriso
# - cpio
# - wget
# - isolinux
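# the rest should be available in every Linux installation
# run as root (the script loop-mounts the ISO)
# the computer running the script needs internet access: $DRIVERS_DIR is wiped and the firmware is downloaded on every run
# for offline use, remove the wget calls and the ${DRIVERS_DIR} entry from the rm -rf line, then put the firmware files in $DRIVERS_DIR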

#usage:
# add-bnx2x.sh --iso <path/to/rancheros.iso>
# output iso will be in /tmp/new/new_rancheros.iso

set -ex

BASE_DIR=/tmp
ORIGIN_DIR=/tmp/origin
NEW_DIR=/tmp/new
WORK_DIR=/tmp/work
DRIVERS_DIR=/tmp/drivers

rm -rf ${ORIGIN_DIR} ${NEW_DIR} ${WORK_DIR} ${DRIVERS_DIR}
mkdir -p ${ORIGIN_DIR} ${NEW_DIR} ${WORK_DIR} ${DRIVERS_DIR}

while [ "$#" -gt 0 ]; do
    case $1 in
        --iso)
            shift 1
            ISO_FILE=$(readlink -f "$1")
            ;;
        *)
            break
            ;;
    esac
    shift 1
done

# get the missing firmware files
FW_URL=https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/bnx2x
for FW in bnx2x-e1h bnx2x-e2 bnx2x-e1; do
    wget ${FW_URL}/${FW}-7.13.1.0.fw -P ${DRIVERS_DIR}
done

# copy the iso content
mount -t iso9660 -o loop ${ISO_FILE} ${ORIGIN_DIR}
cp -rf ${ORIGIN_DIR}/* ${NEW_DIR}

# copy the initrd file
INITRD_NAME=$(basename ${ORIGIN_DIR}/boot/initrd-*)
cp -r ${ORIGIN_DIR}/boot/initrd-* ${WORK_DIR}/

# update and rebuild the initrd
pushd ${WORK_DIR}
mv initrd-* ${INITRD_NAME}.gz
gzip -d ${INITRD_NAME}.gz
cpio -id -F ${INITRD_NAME}
rm -f ${INITRD_NAME}

# put the additional firmware (Broadcom NetXtreme II, bnx2x) into place
mkdir -p ${WORK_DIR}/usr/lib/firmware/bnx2x/
cp ${DRIVERS_DIR}/* ${WORK_DIR}/usr/lib/firmware/bnx2x/

find | cpio -H newc -o | gzip -9 > ${NEW_DIR}/boot/${INITRD_NAME}
popd

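# NOTE: DISTRIB_ID is not set anywhere in this script; export it before running, otherwise the ISO volume label is empty
pushd ${NEW_DIR}
xorriso \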
    -as mkisofs \
    -l -J -R -V "${DISTRIB_ID}" \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    -b boot/isolinux/isolinux.bin -c boot/isolinux/boot.cat \
    -isohybrid-mbr /usr/lib/ISOLINUX/isohdpfx.bin \
    -o new_$(basename ${ISO_FILE}) .
popd

# unmount the original ISO
umount ${ORIGIN_DIR}

McSlow commented 4 years ago

Unfortunately, the above script doesn't help much: during installation, the installer loads the faulty image from the net, which leaves the server without network after rebooting :(