sonic-net / SONiC

Landing page for Software for Open Networking in the Cloud (SONiC) - https://sonic-net.github.io/SONiC/

Able to boot on an Arista 7050T, but it throws a `fallocate: fallocate failed: Operation not supported` error #1639

Open oceanplexian opened 4 months ago

oceanplexian commented 4 months ago

👋 I bought a 7050T to use as a lab switch (48 Port 10GBase-T variant). It runs EOS fine but SONiC would be much better 😄

I replaced the internal /mnt/flash with a USB flash adapter and put in a 128GB drive formatted as ext3 (Aboot doesn't seem to support ext4, only ext3 or FAT). For the heck of it, I also upgraded the switch with 8GB of DDR3.
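One thing that may be worth ruling out for the `fallocate` error in the title: as far as I know, a plain ext3 filesystem (no extents) does not implement fallocate(2) and returns "Operation not supported", whereas ext4 does. Whether that is what the SONiC boot scripts trip over here is an assumption, not something confirmed in this thread. A minimal Python sketch to probe a directory, assuming 64-bit Linux and a libc that exports `fallocate` (the helper name `supports_fallocate` is made up for illustration):

```python
import ctypes
import errno
import tempfile


def supports_fallocate(directory):
    """Probe whether fallocate(2) works for files created in `directory`.

    Returns True if the call succeeds, False if the filesystem answers
    EOPNOTSUPP, and None if the result cannot be determined.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:
        return None  # libc does not export fallocate (not Linux)
    # Assumption: 64-bit Linux, where off_t is a 64-bit integer.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    with tempfile.TemporaryFile(dir=directory) as probe:
        # mode 0 = plain preallocation of 4 KiB at offset 0
        if fallocate(probe.fileno(), 0, 0, 4096) == 0:
            return True
        if ctypes.get_errno() == errno.EOPNOTSUPP:
            return False
        return None  # some other error (no space, permissions, ...)


print(supports_fallocate(tempfile.gettempdir()))
```

Pointing it at the mounted flash (for example `supports_fallocate("/host")`) would show whether the ext3 drive rejects the call. The raw libc call is used on purpose, because `os.posix_fallocate` can fall back to emulation in glibc and hide the error.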

Aboot# boot sonic-latest.swi
504.61: Cleaning flash content /mnt/flash
505.84: Generating boot-config, machine.conf and cmdline
505.92: Installing image under /mnt/flash/image-master.371261-52f6dd65a
505.92: Moving swi to a tmpfs
507.74: Extracting swi content
509.66: Extracting platform.tar.gz
509.71: Extracting dockerfs.tar.gz from swi
629.24: Remove installer
642.86: Next reboot will use flash:image-master.371261-52f6dd65a/.sonic-boot.swi
644.74: Kexecing...
[  644.791451] Starting new kernel
[    0.403458] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010000 is 530076)
fallocate: fallocate failed: Operation not supported
mount: mounting /dev/sda on /root/host failed: Device or resource busy
[   15.289878] systemd[255]: /usr/lib/systemd/system-generators/systemd-sonic-generator failed with exit status 1.
[   18.591530] rc.local[425]: + grep build_version
[   18.654806] rc.local[424]: + cat /etc/sonic/sonic_version.yml
[   18.731625] rc.local[433]: + sed -e s/build_version: //g;s/'//g
[   18.815125] rc.local[418]: + SONIC_VERSION=master.371261-52f6dd65a
[   18.891136] rc.local[418]: + FIRST_BOOT_FILE=/host/image-master.371261-52f6dd65a/platform/firsttime
[   19.012912] rc.local[418]: + SONIC_CONFIG_DIR=/host/image-master.371261-52f6dd65a/sonic-config
[   19.123125] rc.local[418]: + SONIC_ENV_FILE=/host/image-master.371261-52f6dd65a/sonic-config/sonic-environment
[   19.248893] rc.local[418]: + [ -d /host/image-master.371261-52f6dd65a/sonic-config -a -f /host/image-master.371261-52f6dd65a/sonic-config/soni]
[   19.417620] rc.local[418]: + logger SONiC version master.371261-52f6dd65a starting up...
[FAILED] Failed to start /etc/rc.local Compatibility.
[DEPEND] Dependency failed for Reboot cause determination service.
[DEPEND] Dependency failed for Config chassis_db.
[   19.530252] rc.local[418]: + grub_installation_needed=
[DEPEND] Dependency failed for database-chassis container.
[DEPEND] Dependency failed for Plat…opology configuration service.
[DEPEND] Dependency failed for Conf…ization and migration service.
[DEPEND] Dependency failed for Upda…figuration based on minigraph.
[DEPEND] Dependency failed for Proc…tilization data export daemon.
[   19.835749] rc.local[418]: + [ ! -e /host/machine.conf ]
[DEPEND] Dependency failed for Cont…lane ACL configuration daemon.
[   20.443903] kdump-tools[410]: Starting kdump-tools:
[FAILED] Failed to start OpenBSD Secure Shell server.
[   20.682768] rc.local[463]: + blkid
[   20.827898] kdump-tools[437]: no crashkernel= parameter in the kernel cmdline ...
[   20.919514] kdump-tools[455]:  failed!
[   20.980273] rc.local[464]: + grep ONIE-BOOT
[   21.045854] rc.local[471]: + awk {print $1}
[   21.104558] rc.local[472]: + sed -e s/:.*$//
[FAILED] Failed to start OpenBSD Secure Shell server.
[   21.165569] rc.local[470]: + head -n 1
[   21.305647] rc.local[418]: + onie_dev=
[   21.359154] rc.local[418]: + mkdir -p /mnt/onie-boot
[   21.422901] rc.local[418]: + mount /mnt/onie-boot
[   21.486176] rc.local[480]: mount: /mnt/onie-boot: can't find in /etc/fstab.
[   21.576918] rc.local[418]: + onie_grub_cfg=/mnt/onie-boot/onie/grub/grub-machine.cfg
[FAILED] Failed to start OpenBSD Secure Shell server.
[   21.691336] rc.local[418]: + [ ! -e /mnt/onie-boot/onie/grub/grub-machine.cfg ]
[   21.879165] rc.local[418]: + log_migration /mnt/onie-boot/onie/grub/grub-machine.cfg not found
[   21.991227] rc.local[418]: /etc/rc.local: 27: cannot create /host/migration/migration.log: Directory nonexistent
[FAILED] Failed to start OpenBSD Secure Shell server.
[   22.121340] rc.local[418]: + migrate_nos_configuration
[   22.295169] rc.local[418]: + rm -rf /host/migration
[   22.355271] rc.local[418]: + mkdir -p /host/migration
[   22.429189] rc.local[483]: + cat /proc/cmdline
[   22.487860] rc.local[418]: + set -- tsc=reliable console=ttyS0 reboot=p block_flash=pci0000:00/0000:00:12.2/usb1/1-3/.*$ block_usb1=pci0000:000
[FAILED] Failed to start OpenBSD Secure Shell server.
[   23.291452] r[   23.394719] overlayfs: filesystem on '/var/lib/docker/check-overlayfs-support2691589699/upper' not supported as upperdir
c.local[418]: + [ -n  ]
[FAILED] Failed to start OpenBSD Secure Shell server.
[FAILED] Failed to start Docker Application Container Engine.
[DEPEND] Dependency failed for Database container.
[   23.565446] rc.local[418]: + umount /mnt/onie-boot
[   23.913747] rc.local[484]: umount: /mnt/onie-boot: not mounted.
[   23.994518] rc.local[418]: + . /host/machine.conf
[   23.994683] rc.local[418]: /etc/rc.local: 239: .: cannot open /host/machine.conf: No such file
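The console output above interleaves systemd status lines with `rc.local` tracing, which makes it hard to see which units actually failed. A rough Python sketch for pulling out just the `[FAILED]` and `[DEPEND]` lines and counting repeats (the function name and the short sample are illustrative; the sample lines are copied from the log above):

```python
import re
from collections import Counter

# Matches systemd console status lines such as "[FAILED] Failed to start ..."
UNIT_LINE = re.compile(r"^\[(FAILED|DEPEND)\]\s+(.*)$")


def summarize_boot_failures(console_text):
    """Count each distinct [FAILED]/[DEPEND] message in a console capture."""
    counts = Counter()
    for line in console_text.splitlines():
        match = UNIT_LINE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = """\
[FAILED] Failed to start /etc/rc.local Compatibility.
[DEPEND] Dependency failed for Reboot cause determination service.
[FAILED] Failed to start OpenBSD Secure Shell server.
[   20.682768] rc.local[463]: + blkid
[FAILED] Failed to start OpenBSD Secure Shell server.
[FAILED] Failed to start Docker Application Container Engine.
"""

summary = summarize_boot_failures(sample)
for (kind, message), count in summary.most_common():
    print(f"{count}x [{kind}] {message}")
```

Run against the full capture, this should make it easier to separate the first failure (`rc.local`) from the services that only failed as a consequence.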

I was able to mount /dev/sda manually at /host and /root/host (not sure whether it needs one or both here). So it at least seems like this can be solved. Maybe it's some kind of timing issue?
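To confirm where the device actually ended up mounted, something like the following Python sketch could be used. It only parses `/proc/mounts`-style text; the sample content is hypothetical and just illustrates the format, and `mount_points` is a made-up helper name:

```python
def mount_points(mounts_text, device):
    """Return every mount point listed for `device` in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: device mountpoint fstype options dump pass
        if len(fields) >= 2 and fields[0] == device:
            points.append(fields[1])
    return points


# Hypothetical sample, not taken from the switch.
sample = """\
/dev/sda /host ext3 rw,relatime 0 0
/dev/sda /root/host ext3 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
"""

print(mount_points(sample, "/dev/sda"))
```

On the switch itself the input would come from `open("/proc/mounts").read()`. An empty list right after boot, followed by a non-empty one after the manual mount, would match the "Device or resource busy" failure in the log.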

I experimented with it a bit further: hooked up the management port, SSH'd in, modified the docker systemd unit, and tried mounting the docker root from the host filesystem.

ā— docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/docker.service.d
             ā””ā”€docker.service.conf
     Active: active (running) since Sun 2024-03-24 03:31:48 UTC; 1min 46s ago
TriggeredBy: ā— docker.socket
       Docs: https://docs.docker.com
   Main PID: 570 (dockerd)
      Tasks: 9
     Memory: 144.3M
     CGroup: /system.slice/docker.service
             ā””ā”€570 /usr/bin/dockerd -H unix:// --data-root /host/image-master.371261-52f6dd65a/docker --storage-driver=overlay2 --bip=240.127.1.1/24 --iptables=false --ipv6=true --fixed-cidr-v6=fd00::/80

I even got many of the containers up, but a number of them crash:

root@sonic:~# docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED      STATUS                       PORTS     NAMES
3707886c24c3   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   2 days ago   Up 8 minutes                           telemetry
737f4b29f3d9   docker-snmp:latest                   "/usr/local/bin/supe…"   2 days ago   Exited (0) 3 minutes ago               snmp
d7103081fd19   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   2 days ago   Up 8 minutes                           pmon
0f5fb16d2409   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   2 days ago   Up 9 minutes                           mgmt-framework
908ff7874f5c   docker-lldp:latest                   "/usr/bin/docker-lld…"   2 days ago   Up 9 minutes                           lldp
7f8e5bf1c426   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   2 days ago   Exited (137) 2 days ago                bgp
720a6f596791   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   2 days ago   Exited (0) 4 minutes ago               radv
127f9974d66e   docker-syncd-bfn:latest              "/usr/local/bin/supe…"   2 days ago   Exited (0) 3 minutes ago               syncd
49fb7c6d7d52   docker-teamd:latest                  "/usr/local/bin/supe…"   2 days ago   Exited (0) 2 days ago                  teamd
09793f33f3a7   docker-orchagent:latest              "/usr/bin/docker-ini…"   2 days ago   Exited (137) 4 minutes ago             swss
ae343babb9da   docker-eventd:latest                 "/usr/local/bin/supe…"   2 days ago   Up 13 minutes                          eventd
d47099d50a9a   docker-database:latest               "/usr/local/bin/dock…"   2 days ago   Exited (0) 2 days ago                  databaseplatform-arista-fabric
92874a9ceff2   docker-database:latest               "/usr/local/bin/dock…"   2 days ago   Up 13 minutes                          database
root@sonic:~#
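For reading the STATUS column: by shell convention an exit code above 128 usually means the process was terminated by signal (code minus 128), so `Exited (137)` would typically be SIGKILL (9), while `Exited (0)` is a clean stop. A process can also return such a code on its own, so treat this as a hint only. A small Python sketch that decodes the statuses listed above (the helper name `describe_status` is made up):

```python
import re
import signal

EXITED = re.compile(r"Exited \((\d+)\)")


def describe_status(status):
    """Turn a `docker ps` STATUS string into a short human-readable hint."""
    match = EXITED.search(status)
    if not match:
        return "running or other state"
    code = int(match.group(1))
    if code == 0:
        return "exited cleanly (0)"
    if code > 128:
        signum = code - 128  # convention: 128 + signal number
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name} (exit {code})"
    return f"failed with exit code {code}"


# Statuses copied from the `docker ps -a` output above.
statuses = {
    "bgp": "Exited (137) 2 days ago",
    "swss": "Exited (137) 4 minutes ago",
    "syncd": "Exited (0) 3 minutes ago",
    "pmon": "Up 8 minutes",
}

for name, status in statuses.items():
    print(f"{name}: {describe_status(status)}")
```

If that reading is right, bgp and swss were killed rather than crashing on their own, while syncd, snmp, radv and teamd stopped with a zero exit code.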

The error I'm most concerned about is this one, but I could be wrong:

root@sonic:~# show system-health detail
Traceback (most recent call last):
  File "/usr/local/bin/show", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 120, in detail
    manager, chassis, stat = get_system_health_status()
  File "/usr/local/lib/python3.9/dist-packages/show/system_health.py", line 30, in get_system_health_status
    chassis = Chassis()
  File "/usr/lib/python3/dist-packages/arista/utils/sonic_platform/chassis.py", line 78, in __init__
    self._eeprom = Eeprom(readPrefdl())
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 73, in readPrefdl
    return readI2cPrefdlEeprom()
  File "/usr/lib/python3/dist-packages/arista/core/platform.py", line 54, in readI2cPrefdlEeprom
    raise UnknownPlatformError('Could not identify current platform')
arista.core.exception.UnknownPlatformError: Could not identify current platform

Seems like it can't read the EEPROM, which I assume we need in order to show and manipulate interfaces. For example, the interface list comes back empty:

root@sonic:~# show interfaces status
  Interface    Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin    Type    Asym PFC
-----------  -------  -------  -----  -----  -------  ------  ------  -------  ------  ----------
root@sonic:~#

Would be great if anyone had any ideas, or at least if someone could replicate my steps and get a bit further on this switch or a similar one. I'm thinking a module needs to be loaded or a driver is missing, if such a driver even exists. Thanks 🙇

quxyzzy commented 2 months ago

Having a similar issue on a 7050QX-32, with the same UnknownPlatformError('Could not identify current platform').