sophgo / bootloader-riscv


Bootloader removes custom console bootargs #71

Open skeuchel opened 7 months ago

skeuchel commented 7 months ago

I have the Milk-V Pioneer board and installed NixOS on it. My /boot/extlinux/extlinux.conf includes bootargs for output on ttyS0:

APPEND init=/nix/store/3xsvpmg37mcky27my30nvxsjh98c0xwc-nixos-system-mvp-24.05.20240203.c8d11da/init earlycon console=ttyS0,115200 debug nvme_core.io_timeout=600 nvme_core.admin_timeout=600 cma=512M swiotlb=65536

However, /proc/cmdline after booting shows:

init=/nix/store/3xsvpmg37mcky27my30nvxsjh98c0xwc-nixos-system-mvp-24.05.20240203.c8d11da/init earlycon debug nvme_core.io_timeout=600 nvme_core.admin_timeout=600 cma=512M swiotlb=65536 console=tty1

my custom console=ttyS0,115200 bootarg was removed and a console=tty1 was appended instead. As a result, the output on the serial console stops once the bootconsole is disabled.
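
To make the difference easy to spot, the two command lines can be compared token by token. This is only a minimal sketch: the helper names are made up for illustration, and the sample strings are shortened copies of the lines above (the init= path is left out).

```python
def dropped_args(configured: str, booted: str) -> list[str]:
    """Arguments present in the configured APPEND line but missing after boot."""
    booted_tokens = set(booted.split())
    return [tok for tok in configured.split() if tok not in booted_tokens]


def added_args(configured: str, booted: str) -> list[str]:
    """Arguments present after boot that were never configured."""
    configured_tokens = set(configured.split())
    return [tok for tok in booted.split() if tok not in configured_tokens]


# Shortened copies of the APPEND line and of /proc/cmdline from this report.
CONFIGURED = "earlycon console=ttyS0,115200 debug cma=512M swiotlb=65536"
BOOTED = "earlycon debug cma=512M swiotlb=65536 console=tty1"

print("dropped:", dropped_args(CONFIGURED, BOOTED))
print("added:  ", added_args(CONFIGURED, BOOTED))
```

On a running system, the second string would come from reading /proc/cmdline instead of being hard-coded.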

Since I intend to use my machine mainly in headless mode, I really want to see the console output on the serial port. I saw that the revyos people are forcing output on the serial console (https://github.com/revyos/sg2042-vendor-kernel/commit/ac43de713b04df788e7052308e83ec26bd1ebc48) by changing the kernel config, which I am also using as a workaround

silvanshade commented 6 months ago

I'm also having problems with this and have the same use case.

Since I intend to use my machine mainly in headless mode, I really want to see the console output on the serial port. I saw that the revyos people are forcing output on the serial console (revyos/sg2042-vendor-kernel@ac43de7) by changing the kernel config, which I am also using as a workaround

@skeuchel Can you say a little more about how you managed to get this modification to work?

Which kernel are you using and did you change any other options?

I tried making the change you describe (to the fedora equivalent) using the kernel at https://github.com/sophgo/linux-riscv (the sg2042-dev branch), but am having problems getting it to work correctly.

Steps I followed:

  1. Patch sophgo_mango_fedora_defconfig as described for console params
  2. make ARCH=riscv sophgo_mango_fedora_defconfig
  3. scripts/config --disable DEBUG_INFO
  4. make
  5. make modules
  6. sudo make modules_install
  7. sudo make install (last step fails with grubby error; ignore)
  8. Manually update /boot/extlinux/extlinux.conf to add new kernel
  9. Reboot

The console parameter is seemingly added, but then I see this in the kernel output:

7040000000.serial: ttyS0 at MMIO 0x7040000000 (irq = 46, base_baud = 20833333) is a 16550A

And then the console becomes garbled and the boot seems to hang.

It seems to me that the problem is at least partly due to the wrong (base_baud = 20833333) or whatever.

For example, the output from booting the SD (with the stock Fedora image) shows this:

7040000000.serial: ttyS0 at MMIO 0x7040000000 (irq = 46, base_baud = 31250000) is a 16550A

And in that case the serial console works as expected (with base_baud = 31250000), continuing all the way to login prompt.

Another thing I noticed is that the menu is seemingly never displayed (when booting from the nvme) even if I set prompt 1 in extlinux.conf or add a long timeout.

skeuchel commented 6 months ago

I'm also having problems with this and have the same use case.

Since I intend to use my machine mainly in headless mode, I really want to see the console output on the serial port. I saw that the revyos people are forcing output on the serial console (revyos/sg2042-vendor-kernel@ac43de7) by changing the kernel config, which I am also using as a workaround

@skeuchel Can you say a little more about how you managed to get this modification to work?

Which kernel are you using and did you change any other options?

I'm using the sg2042-master branch, which I patched and rebased on 6.8.0-rc5; my modifications are at https://github.com/skeuchel/linux/commits/sg2042-master/ (the master branch is more stable for me than the dev branch, but YMMV). I also switched to a SATA SSD because NVMe seemed unstable.

I tried making the change you describe (to the fedora equivalent) using the kernel at https://github.com/sophgo/linux-riscv (the sg2042-dev branch), but am having problems getting it to work correctly.

Steps I followed:

  1. Patch sophgo_mango_fedora_defconfig as described for console params
  2. make ARCH=riscv sophgo_mango_fedora_defconfig
  3. scripts/config --disable DEBUG_INFO
  4. make
  5. make modules
  6. sudo make modules_install
  7. sudo make install (last step fails with grubby error; ignore)
  8. Manually update /boot/extlinux/extlinux.conf to add new kernel
  9. Reboot

The console parameter is seemingly added, but then I see this in the kernel output:

7040000000.serial: ttyS0 at MMIO 0x7040000000 (irq = 46, base_baud = 20833333) is a 16550A

And then the console becomes garbled and the boot seems to hang.

It seems to me that the problem is at least partly due to the wrong (base_baud = 20833333) or whatever.

For example, the output from booting the SD (with the stock Fedora image) shows this:

7040000000.serial: ttyS0 at MMIO 0x7040000000 (irq = 46, base_baud = 31250000) is a 16550A

And in that case the serial console works as expected (with base_baud = 31250000), continuing all the way to login prompt.

Yes, I ran into this as well. The problem is that an old dtb is passed to your new kernel. I assume yours comes either from an old bootloader on a provided sdcard image or from an old bootloader in flash. If you update your bootloader as well, it works.
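
A rough worked example of why a mismatched base_baud garbles the output. It assumes, purely for illustration, that the UART hardware really runs at the rate the stock image reports (base_baud = 31250000) while the kernel computes its divisor from the other value (20833333); the thread does not confirm which side is wrong, and a mismatch in either direction has a similar effect. The round-to-nearest divisor is how 8250-style drivers commonly compute it, not something taken from this kernel tree.

```python
def divisor(base_baud: int, baud: int) -> int:
    """Round-to-nearest divisor for the requested baud rate."""
    return (base_baud + baud // 2) // baud


def line_rate(true_base_baud: int, assumed_base_baud: int, baud: int) -> float:
    """Baud rate that actually appears on the wire when the divisor is
    computed from assumed_base_baud but the hardware runs at true_base_baud."""
    return true_base_baud / divisor(assumed_base_baud, baud)


TARGET = 115200
STOCK_BASE = 31250000   # reported by the working stock image
OTHER_BASE = 20833333   # reported by the kernel with the garbled console

matched = line_rate(STOCK_BASE, STOCK_BASE, TARGET)
mismatched = line_rate(STOCK_BASE, OTHER_BASE, TARGET)

print(f"matched:    {matched:.0f} baud ({(matched / TARGET - 1) * 100:+.1f}%)")
print(f"mismatched: {mismatched:.0f} baud ({(mismatched / TARGET - 1) * 100:+.1f}%)")
```

Under these assumptions the matched case lands within a fraction of a percent of 115200, while the mismatched case is off by roughly half, which is far outside what a terminal at 115200 can decode.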

Another thing I noticed is that the menu is seemingly never displayed (when booting from the nvme) even if I set prompt 1 in extlinux.conf or add a long timeout.

Indeed, the vendor provided u-root only displays the boot menu on the graphical console, and not on the serial console. At least not anymore, because I remember it working with old images. I think I solved it by building u-root with a kernel that has this patch https://github.com/skeuchel/linux/commit/38ed86ba108e6708ded617260ce11ab9c807272a which forced the menu onto the serial console only. I am no longer sure, though, because I now boot via edk2, which displays a boot menu on serial by default, so you will have to experiment a bit. You can easily change the kernel that u-root is compiled against by cloning this repository and changing the kernel repository used in the .github/workflows/build.yml file.

silvanshade commented 6 months ago

Thanks so much for the info. I'll look into the details you provided and see if I can get things working.

Indeed, the vendor provided u-root only displays the boot menu on the graphical console, and not on the serial console. At least not anymore, because I remember it working with old images.

Yeah.

One thing that is strange is that if I boot from the SD card using the (latest) Fedora image, I get the boot console (at least if the NVMe is also present and also has a kernel installed, not sure otherwise). But if I restore the same image to NVMe and boot from that (which is what I've been doing), the menu is not displayed.

EDIT0: I just noticed this text from ini configuration docs

The conf.ini can be stored in either of the following two locations. Read the INI file from microSD card first. If the file is not found, read it from SPI Flash0.

I think I also remember seeing some error about [sophgo-config] missing from conf.ini when booting from the NVMe. So that would probably explain the different behavior when booting from the NVMe vs SD card, i.e., the conf.ini is probably missing or incorrect on my system.

I haven't tried to update the SPI just yet but it's strange that it would be missing (although I only obtained the board and not the full system), and maybe surprising that the NVMe even boots at all if that's indeed the case.

EDIT1: Actually there's no /riscv64/conf.ini on the SD card image either apparently.

silvanshade commented 6 months ago

And in that case the serial console works as expected (with base_baud = 31250000), continuing all the way to login prompt.

Yes, I ran into this as well. The problem is that an old dtb is passed to your new kernel. I assume yours comes either from an old bootloader on a provided sdcard image or from an old bootloader in flash. If you update your bootloader as well, it works.

@skeuchel Just one more question about this: I'm assuming you are using the prebuilt bootloader artifacts that you linked earlier from the MilkV forums. Is that correct? And the custom kernel is only used for the last phase of the boot flow?

Or are you also compiling parts of the bsp (described here) using a custom kernel configuration also?

Could you share the conf.ini you are using?

EDIT0: I was able to flash the .bin file from the CI artifacts from the bootloader repo (along with a basic conf.ini, though not sure it's correct).

Now when I boot I see sg2042:v0.3 (apparently I had v0.2 before).

I'm seeing another error though:

SOPHGO ZSBL
sg2042:v0.3

sg2042 work in single socket mode
chip0 ddr info: raw data=0x5050505, 
    ddr0 size:0x800000000
    ddr1 size:0x800000000
    ddr2 size:0x800000000
    ddr3 size:0x800000000
read config from flash
rv boot from spi flash
load fw_dynamic.bin image from sf 0x65053f to memory 0x0 size 270032
load riscv64_Image image from sf 0x69240f to memory 0x2000000 size 26356736
load initrd.img image from sf 0x1fb500f to memory 0x30000000 size 15511140
load mango-milkv-pioneer.dtb image from sf 0x601000 to memory 0x20000000 size 45099
flash read file ok
chip0 ddr node in dtb:
    base:0x0000000000, len:0xc0000000
    base:0x0100000000, len:0x700000000
    base:0x0800000000, len:0x800000000
    base:0x1000000000, len:0x800000000
    base:0x1800000000, len:0x800000000
mac0:0xaaaaaaaaaaaa
fdt path offset failed
Error at '*s': <85>^D
main core sbi jump to 0x0, dynamic info:40019860

Referring to this part:

fdt path offset failed
Error at '*s': <85>^D

Not sure what that's about. I specified the device tree in the conf.ini like so:

[devicetree]
name = mango-milkv-pioneer.dtb

And from the earlier files it seemingly finds it.
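
The "load ... image" lines can also be checked mechanically, for example to confirm that all four images were read and that their memory ranges do not collide. A small sketch, with the regular expression written against the log format shown above (not taken from the zsbl sources):

```python
import re

# The four "load" lines copied from the ZSBL output above.
LOG = """\
load fw_dynamic.bin image from sf 0x65053f to memory 0x0 size 270032
load riscv64_Image image from sf 0x69240f to memory 0x2000000 size 26356736
load initrd.img image from sf 0x1fb500f to memory 0x30000000 size 15511140
load mango-milkv-pioneer.dtb image from sf 0x601000 to memory 0x20000000 size 45099
"""

LOAD_RE = re.compile(
    r"load (\S+) image from sf (0x[0-9a-fA-F]+) to memory (0x[0-9a-fA-F]+) size (\d+)"
)


def parse_loads(log: str) -> list[tuple[str, int, int]]:
    """Return (name, memory address, size in bytes) for each load line."""
    return [(m[1], int(m[3], 16), int(m[4])) for m in LOAD_RE.finditer(log)]


def overlaps(loads: list[tuple[str, int, int]]) -> list[tuple[str, str]]:
    """Return pairs of images whose memory ranges overlap."""
    ordered = sorted(loads, key=lambda item: item[1])
    clashes = []
    for (name_a, addr_a, size_a), (name_b, addr_b, _) in zip(ordered, ordered[1:]):
        if addr_a + size_a > addr_b:
            clashes.append((name_a, name_b))
    return clashes


loads = parse_loads(LOG)
for name, addr, size in loads:
    print(f"{name:28s} 0x{addr:010x} .. 0x{addr + size:010x}")
print("overlaps:", overlaps(loads))
```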

Anyway, after flashing the SPI, the stock image fedora kernel (6.1.55) wouldn't boot properly, with several kernel oops occurring, but maybe not surprising.

The 6.1.72+ I compiled from sg2042-dev still had the garbled serial output. I was using the .dtb from the kernel tree loaded via fdt in extlinux.conf. Haven't had a chance to try sg2042-master or your fork yet.

By the way, I assume the .img file from the bootloader artifacts is expected to be imaged to the first partition of the disk? Don't see any documentation about that but it would seem to follow the structure from the SD from what I can tell.

skeuchel commented 6 months ago

@skeuchel Just one more question about this: I'm assuming you are using the prebuilt bootloader artifacts that you linked earlier from the MilkV forums. Is that correct? And the custom kernel is only used for the last phase of the boot flow?

Or are you also compiling parts of the bsp (described here) using a custom kernel configuration also?

So the prebuilt bootloader artifacts from the vendor repository are enough if you want to use graphical (or blind) boot with a custom-compiled kernel that also uses the sg2042-dev branch. However, to get linuxboot output on the serial console, I had to remove every mention of tty1, including from the kernel that is built as part of linuxboot. For that you would have to clone the bootloader-riscv repository and point it at your own linux fork that strips tty1 away. Also, if you want to use the sg2042-master kernel branch as a distro kernel, you should build your own bootloader against the sg2042-master branch, because the distro kernel gets its dtb from the bootloader.

Could you share the conf.ini you are using?

I am currently booting from an sdcard. That is the contents of my ini

[sophgo-config]

[devicetree]
name = mango-milkv-pioneer.dtb

[kernel]
name = SG2042.fd

[eof]

and this is the list of files on the sdcard

./riscv64
./riscv64/riscv64_Image
./riscv64/cv1800b-milkv-duo.dtb
./riscv64/cv1812h-huashan-pi.dtb
./riscv64/mango-milkv-pioneer.dtb
./riscv64/mango-sophgo-x4evb.dtb
./riscv64/mango-sophgo-x8evb.dtb
./riscv64/sg2042-milkv-pioneer.dtb
./riscv64/initrd.img
./riscv64/fw_dynamic.bin
./riscv64/SG2042.fd
./riscv64/conf.ini
./fip.bin
./zsbl.bin
./BOOT
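
A quick consistency check between a conf.ini like the one above and the file list: every name it references should exist under ./riscv64. This sketch uses Python's configparser only as a stand-in; how the zsbl actually parses the file is not shown in this thread, and the helper name is made up for illustration.

```python
import configparser

# conf.ini contents as posted above.
CONF_INI = """\
[sophgo-config]

[devicetree]
name = mango-milkv-pioneer.dtb

[kernel]
name = SG2042.fd

[eof]
"""

# Relevant part of the sdcard file list posted above.
FILES = {
    "./riscv64/riscv64_Image",
    "./riscv64/mango-milkv-pioneer.dtb",
    "./riscv64/sg2042-milkv-pioneer.dtb",
    "./riscv64/initrd.img",
    "./riscv64/fw_dynamic.bin",
    "./riscv64/SG2042.fd",
    "./riscv64/conf.ini",
}


def referenced_files(conf_text: str) -> dict[str, str]:
    """Map each section that has a 'name' key to the file it names."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    return {
        section: parser[section]["name"]
        for section in parser.sections()
        if parser.has_option(section, "name")
    }


refs = referenced_files(CONF_INI)
missing = [name for name in refs.values() if f"./riscv64/{name}" not in FILES]
print("referenced:", refs)
print("missing:   ", missing)
```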

EDIT0: I was able to flash the .bin file from the CI artifacts from the bootloader repo (along with a basic conf.ini, though not sure it's correct).

Now when I boot I see sg2042:v0.3 (apparently I had v0.2 before).

I'm seeing another error though:

SOPHGO ZSBL
sg2042:v0.3

sg2042 work in single socket mode
chip0 ddr info: raw data=0x5050505, 
    ddr0 size:0x800000000
    ddr1 size:0x800000000
    ddr2 size:0x800000000
    ddr3 size:0x800000000
read config from flash
rv boot from spi flash
load fw_dynamic.bin image from sf 0x65053f to memory 0x0 size 270032
load riscv64_Image image from sf 0x69240f to memory 0x2000000 size 26356736
load initrd.img image from sf 0x1fb500f to memory 0x30000000 size 15511140
load mango-milkv-pioneer.dtb image from sf 0x601000 to memory 0x20000000 size 45099
flash read file ok
chip0 ddr node in dtb:
    base:0x0000000000, len:0xc0000000
    base:0x0100000000, len:0x700000000
    base:0x0800000000, len:0x800000000
    base:0x1000000000, len:0x800000000
    base:0x1800000000, len:0x800000000
mac0:0xaaaaaaaaaaaa
fdt path offset failed
Error at '*s': <85>^D
main core sbi jump to 0x0, dynamic info:40019860

Referring to this part:

fdt path offset failed
Error at '*s': <85>^D

Not sure what that's about. I specified the device tree in the conf.ini like so:

[devicetree]
name = mango-milkv-pioneer.dtb

And from the earlier files it seemingly finds it.

I think that error message is related to trying to configure the MAC addresses. Your output also contains mac0:0xaaaaaaaaaaaa, so I assume you specified that in the conf.ini as well. I believe that MAC address configuration is not intended for the pioneer board. You can simply remove that from the conf.ini file to get rid of the error message.

More specifically, the zsbl will patch the dtb file a bit, changing memory and ethernet nodes, before it is passed to the next stage (opensbi). You can see the patching of the /soc/ethernet@7040026000/ node here but that node is not part of the pioneer's dtb, because it is deleted here. Hence the error is about the patching of the dtb because of the missing node. As I said, this is not for the pioneer board, which has onboard network based on the realtek r8125(?) chip which is different from whatever 7040026000 is.
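
As a toy model of that failure: if the device tree is pictured as nested dictionaries, patching a property first needs a path lookup, and the lookup fails when the node has been deleted. Everything here is hypothetical except the ethernet node path quoted above; the tree contents and function names are invented for illustration and are not the zsbl's code.

```python
def find_node(tree: dict, path: str):
    """Walk a '/'-separated path through nested dicts; None if any part is missing."""
    node = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def patch_mac(tree: dict, path: str, mac: str) -> bool:
    """Set a MAC address property on the node at path, if that node exists."""
    node = find_node(tree, path)
    if node is None:
        print(f"path lookup failed for {path}")
        return False
    node["local-mac-address"] = mac
    return True


ETH_PATH = "/soc/ethernet@7040026000/"

# Hypothetical, heavily reduced trees: one with the ethernet node, one without.
with_eth = {"soc": {"ethernet@7040026000": {}}}
pioneer_like = {"soc": {}}

print(patch_mac(with_eth, ETH_PATH, "aa:aa:aa:aa:aa:aa"))      # node exists
print(patch_mac(pioneer_like, ETH_PATH, "aa:aa:aa:aa:aa:aa"))  # node was deleted
```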

Anyway, after flashing the SPI, the stock image fedora kernel (6.1.55) wouldn't boot properly, with several kernel oops occurring, but maybe not surprising.

Yeah the dtb of the new kernel and drivers changed somewhat, so not a good idea to boot an old kernel with a new dtb.

The 6.1.72+ I compiled from sg2042-dev still had the garbled serial output. I was using the .dtb from the kernel tree loaded via fdt in extlinux.conf. Haven't had a chance to try sg2042-master or your fork yet.

Still with the new dtb? Are you sure that the new dtb was loaded? Whatever is specified in extlinux.conf is not used, the dtb is passed down from zsbl.

By the way, I assume the .img file from the bootloader artifacts is expected to be imaged to the first partition of the disk? Don't see any documentation about that but it would seem to follow the structure from the SD from what I can tell.

It's an image of an entire disk, but one that only contains a single partition.

$ fdisk -l firmware_single.img 
Disk firmware_single.img: 256 MiB, 268435456 bytes, 524288 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x2967e74b

Device               Boot Start    End Sectors  Size Id Type
firmware_single.img1       2048 524287  522240  255M  c W95 FAT32 (LBA)
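
The same layout can be read straight from the first sector of such an image, which also gives the byte offset of the partition inside the file. This is a sketch for a classic DOS/MBR label only; the sample sector is synthesized from the fdisk numbers above rather than read from the real firmware_single.img.

```python
import struct

SECTOR_SIZE = 512


def parse_mbr(sector0: bytes) -> list[dict]:
    """Return the non-empty primary partition entries of a DOS/MBR label."""
    if len(sector0) < SECTOR_SIZE or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a DOS/MBR boot sector")
    partitions = []
    for index in range(4):
        entry = sector0[446 + 16 * index : 446 + 16 * (index + 1)]
        ptype = entry[4]                                    # partition type id
        start, count = struct.unpack_from("<II", entry, 8)  # start LBA, sector count
        if ptype != 0:
            partitions.append({
                "type": ptype,
                "start": start,
                "sectors": count,
                "offset_bytes": start * SECTOR_SIZE,
                "size_bytes": count * SECTOR_SIZE,
            })
    return partitions


def sample_sector() -> bytes:
    """Build a boot sector matching the fdisk listing above (type 0x0c, start 2048)."""
    sector = bytearray(SECTOR_SIZE)
    sector[446 + 4] = 0x0C
    struct.pack_into("<II", sector, 446 + 8, 2048, 522240)
    sector[510:512] = b"\x55\xaa"
    return bytes(sector)


for part in parse_mbr(sample_sector()):
    print(part)
```

For a real image, the first 512 bytes of the file would be passed to parse_mbr instead. The resulting offset_bytes value is what a loop mount with an offset= option would need in order to reach the FAT32 partition directly.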

silvanshade commented 6 months ago

Thanks again for all the info. I'm pretty new to this and it's difficult to find the right documentation so I really appreciate it.

Could you share the conf.ini you are using?

I am currently booting from an sdcard. That is the contents of my ini

[sophgo-config]

[devicetree]
name = mango-milkv-pioneer.dtb

[kernel]
name = SG2042.fd

[eof]

and this is the list of files on the sdcard

./riscv64
./riscv64/riscv64_Image
./riscv64/cv1800b-milkv-duo.dtb
./riscv64/cv1812h-huashan-pi.dtb
./riscv64/mango-milkv-pioneer.dtb
./riscv64/mango-sophgo-x4evb.dtb
./riscv64/mango-sophgo-x8evb.dtb
./riscv64/sg2042-milkv-pioneer.dtb
./riscv64/initrd.img
./riscv64/fw_dynamic.bin
./riscv64/SG2042.fd
./riscv64/conf.ini
./fip.bin
./zsbl.bin
./BOOT

I see.

So you are using EDK II then with GRUB?

If so, can you say a bit about how the GRUB config looks and how you have the drive partitions configured?

I tried switching from:

[kernel]
name = riscv64_Image

to SG2042.fd like you have. That boots me into a UEFI shell. If I run exit, then it takes me to a menu where I can configure boot devices, etc.

I'm not sure how to set things up to automatically boot from that though.

The default boot device is "UEFI Misc Device" (IIRC). I see an option for the NVMe, but that's no good since I nuked the Fedora install (and only have the contents of the firmware_single.img).

If I burn the Fedora image to a USB drive, I also see that in the list of boot options, but it doesn't seem to want to actually boot from it (even if I set it as the first in boot order, and hit continue). If I try, it just continues for a bit then puts me back in the UEFI shell (which if I exit then brings me back to the graphical menu).

I am able to select riscv64_Image from a partition (using the "Boot From File" option) but it doesn't get me very far (if it starts at all, depending on the compiled version) because it eventually can't mount the root partition (since it's not specified properly).

There's an .efi file on the Fedora stock image, which I can try to "Boot From File" also, but it briefly shows "Welcome to Grub" and crashes with a relocation error.

I know I should probably be rebuilding this stuff from scratch with the new bootloader but currently I'm just trying to figure out how to get all of the pieces fit together.

Referring to this part:

fdt path offset failed
Error at '*s': <85>^D

I think that error message is related to trying to configure the MAC addresses. Your output also contains mac0:0xaaaaaaaaaaaa, so I assume you specified that in the conf.ini as well. I believe that MAC address configuration is not intended for the pioneer board. You can simply remove that from the conf.ini file to get rid of the error message.

This appears to fix the error, thanks!

The 6.1.72+ I compiled from sg2042-dev still had the garbled serial output. I was using the .dtb from the kernel tree loaded via fdt in extlinux.conf. Haven't had a chance to try sg2042-master or your fork yet.

Still with the new dtb? Are you sure that the new dtb was loaded? Whatever is specified in extlinux.conf is not used, the dtb is passed down from zsbl.

In that case, it would still seem to be using the newer, correct .dtb, since the contents of /riscv64 came from the firmware_single.img.

I'm thinking it might have to do with this commit.

I notice this change is present in the sg2042-master branch but NOT the sg2042-dev branch. So if you got it to work without the garbled text using sg2042-master (which I haven't tried yet), that might make sense.