Hardware pass-through to the VM

Milon-yang commented 3 years ago

I cannot see any device nodes under /dev.

    usb0: usb@2f00000 {
            compatible = "fsl,ls1046a-dwc3";
            reg = <0x0 0x2f00000 0x0 0x10000>;
            interrupts = <0 60 4>;
            dr_mode = "host";
            snps,quirk-frame-length-adjustment = <0x20>;
            snps,dis_rxdet_inp3_quirk;
            snps,incr-burst-type-adjustment = <1>, <4>, <8>, <16>;
            usb3-lpm-capable;
            snps,dis-u1u2-when-u3-quirk;
            snps,host-vbus-glitches;
            dma-coherent;
    };

    / # [ 3613.325454] 000: usb 2-1: new SuperSpeed Gen 1 USB device number 2 using xhci-hcd
    [ 3613.353267] 000: usb-storage 2-1:1.0: USB Mass Storage device detected
    [ 3613.361912] 000: scsi host0: usb-storage 2-1:1.0
    [ 3614.367963] 000: scsi 0:0:0:0: Direct-Access Generic MassStorageClass 1536 PQ: 0 ANSI: 6
    [ 3614.761393] 000: sd 0:0:0:0: [sda] 62333952 512-byte logical blocks: (31.9 GB/29.7 GiB)
    [ 3614.762116] 000: sd 0:0:0:0: [sda] Write Protect is off
    [ 3614.762829] 000: sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    [ 3614.780563] 000: sda: sda1 sda2
    [ 3614.794318] 000: sd 0:0:0:0: [sda] Attached SCSI removable disk

    /dev # ls -la
    total 5
    drwxr-xr-x  2 0 0 1024 Jul 13 2021 .
    drwxr-xr-x 13 0 0 1024 Jan 1 00:01 ..
    -rwxr-xr-x  1 0 0 46 Jan 1 01:04 tty2
    -rwxr-xr-x  1 0 0 46 Jan 1 01:04 tty3
    -rwxr-xr-x  1 0 0 46 Jan 1 01:04 tty4
    /dev #

icedieler commented 3 years ago

Hi, do you have CONFIG_DEVTMPFS and CONFIG_DEVTMPFS_MOUNT enabled in your guest kernel?

Milon-yang commented 3 years ago

Yes, we have enabled these two items.

Thanks.


alacko commented 3 years ago

In Linux, you either mount devtmpfs or run udev/mdev to populate /dev, whichever you prefer.

Milon-yang commented 3 years ago

Hi alacko, I am trying to add networking to the VM on the NXP LS1046A platform. I use the "virt-ls1046a_fman_dts" file, and the boot log "virt_ls1046a_fman_log" shows the error below:

Then we added clock information to the dts file; please refer to the file "virt_ls1046a_fman_add_clock_dts".

But it still does not work and does not even boot. The failure log is "virt_ls1046a_fman_add_clock_log". Could you give any help or suggestions? Thanks.

virt_ls1046a_fman_add_clock_dts.zip virt_ls1046a_fman_add_clock_log.zip virt_ls1046a_fman_log.zip virt-ls1046a_fman_dts.zip

alacko commented 3 years ago

Ok, since there's no output, could you please enable earlycon with "console=ttyAMA0 earlycon" on the Linux command line? Please follow the recommendations given in the comment in https://github.com/kernkonzept/uvmm/blob/master/server/src/device/pl011.cc for the device tree. This should give output from Linux and show better what's going on.

Milon-yang commented 3 years ago

Hi alacko,

I have enabled earlycon with "console=ttyAMA0 earlycon" on the Linux command line. The output is attached.
I have read the recommendations, but the comment only describes how to add the UART to the dts. Are you suggesting that I configure the network in the same way as the UART?

Thanks a lot!

ls1046a_net_boot_log.zip

alacko commented 3 years ago

Hi, the log still tells that console=hvc0 is set in the kernel command line.

I was actually hoping you would switch the output from virtio (console=hvc0) to the emulated serial (console=ttyAMA0), because earlycon only works with a serial device and is available much earlier than virtio. Then, with the clocks entry added (as above), I was hoping to see some output (contrary to what is seen, or rather not seen, in virt_ls1046a_fman_add_clock_log.zip). I believe there's not much else to do than adding the node in the DTS as described and having earlycon (CONFIG_SERIAL_EARLYCON) and the driver (CONFIG_SERIAL_AMBA_PL011) enabled in the Linux kernel.

Milon-yang commented 3 years ago

Hi alacko, sorry for misunderstanding your earlier message. I have changed the command line to console=ttyAMA0 and attached the resulting output. Please have a look, thank you!

log_with_earlycon.zip log_with_fman_clock_add.zip

alacko commented 3 years ago

OK, thanks. It's a BUG, something to work with!

I guess this statement is triggering the BUG: https://github.com/Freescale/linux-fslc/blob/5.4.y%2Bqoriq/drivers/staging/fsl_qbman/qman_high.c#L1403 Looking at it, I would guess that the code is assuming some specific CPU mask that is not given in the virtual environment. You need to debug the code and find out what affine_mask is set to, what 'cpu' is, and why cpu does not happen to be in affine_mask.

Milon-yang commented 3 years ago

Hi alacko,

I noticed the BUG, thanks. I tried printing the CPU id in the kernel: when the board boots Linux directly, the CPU id is 0, but when booting under L4Re, the CPU id is 16. I wonder where the CPU id is modified in L4Re.

alacko commented 3 years ago

Hi, that's indeed strange. L4Re won't modify the cpu id because that's solely inside Linux, but it will have something to do with the difference in the setup. In the VM there's just one CPU, and on bare-metal there are 4. First thing to check: limit bare-metal to one core and check whether the same happens or not. Then, also track down where the number is coming from; you can see the call chain in the BUG output. In a setup with just one CPU, "cpu" is only supposed to be 0. Maybe also make sure to keep CONFIG_NR_CPUS low, like 1 or 4, since the driver code somehow plays with this.

Milon-yang commented 3 years ago

Hi alacko,

I have limited bare-metal to 2 cores (the number of CPUs can only be set from 2 to 4096), and the BUG still happens.
The CPU number printed in my debug log follows the setting above: 16 is the default number of CPUs, and when I change it to 2, the debug log prints CPU number 2.

alacko commented 3 years ago

Hi, do I understand you right that with CONFIG_NR_CPUS=2 the bug triggers also when running bare-metal (without L4Re)?

Milon-yang commented 3 years ago

Hi alacko,

With CONFIG_NR_CPUS=2 the bug does not trigger when running bare-metal (without L4Re); it only happens when running with L4Re.

I printed the cpu id and cpu bitmask in the kernel driver. When running without L4Re, the cpu ids are 0 and 1, and the bitmask is 3 (11 in binary). But when running with L4Re, the cpu id is 2 and the cpu bitmask is 0. I suspect that the cpu bitmask is not configured in L4Re or somewhere else.

Thanks.

alacko commented 3 years ago

Hi, L4Re does not change anything inside Linux; however, the virtual platform is different from bare-metal, and that will make the difference. The VM just has one core, so the cpu id should only ever be 0 (my guess), the same as smp_processor_id() returns. However, there must be a reason why the id equals the number of possible cpus. This needs debugging / tracing to where the id of 2 comes from, to find out what the reason for the difference is.
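One possible reading of the numbers reported above (id 16 with the default CPU count, id 2 with CONFIG_NR_CPUS=2, bitmask 0) is that the id comes from searching an empty CPU mask: the kernel's cpumask_first() returns a value >= nr_cpu_ids when no bit is set. Whether the qman driver derives 'cpu' this way is an assumption, not something confirmed in this thread. A minimal user-space sketch of that behaviour, with the hypothetical helpers first_cpu() and cpu_in_mask() standing in for the kernel functions:

```c
#include <stdint.h>

/*
 * Simplified stand-in for cpumask_first(): returns the index of the lowest
 * set bit in mask, or nr_cpus when the mask is empty ("no CPU found").
 * Masks wider than 64 CPUs are not modelled here.
 */
static unsigned int first_cpu(uint64_t mask, unsigned int nr_cpus)
{
    for (unsigned int cpu = 0; cpu < nr_cpus && cpu < 64; cpu++)
        if (mask & (UINT64_C(1) << cpu))
            return cpu;
    return nr_cpus;
}

/* Membership test of the kind a BUG_ON() in a driver might guard. */
static int cpu_in_mask(uint64_t mask, unsigned int cpu)
{
    return cpu < 64 && ((mask >> cpu) & 1);
}
```

With a mask of 3 and two CPUs, first_cpu() yields 0 and the membership test passes. With an empty mask it yields the CPU count itself (2 or 16), which can never be a member of the mask, so a check like the one at the BUG location would fire.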

Milon-yang commented 3 years ago

Hi alacko,

I debugged the kernel and found that the cpu id and bitmask are set incorrectly because the VM accesses invalid memory. We added "qportals: qman-portals@500000000" to the dts file (see attachment) according to the NXP LS1046 data sheet. When the VM boots, it reports the error "VMM: FATAL: cannot handle VM memory access @ 504093040 ip=ffff800011074200 lr=ffff80001107ff28". But when using the same dts to run Linux (without L4Re), the kernel boots normally.

I wonder whether the VM uses the same hardware physical addresses as Linux in the dts file. Or is the memory address 0x500000000 for qportals set correctly in the L4Re dts?

Thanks a lot!

virt-ls1046a_net_dts.zip

alacko commented 3 years ago

Hi,

do you also have a corresponding entry in IO's vbus config file for the 500000000 range? Like this:

    qman = Hw.Device(function()
      compatible = "fsl,qman";
      Property.hid = "fsl,qman";
      Resource.reg0 = Res.mmio(0x1880000, 0x1880000 + 0x10000 - 1);
      Resource.reg1 = Res.mmio(0x500000000, 0x504ffffff);
      Resource.irq0 = Res.irq(32 + 45, Io.Resource.Irq_type_level_high);
      Resource.irq1 = Res.irq(32 + 172, Io.Resource.Irq_type_level_high);
      Resource.irq2 = Res.irq(32 + 174, Io.Resource.Irq_type_level_high);
      Resource.irq3 = Res.irq(32 + 176, Io.Resource.Irq_type_level_high);
      Resource.irq4 = Res.irq(32 + 178, Io.Resource.Irq_type_level_high);
      Resource.irq5 = Res.irq(32 + 180, Io.Resource.Irq_type_level_high);
      Resource.irq6 = Res.irq(32 + 182, Io.Resource.Irq_type_level_high);
      Resource.irq7 = Res.irq(32 + 184, Io.Resource.Irq_type_level_high);
      Resource.irq8 = Res.irq(32 + 186, Io.Resource.Irq_type_level_high);
      Resource.irq9 = Res.irq(32 + 188, Io.Resource.Irq_type_level_high);
      Resource.irqA = Res.irq(32 + 190, Io.Resource.Irq_type_level_high);
    end)

(plus of course adding it to the vbus too) such that running the VM does not exhibit the error anymore as you describe.
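To see why this entry matters for the reported fault, it can help to check the faulting address from the VMM message against the MMIO windows in the config. The sketch below is only an illustration: struct mmio_res and res_contains() are hypothetical names, and the inclusive end address is inferred from the `0x1880000 + 0x10000 - 1` form used in the config above.

```c
#include <stdint.h>

/* One MMIO window, with an inclusive end address as in the config above. */
struct mmio_res {
    uint64_t start;
    uint64_t end;
};

/* Returns 1 if addr lies inside the window, 0 otherwise. */
static int res_contains(struct mmio_res r, uint64_t addr)
{
    return addr >= r.start && addr <= r.end;
}

/* The two windows of the qman device from the config above. */
static const struct mmio_res qman_reg0 = { 0x1880000, 0x1880000 + 0x10000 - 1 };
static const struct mmio_res qman_reg1 = { 0x500000000, 0x504ffffff };
```

The address 0x504093040 from the "cannot handle VM memory access" message lies inside the second window and outside the first, which fits the suggestion that the 0x500000000 range was missing from the vbus.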

Milon-yang commented 3 years ago

Hi, thanks for your suggestion.

I added the fman, bman and qman information to io.cfg and vm_hw.vbus. But L4Re still reports the error "VMM: FATAL: cannot handle VM memory access @ 0 ip=ffff80001008fbc0 lr=ffff800011e099d0". You can find my configuration files and boot log in the attachment.

Thanks again!

attachment.zip

alacko commented 3 years ago

Hi, so it looks like the driver is doing a null-pointer dereference. The code location where this happens is 0xffff80001008fbc0; could you check in the vmlinux binary what it does there? E.g. with gdb or objdump -ldS (and please use nokaslr on the kernel command line to have stable addresses).

Milon-yang commented 3 years ago

Hi alacko, Could you please give me the dts file you used for LS1046A in L4RE? I want to compare the difference in network setting. Thanks.

alacko commented 3 years ago

Hi, I used the one from Linux, i.e., arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dts, from Linux 5.13 (or later, whatever was current back then). I did not do any modifications (except some for console maybe).

alacko commented 3 years ago

Hi, sorry, I think I did not remember right, I also used the device tree that u-boot provides to Linux when booted bare-metal. That's important because u-boot is adding things to the device tree, including things for network.

Milon-yang commented 3 years ago

OK, I will check it. Thanks for your suggestion.

Milon-yang commented 3 years ago

I checked the dts, debugged the kernel, and found an issue. I configured qman-portal in the dts as below:

    qman-portal@0 {
            compatible = "fsl,qman-portal", "fsl,qman-portal-3.2.1";
            reg = <0x0 0x4000>, <0x4000000 0x4000>;
            interrupts = <0 172 4>;
            cell-index = <0>;
    };

......

When the kernel boots without L4Re, the qman driver can parse qman-portal and boots OK. But when booting with L4Re, the qman driver reports that the status of qman-portal in the dts is "disabled" (I added debug output to the qman driver).

This is strange. We have not set status = "disabled" for qman-portal anywhere.

alacko commented 3 years ago

L4Re will disable the node if there's no corresponding MMIO region available on the vbus of the VM. There should be a ranges statement in the parent node (qman-portals@500000000), is it there such that the address is finally ok?
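The role of the ranges statement can be shown with a small address-translation sketch. The window used here (child address 0 mapped to 0x500000000, size 0x8000000) is an assumption based on the node name qman-portals@500000000; the actual ranges value is not shown in this thread. struct dt_range and dt_translate() are hypothetical names for illustration.

```c
#include <stdint.h>

/* One entry of a "ranges" property: child base, parent base, window size. */
struct dt_range {
    uint64_t child_base;
    uint64_t parent_base;
    uint64_t size;
};

/*
 * Translates a child bus address into the parent address space.
 * Returns 0 and stores the result in *parent on success,
 * or -1 if the address falls outside the window.
 */
static int dt_translate(struct dt_range r, uint64_t child, uint64_t *parent)
{
    if (child < r.child_base || child - r.child_base >= r.size)
        return -1;
    *parent = r.parent_base + (child - r.child_base);
    return 0;
}

/* Assumed window for qman-portals@500000000 (not taken from the thread). */
static const struct dt_range qportals = { 0x0, 0x500000000, 0x8000000 };
```

Under that assumption, the two reg entries of qman-portal@0 (0x0 and 0x4000000) translate to 0x500000000 and 0x504000000, and these are the addresses that need a matching MMIO region on the vbus for the node to stay enabled.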

Milon-yang commented 3 years ago

The ranges statement in the parent node (qman-portals@500000000) was always set as below, so that's not the reason.

The key problem is that fman, qman and bman use interrupts 44 and 45. When the IRQs are mapped to the VM, the numbers are 76 and 77, but in L4Re these IRQ numbers are already occupied somewhere:

    vm1 | VMM[ioproxy]: IO device 'fman@1a00000': irq 0x4c -> 0x4c already registered
    vm1 | VMM[ioproxy]: IO device 'fman@1a00000': irq 0x4d -> 0x4d already registered

So the network device pass-through failed.

Please help to check which devices occupy IRQs 76 and 77 in L4Re. Thanks!

alacko commented 3 years ago

Well that's not an error but rather an info that an interrupt is used in multiple DT nodes, and it is in fman0, bman and qman I believe. Is it now that you do not get interrupts and the memory access is ok now, or is it something else?
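For reference, the numbers in the log line up with the usual GIC numbering: in a three-cell interrupt specifier such as `<0 172 4>`, the first cell selects the interrupt type (0 for SPI, 1 for PPI), and SPIs start at interrupt ID 32 while PPIs start at 16. A small sketch of that arithmetic, where gic_intid() is a hypothetical helper name:

```c
/* Interrupt types as used in the first cell of a GIC interrupt specifier. */
enum gic_type { GIC_SPI = 0, GIC_PPI = 1 };

/*
 * Converts a device-tree interrupt number into the GIC interrupt ID:
 * SPIs are offset by 32, PPIs by 16.
 */
static unsigned int gic_intid(enum gic_type type, unsigned int number)
{
    return (type == GIC_SPI ? 32u : 16u) + number;
}
```

So device-tree interrupts 44 and 45 become IDs 76 (0x4c) and 77 (0x4d), matching both the ioproxy messages and the `32 + 45` form used in the IO config.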

Milon-yang commented 3 years ago

If I don't add IRQs 76 and 77 for fman, qman and bman in io.cfg, the memory access is OK. But the kernel still panics later. You can see the boot log without the IRQ settings in the attachment.

bootlog_without_irq.zip bootlog_with_irq.zip

alacko commented 3 years ago

The version without irq looks like an out of memory condition. Is maybe your ramdisk so large that it does not fit into memory of the VM?

Milon-yang commented 3 years ago

I reduced the size of the ramdisk and removed the fman IRQ information from io.cfg, and the kernel panic disappeared. But it still cannot mount the filesystem and crashes. The log is attached.

Actually, I believe the IRQ information is necessary in io.cfg. But if IRQs 76 and 77 are added, L4Re still reports the error "VMM: FATAL: cannot handle VM memory access @ 0 ip=ffff80001008fbc0 lr=ffff800011e09a54". I think the key problem is why we can't add IRQs 76 and 77 in io.cfg, or which devices occupy IRQs 76 and 77 in L4Re. Thanks!

bootlog_without_irq.zip

alacko commented 3 years ago

It's not crashing, it's just not showing the console. Your kernel command line has console=ttyAMA0; however, the boot console is ns16550a0 based, i.e. please change the kernel command line to console=ttyS0.

Both IRQs are listed in multiple devices. Are they not listed in any device or just not listed in one? Actually it is not relevant for the VM in which device an interrupt is listed as long as it is generally available.

Milon-yang commented 3 years ago

OK, thanks alacko. Now IRQ 77 is used by fman, qman and bman, and IRQ 76 is used by fman and ptp_timer0. All network devices in the dts are necessary and are listed in io.cfg (including the IRQ information).

The problem is that L4Re reports the error "VMM: FATAL: cannot handle VM memory access @ 0 ip=ffff80001008fbc0 lr=ffff800011e09a54". The reason is that IRQs 77 and 76 are registered repeatedly. But if IRQs 77 and 76 are not used, the network devices fail to initialize. So I have no idea how to resolve this conflict.

io.zip virt-ls1046a_net.zip

alacko commented 3 years ago

Hmm, this looks like a null pointer dereference at code location ffff80001008fbc0 in Linux. Could you rerun with nokaslr on the Linux command line (such that Linux addresses are constant) and check what's happening at the location shown at 'ip'?

Milon-yang commented 3 years ago

Hi alacko, I added "nokaslr" to the Linux command line as you said; now the code crashes at ffff80001008faf8. Please refer to log_with_nokaslr.zip.

By the way, I looked at the Linux System.map; the crash may be in __memset_io (ffff80001008faa8 T __memset_io). You can see the details in the attachment System_map.zip, thanks.

log_with_nokaslr.zip System_map.zip

alacko commented 3 years ago

Hi, yes, it's possible that a memset is doing this. The question is what code path leads to this; someone in the Linux kernel must be calling memset with target address 0, which is not a good thing to do in general. Could you please apply the attached patch to l4/pkg/uvmm? It should help us better understand what's going on. Thanks. uvmm-p1.patch.gz

Milon-yang commented 3 years ago

Hi, please receive the boot log with the patch applied, thanks!

bootlog_with_patch.zip

alacko commented 3 years ago

Hi, sorry, that went wrong (unfortunately the patch is not compatible). Could you put this patch into your Linux and see what it prints and check whether and why the function it prints calls memset_io with dst=0?

diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
index aa7a4ec6a3ae..efd4f81f4a9e 100644
--- a/arch/arm64/kernel/io.c
+++ b/arch/arm64/kernel/io.c
@@ -72,6 +72,8 @@ void __memset_io(volatile void __iomem *dst, int c, size_t count)
 {
        u64 qc = (u8)c;

+       printk("__memset_io caller: %pS\n", (void *)_RET_IP_);
+
        qc |= qc << 8;
        qc |= qc << 16;
        qc |= qc << 32;

Milon-yang commented 3 years ago

Hi, the boot log with the kernel patch is attached. The system crashes in the function "qman_init_early".

bootlog_with_kernel_patch.zip

alacko commented 3 years ago

The file containing qman_init_early (drivers/staging/fsl_qbman/qman_config.c) has one user of memset_io. This is related to parsing things out of the device tree, and seems to require that the device tree contains two reserved regions, "fsl,qman-fqd" and "fsl,qman-pfdr". If those are not set, then the corresponding memory areas for those reserved regions remain as initialized (to 0), which then leads to the memset_io at 0. Could you check whether the functions qman_fqd() and qman_pfdr() are called and which values would be set (whether it is set with a base of 0 and some size)? Also check whether you have corresponding entries in the device tree; in mine:

      reserved-memory {
                #address-cells = <0x02>;
                #size-cells = <0x02>;
                ranges;

                bman-fbpr {
                        compatible = "fsl,bman-fbpr";
                        size = <0x00 0x1000000>;
                        alignment = <0x00 0x1000000>;
                        no-map;
                        alloc-ranges = <0x00 0x00 0x10000 0x00>;
                        linux,phandle = <0x0c>;
                        phandle = <0x0c>;
                };

                qman-fqd {
                        compatible = "fsl,qman-fqd";
                        size = <0x00 0x800000>;
                        alignment = <0x00 0x800000>;
                        no-map;
                        alloc-ranges = <0x00 0x00 0x10000 0x00>;
                        linux,phandle = <0x0a>;
                        phandle = <0x0a>;
                };

                qman-pfdr {
                        compatible = "fsl,qman-pfdr";
                        size = <0x00 0x2000000>;
                        alignment = <0x00 0x2000000>;
                        no-map;
                        alloc-ranges = <0x00 0x00 0x10000 0x00>;
                        linux,phandle = <0x0b>;
                        phandle = <0x0b>;
                };
        };

I believe the driver code could handle this much better and maybe it works on bare-metal to just memset a region at physical 0 without failing, but it should really not do this, and rather check whether those variables used have actually been set.
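The kind of check meant here could look roughly like the following user-space sketch. It is not the driver's code: struct reserved_region and clear_region() are hypothetical names, and plain memset() stands in for memset_io() on an already mapped buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A reserved region as it might be recorded after parsing the device tree. */
struct reserved_region {
    uint64_t base;  /* physical base address, 0 if never populated */
    size_t   size;  /* size in bytes, 0 if never populated */
};

/*
 * Clears the mapped region only if the device tree actually provided it.
 * Returns 0 on success, -1 if base or size were left at their initial 0.
 */
static int clear_region(const struct reserved_region *r, void *mapped)
{
    if (r->base == 0 || r->size == 0)
        return -1;  /* region missing from the device tree: do not touch memory */
    memset(mapped, 0, r->size);
    return 0;
}
```

With such a guard, a device tree without the "fsl,qman-fqd" or "fsl,qman-pfdr" entries would produce an error return that the caller can report, instead of a write to address 0.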

Milon-yang commented 3 years ago

Hi alacko, I replaced the reserved-memory node in the device tree with your information, and the error "VMM: FATAL: cannot handle VM memory access @ 0 ip=ffff80001008fbc0 lr=ffff800011e09a54" is solved, but the fman probe still fails. It seems that the FM parameters are not set correctly. Please see the attachment. By the way, could you please send me the network dts file you used for L4Re? Thanks.

bootlog_with_fman_probe_failed.zip

alacko commented 3 years ago

Hi, reading a little bit through the code locations where the error messages are printed I would suspect the driver is missing firmware data. The firmware data seems to come out of the device tree. Is there a "fsl,fman-firmware" node in your device tree?

Concerning my device tree: I used vanilla linux when I tried it, and not the linux-fslc tree as you do. The device tree I took is the one from Linux, i.e. arch/arm64/boot/dts/freescale/fsl-ls1046a-frwy.dtb. (And there is no firmware in there, and the drivers in vanilla Linux do not request it.)

Milon-yang commented 3 years ago

Hi, there is no "fsl,fman-firmware" node in my device tree. Thanks.

alacko commented 3 years ago

Hi, ok, could you then follow the code to check what is causing the error message in the code? I was just speculating on the firmware by reading myself without being able to actually see what is running. Please place printk lines in the driver and find out why the driver ends up in those error messages. Thanks.

Milon-yang commented 3 years ago

Hi, I have solved the probe error. The reason is that in L4Re the "fsl,fman-firmware" node has to be added to the dts file. The firmware data is not configured in Linux but in U-Boot; for details, see "flexbuild/packages/firmware/u-boot/drivers/net/fm/fdt.c".

Now the fman node can be seen in the filesystem, and the physical network connection is OK. But the MAC address of the eth port still needs to be set in the kernel or somewhere else for further testing.

kernkonzept / manifest

Hardware pass-through to the VM #5