mikaku / Fiwix

A UNIX-like kernel for the i386 architecture
https://www.fiwix.org

Incorrect passing of e820 memory map to Linux kexec guests #72

Closed Googulator closed 3 months ago

Googulator commented 6 months ago

The following 2 excerpts are from the same live-bootstrap qemu session.

Fiwix reported the memory map as follows:

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x00000000bffdffff available
          0x00000000bffe0000-0x00000000bfffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x0000000100000000-0x000000013fffffff available

Then, after kexec, Linux reports:

[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009cfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009d000-0x000000000009efff] reserved
[    0.000000] BIOS-e820: [mem 0x000000000009f000-0x000000000009fbfe] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009fffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000ffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdfffe] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000fefffffe] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000fffffffe] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013ffffffe] usable

Two oddities are visible:

  1. All of the memory regions reported by Fiwix are missing one byte at the end, when re-reported by Linux.
  2. An additional reservation is visible in the range 9d000-9efff. This block, for some reason, doesn't have the off-by-1 ending seen in other blocks.

mikaku commented 6 months ago

I've run both Fiwix and Linux on QEMU with the same configuration:

qemu-system-i386 \
        -drive file=<floppy.img>,format=raw,if=floppy,index=0 \
        -boot a \
        -m 4G \
        -enable-kvm \
        -machine pc

Here are the results:

Screenshot from 2024-02-13 08-22-49

Screenshot from 2024-02-13 08-23-21

As you can see, there is no difference in the memory output. The boot loader on the floppy was GRUB v1 (legacy) in both cases.

What was your QEMU configuration? What was your Linux kernel version?

Googulator commented 6 months ago

The problem is only seen if you first boot Fiwix, and then kexec from Fiwix to Linux. Kexec'd Linux will then see an incorrect memory map.

The issue was seen on both QEMU 6.2, and on bare metal.

mikaku commented 6 months ago

Can you please attach the Linux kernel binary you are using?

Googulator commented 6 months ago

kernels.zip

This contains the Fiwix and Linux kernels, as well as the kexec loaders for them, as compiled in live-bootstrap. These were captured from one of my bare metal test machines.

mikaku commented 6 months ago

I don't have enough console scrollback to read the e820 lines, as they are at the very top of the Linux kernel output. Adding console=/dev/ttyS0 to the kexec_cmdline= argument doesn't work either. Are the serial devices enabled in this Linux kernel?

Googulator commented 6 months ago

They are, but the syntax is slightly different in Linux than in Fiwix: console=ttyS0

mikaku commented 6 months ago

> They are, but the syntax is slightly different in Linux than in Fiwix: console=ttyS0

I'm completely unable to redirect the output to the serial line. I've used console=ttyS0 but it doesn't work. I'm unable to reproduce your problem.

Googulator commented 6 months ago

Easiest way to reproduce is probably using live-bootstrap. Apply this patch to rootfs.py to get a serial log:

diff --git a/rootfs.py b/rootfs.py
index c31d5a1..5d7df2c 100755
--- a/rootfs.py
+++ b/rootfs.py
@@ -282,6 +282,8 @@ print(shutil.which('chroot'))
             arg_list += [
                 '-machine', 'kernel-irqchip=split',
                 '-nic', 'user,ipv6=off,model=e1000',
+                '-chardev', 'socket,id=char0,port=45454,host=0.0.0.0,server=on,wait=on,telnet=on,logfile=serial.log',
+                '-serial', 'chardev:char0',
                 '-nographic'
             ]
             run(args.qemu_cmd, *arg_list)

then use "telnet localhost 45454" or equivalent to start receiving the log on screen (it won't start building until it sees a connection - unfortunately I haven't found a way to keep the normal stdio chardev, and still log to a file).

mikaku commented 6 months ago

I'm sorry, it doesn't work at all.

I only see the messages generated by kernel/kexec.c during the transition from Fiwix to Linux. No Linux kernel boot messages are shown on the serial line, and the output scrolls too fast and is too long for the console history to hold.

Here is the cmdline I use in Fiwix:

ro root=/dev/hda2 console=/dev/ttyS0 kexec_proto=linux kexec_size=8000 kexec_cmdline=\"ro root=/dev/sda2 console=ttyS0\"

I don't know why the Linux boot messages aren't redirected to the serial line.

mikaku commented 5 months ago

As commented on the #bootstrappable IRC channel, I've finally built a Linux kernel simple enough that it shows very few boot messages. This let me scroll back to the top of the console history and see the memory map when the Linux kernel is kexec'ed from Fiwix.

This is the screenshot of that memory map:

Screenshot from 2024-02-21 21-16-15

> All of the memory regions reported by Fiwix are missing one byte at the end, when re-reported by Linux.

Yes, the problem with all these addresses ending with ...ffe instead of ...fff is a bug in the bios_map_init() function. But I think that the fix is a bit simpler than your PR.

diff --git a/mm/bios_map.c b/mm/bios_map.c
index b1829bd..4d71295 100644
--- a/mm/bios_map.c
+++ b/mm/bios_map.c
@@ -100,7 +100,7 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
 {
        struct multiboot_mmap_entry *bmmap;
        unsigned int from_high, from_low, to_high, to_low;
-       unsigned long long to;
+       unsigned long long to, to_orig;
        int n, type;

        bmmap = bmmap_addr;
@@ -112,6 +112,7 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
                while((unsigned int)bmmap < (unsigned int)bmmap_addr + bmmap_length) {
                        from_high = (unsigned int)(bmmap->addr >> 32);
                        from_low = (unsigned int)(bmmap->addr & 0xFFFFFFFF);
+                       to_orig = (bmmap->addr + bmmap->len);
                        to = (bmmap->addr + bmmap->len) - 1;
                        to_high = (unsigned int)(to >> 32);
                        to_low = (unsigned int)(to & 0xFFFFFFFF);
@@ -124,6 +125,9 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
                                to_low,
                                bios_mem_type[type]
                        );
+                       /* restore the original end address */
+                       to_high = (unsigned int)(to_orig >> 32);
+                       to_low = (unsigned int)(to_orig & 0xFFFFFFFF);
                        if(n < NR_BIOS_MM_ENT && bmmap->len) {
                                bios_mem_map[n].from = from_low;
                                bios_mem_map[n].from_hi = from_high;

After applying this patch, the memory map in the Fiwix kernel looks like this (same as before, nothing changed):

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x000000000ffdffff available
          0x000000000ffe0000-0x000000000fffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x000000000009d000-0x000000000009efff reserved

Then, after kexec, Linux reports:

Screenshot from 2024-02-22 18-59-58

> An additional reservation is visible in the range 9d000-9efff. This block, for some reason, doesn't have the off-by-1 ending seen in other blocks.

This memory reservation is needed by the kexec implementation:

https://github.com/mikaku/Fiwix/blob/3a71a2fd68ad60198035e69de643560fa5a1fda8/mm/memory.c#L308-L313

But you are right that this bug kept this range from having the off-by-1 ending seen in the other blocks.

Looks like this is also fixed. Can you, please, confirm that this patch fixes all these memory map problems?

Additionally, there is also a bug in kernel/kexec.c that affected the multiboot1 protocol:

diff --git a/kernel/kexec.c b/kernel/kexec.c
index c35a15d..aab5ada 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -247,7 +247,7 @@ void kexec_multiboot1(void)

        /* space reserved for the memory map structure */
        nmaps = 0;
-       while(bios_mem_map[nmaps].to) {
+       while(bios_mem_map[nmaps].type) {
                nmaps++;
        }
        esp -= sizeof(struct multiboot_mmap_entry) * nmaps;
@@ -259,7 +259,7 @@ void kexec_multiboot1(void)
                map->addr = map->addr << 32 | bios_mem_map[n].from;
                map->len = bios_mem_map[n].to_hi;
                map->len = map->len << 32 | bios_mem_map[n].to;
-               map->len -= map->addr - 1;
+               map->len -= map->addr;
                map->type = bios_mem_map[n].type;
                map++;
        }

mikaku commented 5 months ago

@Googulator, I've just pushed some changes to your PR #74 that merge your patch with mine. Can you please check the final patch and see if you get the same results?

mikaku commented 5 months ago

@Googulator, did you have an opportunity to check the final patch?

Googulator commented 5 months ago

Not yet, will do in the next few days.

mikaku commented 3 months ago

@Googulator, did you find time to check the final patch? I have some modifications to push, but I need to close this first.

Googulator commented 3 months ago

With the current code in #74:

memory    0x0000000000000000-0x000000000009fbff available
          0x000000000009fc00-0x000000000009ffff reserved
          0x00000000000f0000-0x00000000000fffff reserved
          0x0000000000100000-0x00000000bffdffff available
          0x00000000bffe0000-0x00000000bfffffff reserved
          0x00000000feffc000-0x00000000feffffff reserved
          0x00000000fffc0000-0x00000000ffffffff reserved
          0x0000000100000000-0x000000013fffffff available
WARNING: detected a total of 3071MB of available memory below 4GB.
WARNING: only up to 2GB of physical memory will be used.
memory    0x000000000009d000-0x000000000009efff available -> reserved
...
kexec_linux: jumping to linux_trampoline() ...
[    0.000000] Linux version 4.14.341-openela_1 (@) (gcc version 4.0.4) #1 SMP PREEMPT @0
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000bffdffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000bffe0000-0x00000000bfffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000013fffffff] usable

Looks to be fixed.

mikaku commented 3 months ago

Excellent, thank you.