mikaku / Fiwix

A UNIX-like kernel for the i386 architecture
https://www.fiwix.org
Other
407 stars 32 forks source link

An initrd of 1700K or higher will crash the kernel on boot #34

Closed rick-masters closed 1 year ago

rick-masters commented 1 year ago

An initrd of 1700K or larger will push low kernel memory over the 4MB mark, but the memory over 4MB is not mapped by setup_minmem so the kernel crashes when trying to access it.

Details on how to reproduce this are described later.

This Issue can be divided into two separate issues:

  1. An initrd that is too large should produce an appropriate error and fail more gracefully
  2. The kernel should support larger initrd drives.

At least #1 should be fixed. This can be done by determining how big an initrd can be accommodated and checking that it can fit on startup. If it cannot fit then display an error message. Currently it just hangs or seg faults.

For #2, currently Fiwix supports an initrd large enough to boot the kernel and mount a root partition. So, opinions may vary on how important it is to support a larger initrd drive. However, the Fiwix documentation in docs/kexec.txt seems to anticipate a 48MB initrd so it seems like larger drives were intended to be supported. With a modest number of changes a larger initrd can be supported.

The following explains the cause of the problem.

Very early in the boot process, the kernel calls setup_minmem to map enough memory to boot the kernel. Later, the kernel places critical data structures such as page tables in low memory, after the kernel image and multiboot modules. And importantly, the stack (%esp) is set to a location after the multiboot modules: https://github.com/mikaku/Fiwix/blob/e4c1e7ef94f400d21227dbab4c0dca0da55baa33/kernel/boot.S#L140-L146

However, if an initrd module is large enough, it will push the stack pages beyond the 4MB of low mapped memory and the kernel will crash when the first value is pushed on the stack here: https://github.com/mikaku/Fiwix/blob/e4c1e7ef94f400d21227dbab4c0dca0da55baa33/kernel/boot.S#L146-L148

Even if the stack fits within the first 4MB, the page tables which are allocated after the stack may not fit in 4MB and so the kernel may crash later when accessing page table memory in map_kaddr.

A proper solution seems to require mapping more low memory in the boot process to fit the multiboot modules + page_tables + kernel data structures, etc. But how much memory must be mapped is not calculated exactly until much later in the boot process (by mem_init). But we cannot wait until this late stage to map the stack page because the stack is used earlier to pass local variables to start_kernel.

Perhaps the amount of low memory to be mapped could be estimated and mapped earlier in the boot (perhaps in setup_minmem). But consider that setup_minmem would not have access to kstat which holds the size of physical memory because that structure has not been populated yet. So calculating the size of the page tables would be difficult. There might be a way to make this work but it would be difficult and would require ongoing maintenance when the size of kernel data structures change.

For these reasons, I am proposing a different idea: simply leave the stack where it was initially set on boot, below the kernel rather than above where it would take signficant work to map it properly. Then, the boot process can stay the same for the most part. After the kernel starts and determines how much low memory needs to be mapped (in the mem_init function), the kernel can map additional memory pages as needed in the normal page tables for the initrd module and kernel data structures. Note that my PR only reserves one stack page because the kernel doesn't appear to need more than that but more pages could easily be reserved if necessary.

The forthcoming PR will allow mapping the initrd ram disk into a much larger memory area. However, at some point the initrd module may bump into another problem: it will start to conflict with memory mapped by user processes. User processes typically start their mapping at 0x08048000. If the initrd is large enough, its mapping will overlap process memory mappings. So, without changing the kernel initrd memory mapping design, an initrd must be limited to approximately 120MB. But this is still much better than 1.6MB.

The role of ramdisksize= kernel parameter

In my PR I have stopped using ramdisksize to control the size of the initrd ram drive. Using the ramdisksize parameter for this purpose is unnecessary and can lead to either an error (if too small) or wasted memory (if too large). Instead, the ram drive is sized to match the size of the provided initrd file system image. This allows a small initrd to be combined with a large ram drive (or the reverse). If ramdisksize controlled the initrd size then ramdisksize could not exceed 120MB as previously explained, and so all ram drives would be need to be limited to 120MB.

I can't think of a reason to give your initrd ram drive a size other than the size of the initrd file system anyway. In general, requiring all ram drives, including initrd, to be the same size is inflexible and while this change does not resolve that completely, it is a significant improvement, in my opinion.

Notes on Reproducing the Issue and Testing the PR

How to reproduce:

qemu-system-i386 \
    -kernel fiwix \
    -initrd initrd-2048K.img \
    -append "root=/dev/ram0 ramdisksize=2048 initrd=initrd-2048K.img"

To simplify testing, I have created initrd.img files of various sizes that can be booted standalone with qemu. In other words, they do not require a hard drive.

I have started with an initrd.img originating from fiwix.org/downloads.html. There is an initrd.img inside the FiwixOS-3.2-initrd-i386.img file system. However, this initrd.img has an fstab that mounts root from a hard drive. So, I modify the initrd /etc/fstab file to mount root from /dev/ram0.

Here is a script I used to extract initrd.img from FiwixOS-3.2-initrd-i386.img:

#!/usr/bin/env bash
set -euo pipefail

rm -rf BUILD
mkdir BUILD
cd BUILD

IMG=FiwixOS-3.2-initrd-i386.img

cp ../$IMG .
mkdir mnt
sudo mount -t ext2 $IMG mnt
cp mnt/boot/initrd.img ..
sudo umount mnt

cd ..
rm -rf BUILD

Here is a script I used to create initrd-${SIZE}.img:

#!/usr/bin/env bash
set -euo pipefail

SIZE=$1
IMGSRC=initrd.img
IMGDST=initrd-${SIZE}.img

# Prepare mounted source image
mkdir mnt.src
sudo mount -t ext2 -o loop,rw ${IMGSRC} mnt.src

# Prepare mounted destination image
qemu-img create ${IMGDST} ${SIZE}
# block size 1024, num inodes 512, revision 0, label "FIWIX"
mkfs.ext2 -b 1024 -N 512 -r 0 -L "FIWIX" ${IMGDST}
mkdir mnt.dst
sudo mount -t ext2 -o loop,rw ${IMGDST} mnt.dst

# Use tar to copy and restore device nodes
(cd mnt.src;sudo tar -c -z -p -f ../src.tgz .)
(cd mnt.dst;sudo tar xzpf ../src.tgz)
rm -f src.tgz

# Change initrd to mount itself as root rather than a separate hard drive
cat << "EOF" > fstab
/dev/ram0       /       ext2    defaults        1 1
none            /proc   proc    defaults,noauto 0 0
EOF
sudo mv fstab mnt.dst/etc/fstab

sudo umount mnt.src
rmdir mnt.src
sudo umount mnt.dst
rmdir mnt.dst

Example usage:

./create-initrd-sized.sh 2048K

Note these scripts use sudo so if sudo is not setup properly then you may need to run the script as root using su or another method.

mikaku commented 1 year ago

An initrd of 1700K or larger will push low kernel memory over the 4MB mark, but the memory over 4MB is not mapped by setup_minmem so the kernel crashes when trying to access it.

Yes, this is because the kernel is initially setup to accept only an initrd image of 1.44MB (a floppy disk). If anyone wants a bigger initrd image, then it will have to modify the setup_minmem() function, as you well said. I'm agree this is uncomfortable.

At least #1 should be fixed. This can be done by determining how big an initrd can be accommodated and checking that it can fit on startup. If it cannot fit then display an error message.

Agreed.

In my PR I have stopped using ramdisksize to control the size of the initrd ram drive. Using the ramdisksize parameter for this purpose is unnecessary and can lead to either an error (if too small) or wasted memory (if too large). Instead, the ram drive is sized to match the size of the provided initrd file system image. This allows a small initrd to be combined with a large ram drive (or the reverse). If ramdisksize controlled the initrd size then ramdisksize could not exceed 120MB as previously explained, and so all ram drives would be need to be limited to 120MB.

I can't think of a reason to give your initrd ram drive a size other than the size of the initrd file system anyway. In general, requiring all ram drives, including initrd, to be the same size is inflexible and while this change does not resolve that completely, it is a significant improvement, in my opinion.

I totally agree on this.

The forthcoming PR will allow mapping the initrd ram disk into a much larger memory area. However, at some point the initrd module may bump into another problem: it will start to conflict with memory mapped by user processes. User processes typically start their mapping at 0x08048000. If the initrd is large enough, its mapping will overlap process memory mappings. So, without changing the kernel initrd memory mapping design, an initrd must be limited to approximately 120MB. But this is still much better than 1.6MB.

Actually virtual memory solves this problem. You are able to set a RAMdisk drive of 512MB without any problem, as long as there is enough physical memory, of course. This is because the memory used for RAMdisk drives is mapped in kernel space, and so there is no conflict at all with the virtual memory used by the user-space applications. In fact, this is one of the reasons I removed the option RAMDISK_MAXSIZE recently.

See the following example:

$ qemu-system-i386 \
        -cdrom FiwixOS-3.2-i386.iso \
        -kernel fiwix \
        -append "ro root=/dev/hdc rootfstype=iso9660 ramdisksize=512000" \
        -m 1024

This screen shot shows the kernel messages during the boot up:

Screenshot from 2023-06-08 17-06-24

As you can see the 500MB RAMdisk drive is mapped in 0xc034c000:

ram0      0xc034c000-0xdf74c000 RAMdisk of 512000KB size, 1KB blocksize

and the system boots finely. But of course, there will be only 500MB (aprox.) left for the user:

Screenshot from 2023-06-08 17-15-19

@rick-masters: In all, there are parts of this PR that I agree but there are others that might be unnecessary. I think we could make a little rewrite of this PR removing those unnecessary parts.

If you want me to modify your PR let me say you that I have never modified the code of a PR before merging it. So I don't know if I can edit it here on GitHub or I need to do something locally. I'm a bit lost here, sorry.

rick-masters commented 1 year ago

I'm currently able to set a RAMdisk drive of 512MB without any problem

You're running a different test again :)

I think if you run the tests using an 512MB initrd ram drive as I have described (not a plain ram drive), you'll see that the kernel will crash on boot. The problem is mapping memory for a large initrd module.

The setup_minmem function maps both low memory addresses and the high 0xC0000000 based virtual addresses to the physical memory:

https://github.com/mikaku/Fiwix/blob/ab1512eeff44584d38381302ce558209118ede96/mm/memory.c#L94-L95

So, if setup_minmem/setup_more_minmem is used to map memory for a large enough initrd module it could map low memory virtual addresses which then conflict with users processes that map 0x08048000.

The other ram disks don't have the same problem because are they mapped with map_kaddr after _last_data_addr has been converted to a high virtual address here: https://github.com/mikaku/Fiwix/blob/ab1512eeff44584d38381302ce558209118ede96/mm/memory.c#L313-L314

map_kaddr only maps using _last_data_addr, so those high virtual addresses won't conflict with user processes mappings.

Perhaps the answer is that setup_more_minmem in my PR should not map the low address space like setup_minmem does? I think setup_minmem is mapping the low address space because the initial (tmp_gdt) GDT uses an 0x40000000 offset to shift 0xC0000000 based kernel addresses into the low 0x00000000 based address space, so the low address space needs to be mapped until the definitive GDT is loaded. And once the GDT is loaded there are also low memory structures such as the multiboot info structure that may need to be accessed?

So, I can see why some portion of the low address space may need to be mapped but maybe I'm trying to map too much. I'm still thinking my approach is on the right track overall but maybe I can support larger initrd ram drives with some more work.

rick-masters commented 1 year ago

First, a minor correction: The address of regular ram drives are not assigned using map_kaddr, they are assigned directly from _last_data_addr. But my point stands: they use are high virtual addresses that won't create conflicts with process virtual addresses.

Ok, I looked into my last question: can I avoid mapping low addresses other than the first 4MB that setup_minmem maps? I'm still looking into it. Right now page tables are placed in low memory after the initrd module and the page tables are populated via low addresses so there needs to be mappings to access them. Perhaps they could be accessed using high virtual addresses...

mikaku commented 1 year ago

Oh yes, probably I misunderstood your message.

Anyway, since RAMdisk drives are mapped in the kernel space memory one can have really big RAMdisk drives. If you really need big initrd files, I think that we could explore the possibility to copy the initrd from the low memory to the kernel-space memory. This would permit to have initrd drives as big as RAMdisk drives.

And if this success and you still need even bigger initrd files, we could introduce the support to split the virtual memory to 2GB/2GB along with 3GB/1GB, so you might have an initrd file of up to 2GB of size.

Besides all this:

rick-masters commented 1 year ago

Regarding the ultimate size needed for initrd, live-bootstrap currently requires an 1152MB ram drive: https://github.com/fosslinux/live-bootstrap/blob/e3a2ca1b3f915cc2f97229944a84bfb1aa16794c/sysa/lwext4-1.0.0-lb1/files/make_fiwix_initrd.c#L51

Supporting such a large drive requires an additional and more radical change which I made as a separate change: https://github.com/rick-masters/Fiwix/commit/747a76335a30d41ada7a7803f5ef917068199843

The second phase above to support a huge initrd does indeed relocate the initrd image but not into kernel memory address space because of the 1GB limitation you alluded to. It uses a memory range that Fiwix currently doesn't use (starting at 0x60000000).

I separated out the second change because I was doubtful that you would accept the second change above because it is a bit of a hack to use a special address space (rather than changing the kernel virtual memory to 2/2 or 3/1 and using kernel virtual memory like the other ram drives which you appear to be suggesting). Moreover, the live-bootstrap folks have suggested possibly moving away from using a Fiwix ram drive and using a hard drive instead. (Remember all the work we did for that?). So it wasn't clear if the second phase would even be needed to support an 1152MB initrd.

The change in this PR is simpler and less radical than what would be needed for a 1152MB initrd so I thought you would be more likely to accept it and maybe that would be all that was needed if live-bootstrap moved away from using ram drives. But that issue isn't settled: https://github.com/fosslinux/live-bootstrap/issues/292#issuecomment-1551949315 So, I'm still interested in trying to make a 1152MB initrd work.

And if this success and you still need even bigger initrd files, we could introduce the support to split the virtual memory to 2GB/2GB along with 3GB/1GB, so you might have an initrd file of up to 2GB of size.

I think this is possible but I didn't take this approach because I didn't want to change your kernel memory strategy and frankly, I wasn't sure I knew how to do that. Reworking my fix this way could take a lot more effort. I'm not saying I don't want to do it - just that you may not hear from me for a while if I attempt this direction. The advantage is that I might be able to support a 1152MB initrd with only one change if this more adventurous approach is taken.

What is the purpose of the change in kernel/boot.S and kernel/main.c?

Please read my explanation about the problem with the location of the kernel stack in my first comment.

What is the purpose of the change in mm/page.c?

This is necessary to reserve the kernel stack page so it is not allocated for other purposes.

I think setup_more_minmem() could be merge with setup_minmem() by passing a parameter? what do you think?

I'll look into this. They are similar.

What mean these commentaries in include/fiwix/kernel.h:

This is how much memory the kernel allocates beyond the initrd module for page tables and this quantity is used by setup_more_minmem to map enough memory to hold said page tables.

mikaku commented 1 year ago

Sincerely, if your final goal is to have an initrd image bigger than 1GB, then the best approach is forget this PR and focus on the 2GB/2GB split. The current user space programs won't notice any difference because mmap() addresses start at 0x40000000, so there will be still a margin of 1GB before collide with the stack (which would start at 0x7FFFFFFF).

Just for curiosity, I've just made small adjustments in the kernel and it works (look at the memory stats):

Screenshot from 2023-06-09 19-05-45

Supporting such a large drive requires an additional and more radical change which I made as a separate change:

So, I wouldn't like the idea of include specific hacks for a specific projects, like bootstrap or whatever. Instead, I prefer to include generic features that will be widely usable for anything, and at the same time keep the kernel code simple and clean.

What do you think?

PS: Don't get me wrong. If there is a kernel feature needed by the bootstrap project and there is no way to implement it in a generic way, then we can do an exception. What I mean, is that on every case we should try always the generic path first.

rick-masters commented 1 year ago

then the best approach is forget this PR and focus on the 2GB/2GB split.

Thank you. If you could share your changes I'm sure I could use them. The 2GB split will help but it is only a small part of the solution. The current design which puts the kernel stack and page tables after the initrd module is problematic when the initrd is large because the memory may not be mapped. Changing the current design is not easy. The solutions I've written already have ideas that hopefully will be useful.

So, I wouldn't like the idea of include specific hacks...

I'm starting to work on a better solution. I wrote both patches quite a while ago and was mainly focused on getting live-bootstrap working ASAP. I think I understand the issues better since then and I have more time now to circle back and take another look at doing it the right way.

mikaku commented 1 year ago

If you could share your changes I'm sure I could use them. The 2GB split will help but it is only a small part of the solution. The current design which puts the kernel stack and page tables after the initrd module is problematic when the initrd is large because the memory may not be mapped.

Yes, you're right, the location of the stack must be fixed.

OK, I'll work on a solution to support the 2GB/2GB virtual memory split, which will include a new linker script and a new configuration option (something like CONFIG_VM_SPLIT22). It will also include a new location for the kernel stack that won't be affected by the modules loaded by GRUB.

If this works, then we could use the function you developed to relocate the initrd into the kernel space.

mikaku commented 1 year ago

If you apply the following patch you'll have support for the virtual memory split 2GB/2GB by only enabling the new option CONFIG_VM_SPLIT22:

diff --git a/Makefile b/Makefile
index 712c37b..6fc9850 100644
--- a/Makefile
+++ b/Makefile
@@ -6,6 +6,7 @@

 TOPDIR := $(shell if [ "$$PWD" != "" ] ; then echo $$PWD ; else pwd ; fi)
 INCLUDE = $(TOPDIR)/include
+TMPFILE := $(shell mktemp)

 ARCH = -m32
 CPU = -march=i386
@@ -13,6 +14,7 @@ LANG = -std=c89

 CC = $(CROSS_COMPILE)gcc $(ARCH) $(CPU) $(LANG) -D__KERNEL__ #-D__DEBUG__
 LD = $(CROSS_COMPILE)ld
+CPP = $(CROSS_COMPILE)cpp -P -I$(INCLUDE)
 LIBGCC := $(shell dirname `$(CC) -print-libgcc-file-name`)

 CFLAGS = -I$(INCLUDE) -O2 -fno-pie -fno-common -ffreestanding -Wall -Wstrict-prototypes #-Wextra -Wno-unused-parameter
@@ -48,7 +50,9 @@ export CC LD CFLAGS LDFLAGS INCLUDE
 all:
    @echo "#define UTS_VERSION \"`date`\"" > include/fiwix/version.h
    @for n in $(DIRS) ; do (cd $$n ; $(MAKE)) || exit ; done
-   $(LD) -N -T fiwix.ld $(LDFLAGS) $(OBJS) -L$(LIBGCC) -lgcc -o fiwix
+   $(CPP) fiwix.ld > $(TMPFILE)
+   $(LD) -N -T $(TMPFILE) $(LDFLAGS) $(OBJS) -L$(LIBGCC) -lgcc -o fiwix
+   rm -f $(TMPFILE)
    nm fiwix | sort | gzip -9c > System.map.gz

 clean:
diff --git a/fiwix.ld b/fiwix.ld
index 3f0b4eb..1926864 100644
--- a/fiwix.ld
+++ b/fiwix.ld
@@ -1,15 +1,17 @@
 /*
- * Linker script for the Fiwix kernel (3GB user / 1GB kernel).
+ * Linker script for the Fiwix kernel.
  *
  * Copyright 2018, Jordi Sanfeliu. All rights reserved.
  * Distributed under the terms of the Fiwix License.
  */

+#include <fiwix/linker.h>
+
 OUTPUT_FORMAT("elf32-i386")
-OUTPUT_ARCH(i386)
+OUTPUT_ARCH("i386")
 ENTRY(start)       /* entry point */
-vaddr = 0xC0000000;    /* virtual base address at 3GB */
-paddr = 0x100000;  /* physical address at 1MB */
+vaddr = PAGE_OFFSET;   /* virtual start address */
+paddr = KERNEL_ADDR;   /* physical address at 1MB */

 /* define output sections */
 SECTIONS
diff --git a/include/fiwix/config.h b/include/fiwix/config.h
index 7941541..1514732 100644
--- a/include/fiwix/config.h
+++ b/include/fiwix/config.h
@@ -42,6 +42,7 @@
 #define CONFIG_BGA
 #undef CONFIG_KEXEC
 #define CONFIG_OFFSET64
+#undef CONFIG_VM_SPLIT22

 /* configuration options to help debugging */
diff --git a/include/fiwix/mm.h b/include/fiwix/mm.h
index 7b38089..fb4cf72 100644
--- a/include/fiwix/mm.h
+++ b/include/fiwix/mm.h
@@ -12,10 +12,7 @@
 #include <fiwix/segments.h>
 #include <fiwix/process.h>

-/*
- * Convert only from physical to virtual the addresses below PAGE_OFFSET
- * (formerly 0xC0000000).
- */
+/* convert from physical to virtual the addresses below PAGE_OFFSET only */
 #define P2V(addr)      (addr < PAGE_OFFSET ? addr + PAGE_OFFSET : addr)

 #define V2P(addr)      (addr - PAGE_OFFSET)
diff --git a/include/fiwix/segments.h b/include/fiwix/segments.h
index 5115eb6..0868735 100644
--- a/include/fiwix/segments.h
+++ b/include/fiwix/segments.h
@@ -8,8 +8,7 @@
 #ifndef _FIWIX_SEGMENTS_H
 #define _FIWIX_SEGMENTS_H

-#define PAGE_OFFSET    0xC0000000
-#define KERNEL_ADDR    0x100000
+#include <fiwix/linker.h>

 #define KERNEL_CS  0x08    /* kernel code segment */
 #define KERNEL_DS  0x10    /* kernel data segment */
diff --git a/kernel/boot.S b/kernel/boot.S
index 49484b6..d2a44f4 100644
--- a/kernel/boot.S
+++ b/kernel/boot.S
@@ -40,7 +40,7 @@ tmp_gdt:
    .byte   0x00        /* base address 23-16 */
    .byte   0x9A        /* P=1 DPL=00 S=1 TYPE=1010 (exec/read) */
    .byte   0xCF        /* G=1 DB=1 0=0 AVL=0 SEGLIM=1111 */
-   .byte   0x40        /* base address 31-24 */
+   .byte   GDT_BASE >> 24      /* base address 31-24 */

    /* KERNEL DATA */
    .word   0xFFFF      /* segment limit 15-00 */
@@ -48,7 +48,7 @@ tmp_gdt:
    .byte   0x00        /* base address 23-16 */
    .byte   0x92        /* P=1 DPL=00 S=1 TYPE=0010 (read/write) */
    .byte   0xCF        /* G=1 DB=1 0=0 AVL=0 SEGLIM=1111 */
-   .byte   0x40        /* base address 31-24 */
+   .byte   GDT_BASE >> 24      /* base address 31-24 */

 .align 4
@@ -114,7 +114,7 @@ multiboot_header:           /* multiboot header */

 .align 4
 .globl setup_kernel; setup_kernel:
-   movl    $0xC0010000, %esp   /* default stack address */
+   movl    $PAGE_OFFSET + 0x10000, %esp    /* default stack address */
    pushl   $0          /* reset EFLAGS */
    popf

diff --git a/mm/bios_map.c b/mm/bios_map.c
index 256454e..0519a0e 100644
--- a/mm/bios_map.c
+++ b/mm/bios_map.c
@@ -143,7 +143,7 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
            bmmap = (struct multiboot_mmap_entry *)((unsigned int)bmmap + bmmap->size + sizeof(bmmap->size));
        }
        kstat.physical_pages += (1024 >> 2);    /* add the first MB as a whole */
-       if(kstat.physical_pages > (0x40000000 >> PAGE_SHIFT)) {
+       if(kstat.physical_pages > (GDT_BASE >> PAGE_SHIFT)) {
            printk("WARNING: detected a total of %dMB of available memory below 4GB.\n", (kstat.physical_pages << 2) / 1024);
        }
    } else {
@@ -161,8 +161,8 @@ void bios_map_init(struct multiboot_mmap_entry *bmmap_addr, unsigned int bmmap_l
     * This truncates to 1GB since it's the maximum physical memory
     * currently supported.
     */
-   if(kstat.physical_pages > (0x40000000 >> PAGE_SHIFT)) {
-       kstat.physical_pages = (0x40000000 >> PAGE_SHIFT);
-       printk("WARNING: only up to 1GB of physical memory will be used.\n");
+   if(kstat.physical_pages > (GDT_BASE >> PAGE_SHIFT)) {
+       kstat.physical_pages = (GDT_BASE >> PAGE_SHIFT);
+       printk("WARNING: only up to %dGB of physical memory will be used.\n", GDT_BASE >> 30);
    }
 }
diff --git a/mm/memory.c b/mm/memory.c
index 0b7941d..2171a2f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -91,11 +91,11 @@ unsigned int setup_minmem(void)
        pgtbl[n] = (n << PAGE_SHIFT) | PAGE_PRESENT | PAGE_RW;
        if(!(n % 1024)) {
            pd = n / 1024;
-           kpage_dir[pd] = (unsigned int)(addr + (PAGE_SIZE * pd) + 0x40000000) | PAGE_PRESENT | PAGE_RW;
-           kpage_dir[GET_PGDIR(PAGE_OFFSET) + pd] = (unsigned int)(addr + (PAGE_SIZE * pd) + 0x40000000) | PAGE_PRESENT | PAGE_RW;
+           kpage_dir[pd] = (unsigned int)(addr + (PAGE_SIZE * pd) + GDT_BASE) | PAGE_PRESENT | PAGE_RW;
+           kpage_dir[GET_PGDIR(PAGE_OFFSET) + pd] = (unsigned int)(addr + (PAGE_SIZE * pd) + GDT_BASE) | PAGE_PRESENT | PAGE_RW;
        }
    }
-   return (unsigned int)kpage_dir + 0x40000000;
+   return (unsigned int)kpage_dir + GDT_BASE;
 }

 /* returns the mapped address of a virtual address */

UPDATE: (I forgot to include the new file include/fiwix/linker.h:

/*
 * fiwix/include/fiwix/linker.h
 *
 * Copyright 2023, Jordi Sanfeliu. All rights reserved.
 * Distributed under the terms of the Fiwix License.
 */

#ifndef _FIWIX_LINKER_H
#define _FIWIX_LINKER_H

#include <fiwix/config.h>

#ifdef CONFIG_VM_SPLIT22
#define PAGE_OFFSET 0x80000000  /* VM split: 2GB user / 2GB kernel */
#else
#define PAGE_OFFSET 0xC0000000  /* VM split: 3GB user / 1GB kernel */
#endif /* CONFIG_VM_SPLIT22 */

#define KERNEL_ADDR 0x100000
#define GDT_BASE    (0xFFFFFFFF - (PAGE_OFFSET - 1))

#endif /* _FIWIX_LINKER_H */

In the next hours I'll focus to change the location of the kernel stack and put it right after the BSS section. That is, right before the modules loaded by GRUB which includes the initrd image (as you can see in docs/kmem_layout.txt):

https://github.com/mikaku/Fiwix/blob/ab1512eeff44584d38381302ce558209118ede96/docs/kmem_layout.txt#L17-L31

mikaku commented 1 year ago

With the last commit the kernel stack is now relocated after the BSS section, that is, right before the initrd image loaded by GRUB.

~Let me know if you already have a working function to relocate the initrd to one of the addresses in _last_data_addr. Thanks.~

mikaku commented 1 year ago

Now that we have relocated the kernel stack, things are located like this:

... [bss]+[null]+[kstack]+[initrd]+[kpage_dir]+[kpage_tables]+ ... [internal structures] .. + [RAMdisk drives] ...

With this configuration, setup_minmem() must be aware of the size of [initrd], otherwise the Page Directory would be outside the limits of the temporary Page Directory defined by setup_minmem()(4MB). This is problematic and poorly scalable.

I'm working to relocate the initrd image before setting the kernel Page Directory in mem_init(). This will need a change in the function setup_minmen() to be able to define a kpage_dir according with the current physical memory and also, a change in multiboot() to move the initrd image beyond the the size of kpage_dir and kpage_tables. Then the layout would be:

... [bss]+[null]+[kstack]+[kpage_dir]+[kpage_tables]+[initrd]+[RAMdisk drives]+ ... [internal structures] ...

This should permit big initrd images without changing a single line in the code.

In my PR I have stopped using ramdisksize to control the size of the initrd ram drive. Using the ramdisksize parameter for this purpose is unnecessary and can lead to either an error (if too small) or wasted memory (if too large). Instead, the ram drive is sized to match the size of the provided initrd file system image. This allows a small initrd to be combined with a large ram drive (or the reverse).

Yes, I agree with this. I will also introduce this feature.

rick-masters commented 1 year ago

I really appreciate the work you are doing on this. I was worried that large initrd would end up as a fork but it looks like you may end up doing it in a better way. Let me know when you've got more to look at. I've been traveling so I haven't been able to help recently but I should have time soon.

mikaku commented 1 year ago

Thank you.

Let me explain you where we are right now:

As you already know, the setup_minmem() function maps the first 4MB of memory using a Page Directory and Page Tables that reside in the low memory (from 0 to 640KB), this limits to 600KB approximately the space to map memory (since there are portions of this memory range that is being used by GRUB and BIOS, you know). This means that the biggest initrd image size would be around 600MB, more or less.

Since we need 1KB to map 1MB, we need 1MB to map 1GB. So for big initrd images we cannot have enough low memory to cover the memory where kpage_dir and kpage_tables will reside. At the moment I've abandoned the idea to move the initrd image to a different location because it seemed to me a more complicated path.

After the recent changes (80e2a0cb3fcecbc46bf473b426d0d2388de59efe and d164d400272cd92f129d0dc153a757ad8c630c39) the current layout looks like this:

... [bss]+[null]+[kstack]+[initrd]+[kpage_dir]+[kpage_tables]+ ... [internal structures] .. + [RAMdisk drives] ...

As you already noticed, depending on the size of the initrd image kpage_dir and kpage_tables might reside beyond those 4MB set by setup_minmem(), and so it will GPF as soon as we touch them. Incrementing the size of the memory mapped in setup_minmem() is not an option because: 1) we don't know yet how much memory the system has and 2) we don't know yet the size of the initrd image. So we have a chicken and egg problem here.

My idea is:

~These values remain unknown until the multiboot() function. So it's in that function where we can calculate where kpage_dir and kpage_tables will reside depending on the size of the initrd image (if there is one). Once we know these values, we need a way to update the current 4MB Page Tables and map the memory that kpage_dir and kpage_tables will use. So when later in mem_init() we setup the definitive kpage_dir and kpage_tables the kernel won't crash.~

I did it different. I've modified the function setup_minmem() to place the Page Directory and Page Tables directly to the end of physical memory, so they will never be clobbered by the initrd image. It works, but as you said, now it emerges a new problem. The maximum initrd image size is around 120MB, more than this it starts conflicting with the memory mapped by user processes at 0x08048000. So, initrd images will need indeed a relocation to beyond of PAGE_OFFSET.

I'll clean the code and I'll commit the patch that covers up to ~120MB of initrd images. Then with that first phase working fine, we can start talking about initrd image relocation.

mikaku commented 1 year ago

After applying the previous changes (7ef37e9ee14d06306351ac0864dfa6387695f681 and e407d47e2fa735a6f9f54e75bd6d98bfabbed0be), I realized that adding support for large initrd images (> 128MB) looks really simple. Well, at least it seemed to work in all tests I did (500MB, 800MB and 1500MB initrd images).

To make sure, please use the following kernel which is already compiled with the new option CONFIG_VM_SPLIT22 enabled, so you can safely configure QEMU with 2GB of memory:

fiwix.gz (MD5 uncompressed: f225744d2eb69db7b38cce4b4ea7a342)

mikaku commented 1 year ago

@rick-masters, let me know your results after testing the new support for large initrd images.

rick-masters commented 1 year ago

Sorry for the delay. I'm trying to test with live-bootstrap. It will take more work on my part to figure out how to make your changes to linking and _kstack work with the tcc compiler. I also took a break for a while. I'm back to working on it now.

mikaku commented 1 year ago

No problem, I figured you were out on vacation. Welcome back! :+1:

rick-masters commented 1 year ago

I'm very pleased to say I have been able to run live-bootstrap with the 2/2 split and an 1152MB ram drive. This means I can retire two of the most difficult patches I was maintaining for Fiwix. This is really important progress. Congratulations on making this work! You can close this issue as far as I am concerned.

mikaku commented 1 year ago

Excellent, this is the news I was waiting for! Thank you very much.