jklmnn / base-linux-hw

GNU General Public License v3.0

Enable IO_MEM session #6

Open jklmnn opened 6 years ago

jklmnn commented 6 years ago

Enable IO_MEM session on Genode on Linux. Check how to map memory from Linux user space.

jklmnn commented 6 years ago

It is possible to use io memory mapped from /dev/mem. A demo that writes colors to the EFI framebuffer is available at this gist. Running it requires a UEFI-based system. Any graphics driver also needs to be blacklisted (otherwise it would constantly overwrite the framebuffer and/or disable the screen completely), and the kernel needs to be compiled without CONFIG_FB_EFI (otherwise efifb claims the required address space and it can't be accessed anymore).
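
A minimal sketch of the mapping step, not the code from the gist: it assumes a 32 bpp framebuffer and takes an already opened descriptor, so it works the same way on /dev/mem (where the offset is a physical address) and on a regular file. The names `map_phys` and `fill_colour` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `size` bytes at byte offset `base` of the open file `fd`.
 * For /dev/mem, `base` is the physical address of the region.
 * `base` must be page aligned. Returns MAP_FAILED on error. */
void *map_phys(int fd, off_t base, size_t size)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, base);
}

/* Fill `pixels` 32-bit pixels with one colour value. `volatile` keeps the
 * compiler from eliding or merging stores to device memory. */
void fill_colour(volatile uint32_t *fb, size_t pixels, uint32_t colour)
{
    for (size_t i = 0; i < pixels; i++)
        fb[i] = colour;
}
```

For the real framebuffer, the descriptor would come from `open("/dev/mem", O_RDWR | O_SYNC)` as root, and the base address and geometry have to be taken from the firmware or boot loader.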

senier commented 6 years ago

Idea for mapping: mmap /dev/mem with an appropriate offset and size. Then, open /proc/self/map_files/[REGION] and pass the open file descriptor to the session client. See proc(5) for details.
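
To make the [REGION] part concrete: proc(5) names each entry in map_files by the start and end address of the mapping, in hex and without a 0x prefix. A small sketch that builds such a path (the helper name `map_files_path` is hypothetical, and whether opening the entry is permitted depends on kernel version and capabilities):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Write "/proc/self/map_files/<start>-<end>" for a mapping that starts at
 * `addr` and is `len` bytes long. Returns the number of characters written,
 * or -1 if the buffer is too small. */
int map_files_path(char *buf, size_t buflen, uintptr_t addr, size_t len)
{
    int n = snprintf(buf, buflen, "/proc/self/map_files/%lx-%lx",
                     (unsigned long)addr, (unsigned long)(addr + len));
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

The address passed in would be the pointer returned by mmap and the length the (page aligned) size of the mapping.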

jklmnn commented 6 years ago

The current approach is to create a custom device in /dev/ that basically does the same as /dev/mem but is constrainable by using ioctl.

Why can't we constrain /dev/mem? Each device file implements a specific set of operations that can be executed on the file, including open, read, write and also mmap. The mmap implementation of /dev/mem simply takes the arguments of mmap and only uses the file to check for permissions but provides all the mapping without touching the file itself. This would make any constraint on the file descriptor, if it was even possible, useless.

How do we implement the custom device? The device should only require the implementation of open and mmap. Using any other operation such as write or read should result in an invalid argument error. As far as I researched it, the struct file * is unique per file descriptor (please tell me if I'm wrong with that). It also contains a private_data field that can be used to store the specific constraints. Open will set this to zero so, unlike /dev/mem, our device won't provide any access to memory initially. I think this approach is safer than having everything at first and constraining it afterwards. The ioctl call will then set the required ranges, which will be checked when calling mmap.

Are there any permission checks on ioctl? If core can call ioctl on the file descriptor and the user process can call mmap, why shouldn't it be able to call ioctl itself and reuse the file descriptor? A solution for this could be to allow the ioctl call only once (e.g. by checking if the set range is unequal to zero). I think this would also be semantically correct since we can neither simply move a mapping nor change the range in Genode's IO_MEM session. So core calls ioctl once and after this the file descriptor can't be modified anymore.
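
The trap door and the bounds check can be modelled in plain user-space C. This is only a sketch of the logic, not the kernel module: the struct and function names are invented, and it assumes that the mmap offset is interpreted relative to the start of the configured range.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-open state, as it could be kept in file->private_data. */
struct iomem_range {
    uint64_t phys;    /* start of the permitted physical range */
    uint64_t length;  /* 0 means "not configured, nothing mappable" */
};

/* Model of the ioctl handler: succeeds once, afterwards the range is locked. */
int iomem_set_range(struct iomem_range *r, uint64_t phys, uint64_t length)
{
    if (r->length != 0)
        return -EPERM;              /* trap door: already configured */
    if (length == 0 || phys + length < phys)
        return -EINVAL;             /* empty range or address wrap-around */
    r->phys = phys;
    r->length = length;
    return 0;
}

/* Model of the check in the mmap handler: the request has to lie completely
 * inside the configured range (offset is relative to r->phys). */
bool iomem_mmap_allowed(const struct iomem_range *r, uint64_t offset, uint64_t size)
{
    if (r->length == 0 || size == 0)
        return false;
    if (offset > r->length)
        return false;
    return size <= r->length - offset;
}
```

Starting from a zeroed range means a freshly opened descriptor grants nothing, and the second `iomem_set_range` call fails no matter who issues it.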

The implementation of this kernel module is in jklmnn/genode-linux. For debugging purposes it currently contains implementations of read and write that could be removed later.

senier commented 6 years ago

I don't know about struct file *, but I suspect it is unique per fd. AFAIK ioctl() has no permission checks, as this is expected to be done on the file level, just like /dev/mem. Making ioctl() a trap door and checking mmap() parameters against the values stored in private_data sounds like a good idea!

jklmnn commented 6 years ago

I have tested the struct file * and it seems to be unique per open. We cannot clone the file descriptor with dup, as both fds are then coupled to each other and share their state. But opening the file multiple times seems to work. It is important to close the file properly after the mapping to free the private_data memory.
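
The coupling can be observed from user space without the module: descriptors created by dup refer to the same open file description and therefore share the file offset, while a second open gets its own. A small sketch (the function name is made up, and a regular file stands in for the device):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if moving the file offset through `a` is visible through `b`,
 * 0 if it is not, and -1 on error. A shared offset means both descriptors
 * refer to the same open file description. */
int fds_share_state(int a, int b)
{
    if (lseek(a, 0, SEEK_SET) < 0 || lseek(b, 0, SEEK_SET) < 0)
        return -1;
    if (lseek(a, 7, SEEK_SET) != 7)
        return -1;
    off_t seen = lseek(b, 0, SEEK_CUR);
    if (seen < 0)
        return -1;
    return seen == 7;
}
```

With `b = dup(a)` this reports shared state, with a second `open` of the same path it does not, which matches the behaviour seen with private_data.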

An interesting thing I found out: To test the mmap I reused the efi_fb.cc I used to test /dev/mem with. It seems that VirtualBox provides a Multiboot2 compatible framebuffer at 0xe0000000 since a coloured line appeared on the VirtualBox screen. This happened even though I did not boot the VM with UEFI. Nevertheless this makes testing a bit easier.

EDIT: Overview of in-kernel file operations: https://www.xml.com/ldd/chapter/book/ch03.html

jklmnn commented 6 years ago

I have a working kernel module now and was able to map and use memory with the new efi_fb.cc. To use an application with the kernel module, genode.h needs to be included to get the relevant data type and ioctl command. This file also includes invalid and double ioctl/mmap calls to exercise the boundary check and the trap door (this is not a complete test).

senier commented 6 years ago

Cool! How can a physical memory region be unmapped when not needed anymore or a driver is restarted?

jklmnn commented 6 years ago

Actually I haven't thought about unmapping yet. I suspect the unmapping happens automatically when the process exits. While I currently have no proof for this two things indicate this:

If these assumptions are wrong and the memory needs to be unmapped, this should be done from the user space process with munmap, just as the mapping is done with mmap.

senier commented 6 years ago

You definitely should validate whether the effect of remap_pfn_range() is reversed on process exit. I also guess this is the case. Nonetheless, for the sake of completeness, munmap should be supported to allow processes to give up device resources during runtime.

I wonder whether remap_pfn_range() is the right call to map device memory for which an IOMMU policy needs to be established. Sure you don't need io_remap_pfn_range()? And if so, what's the difference and how do we distinguish what to use (e.g. for physical RAM)?

jklmnn commented 6 years ago

According to the man page of mmap, we don't need to unmap the memory manually if the process is terminated.

The region is also automatically unmapped when the process is terminated. On the other hand, closing the file descriptor does not unmap the region.

So we would only need to call munmap if we want to reuse the range in the same process. This is handled solely by the process calling munmap. There is no special implementation in the module required (since munmap only deletes the mapping created by mmap without any further boundary check). It might be possible to implement a custom munmap but I don't see any value there.
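
The man page behaviour is easy to check in user space. A sketch with a regular file instead of the device (`map_and_close` is an illustrative name): the descriptor is closed right after mmap, the mapping stays usable, and only munmap or process exit removes it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first `size` bytes of `path` read-only and close the descriptor
 * immediately. Per mmap(2), closing the descriptor does not unmap the
 * region. Returns MAP_FAILED on error. */
void *map_and_close(const char *path, size_t size)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    return p;
}
```

A caller that wants to give the range up at runtime calls `munmap(p, size)` on the returned pointer.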

remap_pfn_range is definitely the correct call, as it sets the VM_IO flag on the mapping, which is exactly what we want to have:

VM_IO tells people not to look at these pages (accesses can have side effects).

Besides that, io_remap_pfn_range is defined here as

#ifndef io_remap_pfn_range
#define io_remap_pfn_range remap_pfn_range
#endif

The only cases with io_remap_pfn_range being actually declared and implemented were some special config cases for mips and sparc. Also /dev/mem uses remap_pfn_range so I don't think that io_remap_pfn_range gives us any improvement.

jklmnn commented 6 years ago

I renamed the repository to hwiodev (hardware io device) and the module to hwio which is hopefully sufficiently generic ;)

jklmnn commented 6 years ago

I'm now able to build an image that loads the module from the initramfs before Genode gets started. Also /dev is now populated by mounting devtmpfs to it. This could probably be removed as well, but then the device node needs to be created with mknod, which is deprecated and also has other issues (the device id of the node needs to be found out or set statically, which provokes collisions between devices; also, testing mknod led to execve failing for whatever reason). Udev is of course not an option as it depends on /sys and /proc.

senier commented 6 years ago

The devtmpfs solution should work. Whether it is better to statically place a device node in tmpfs needs to be evaluated when minimizing the kernel. Note that collisions are not an issue in our case, as we will barely have any other driver enabled. Please also add those design choices to your document.

senier commented 6 years ago

hwiodev is way too specific. Please rename to fooio. ;-)

jklmnn commented 6 years ago

I think I found the source (not the reason unfortunately) of the "blocking canceled in entrypoint constructor" error. It happens in repos/base-linux/src/lib/base/ipc.cc on the recvmsg syscall that gets interrupted by a signal. This only happens if Genode is booted on a bare kernel (running as root didn't cause this problem). I have also tested this for different kernels built on different machines and they all caused the error.

jklmnn commented 6 years ago

In the meantime I can also test the IO_MEM session in VirtualBox by running Genode as root. It should then be able to access the file in /dev as well.

Since the build should run in a shared folder, symlinks for this specific shared folder need to be enabled by adding

<ExtraDataItem name="VBoxInternal2/SharedFoldersEnableSymlinksCreate/SHARE_NAME" value="1"/>

to ~/.config/VirtualBox/VirtualBox.xml (see https://www.virtualbox.org/ticket/10085#comment:57).

jklmnn commented 6 years ago

I'm finally able to run the framebuffer test in QEMU on Linux! Yet something in the kernel seems to break. I'm not sure what exactly it is.

An earlier fault referenced the x86 Page Attribute Table (PAT), but that wasn't the real cause as the kernel reported a double fault. Booting with nopat made the PAT warning disappear but did not resolve the fault. Yet some sources stated that PAT-related errors occur on QEMU, so I think a test on real hardware would be quite useful to see what is happening. Besides that, it was also mentioned that the options supplied to mmap could be a cause, especially with regard to caching. This is also supported by the PAT message. A look at what exactly the mmap call in Genode is using might be useful.

Although still quite unreliable, the concept seems to work. The screenshot shows the black and white framebuffer test on the top in QEMU. A second test with nopat worked a little bit better. It also shows the double fault, which unfortunately did not mention a cause.

[screenshot: gl_framebuffer]

Another test with the same setup did also fail but caused a hopefully more useful error message (I did not read it yet but it's definitely more than just "double fault" ;) ). I have attached it below.

[    1.589633] ------------[ cut here ]------------
[    1.627949] WARNING: CPU: 0 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x632/0x800()
[    1.643111] x86/mm: Found insecure W+X mapping at address ffff880000900000/0xffff880000900000
[    1.681209] Modules linked in:
[    1.692798] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.4.3+ #4
[    1.696461] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    1.696461]  0000000000000286 00000000dd6efb3d ffffffff812b6035 ffff88001e0b3de0
[    1.696461]  ffffffff817eb8a2 ffffffff8105bd51 0000000000000000 ffff88001e0b3e38
[    1.696461]  00000000000001e3 ffffc00000000fff 0000000000000000 ffffffff8105bddc
[    1.696461] Call Trace:
[    1.696461]  [<ffffffff812b6035>] ? dump_stack+0x5c/0x77
[    1.696461]  [<ffffffff8105bd51>] ? warn_slowpath_common+0x81/0xb0
[    1.696461]  [<ffffffff8105bddc>] ? warn_slowpath_fmt+0x5c/0x80
[    1.696461]  [<ffffffff810aa076>] ? vprintk_emit+0x3b6/0x540
[    1.696461]  [<ffffffff81053332>] ? note_page+0x632/0x800
[    1.696461]  [<ffffffff81053787>] ? ptdump_walk_pgd_level_core+0x287/0x410
[    1.696461]  [<ffffffff8152eac0>] ? rest_init+0x80/0x80
[    1.696461]  [<ffffffff8152ead9>] ? kernel_init+0x19/0xe0
[    1.696461]  [<ffffffff81539f4f>] ? ret_from_fork+0x3f/0x70
[    1.696461]  [<ffffffff8152eac0>] ? rest_init+0x80/0x80
[    2.073976] ---[ end trace 5725af1922b8510c ]---
[    2.222207] x86/mm: Checked W+X mappings: FAILED, 14006 W+X pages found.
[    2.232791] tsc: Refined TSC clocksource calibration: 2494.210 MHz
[    2.268858] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23f3dc117a2, max_idle_ns: 440795222026 ns
preparing environment for Genode
[    2.383165] hwio: module verification failed: signature and/or required key missing - tainting kernel
[    2.421158] hwio module registered
loading Genode on Linux
Genode 17.11-124-gcd26cbdac <local changes>
17592186044415 MiB RAM and 8998 caps assigned to init
[    3.294757] clocksource: Switched to clocksource tsc
[init -> test-framebuffer] --- Test framebuffer ---
[init -> fb_boot_drv] Framebuffer with 800x600x32 @ 0x80000000 type=1 pitch=3200
Warning: Should only be used for IOMEM and not within Linux.
Warning: no io_mem support on Linux (args="base=0x80000000, size=0x1d4c00, wc=yes, diag=0, label="init -> fb_boot_drv -> ", ram_quota=516, cap_quota=2")
_fd 120
0
[    3.535110] __func__ vma->vm_pgoff: 0
[    3.566334] __func__ offset: 80000000
[    3.578293] __func__ range: 80000000 x 1921024
[init -> test-framebuffer] framebuffer is 800x600@RGB565
[init -> test-framebuffer] black & white stripes
[    3.776159] swap_dup: Bad swap file entry 38007c7e7c007c78
[    3.791044] swap_dup: Bad swap file entry 38007c7e7c007c79
[    3.829105] swap_dup: Bad swap file entry 38007c7e7c007c7a
[    3.843562] swap_dup: Bad swap file entry 38007c7e7c007c7b
[    3.881776] swap_dup: Bad swap file entry 38007c7e7c007c7c
[    3.894332] swap_dup: Bad swap file entry 38007c7e7c007c7d
[    3.931258] swap_dup: Bad swap file entry 38007c7e7c007c7e
[    3.941453] swap_dup: Bad swap file entry 38007c7e7c007c7f
[    3.951166] swap_dup: Bad swap file entry 38007c7e7c007c7e
[    3.986272] ld.lib.so invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0
[    3.995423] ld.lib.so cpuset=/ mems_allowed=0
[    4.004641] CPU: 0 PID: 53 Comm: ld.lib.so Tainted: G        W  OE   4.4.3+ #4
[    4.008445] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    4.008445]  0000000000000286 00000000df7e5fbc ffffffff812b6035 ffff88001f4044c0
[    4.008445]  0000000000000000 ffffffff811a3632 ffffffff8117017d 00000000df7e5fbc
[    4.008445]  00f8fcf800f8fcf8 00f8fcf800000001 ffff88001f404b38 ffff88001f4e7f30
[    4.008445] Call Trace:
[    4.008445]  [<ffffffff812b6035>] ? dump_stack+0x5c/0x77
[    4.008445]  [<ffffffff811a3632>] ? dump_header.isra.9+0x5f/0x1c9
[    4.008445]  [<ffffffff8117017d>] ? handle_mm_fault+0x151d/0x1690
[    4.008445]  [<ffffffff81144e86>] ? oom_kill_process+0x216/0x3d0
[    4.008445]  [<ffffffff811452aa>] ? out_of_memory+0x20a/0x300
[    4.008445]  [<ffffffff81145415>] ? pagefault_out_of_memory+0x75/0xc0
[    4.008445]  [<ffffffff8153bb08>] ? page_fault+0x28/0x30
[    4.233286] Mem-Info:
[    4.267170] active_anon:2157 inactive_anon:1103 isolated_anon:0
[    4.267170]  active_file:0 inactive_file:0 isolated_file:0
[    4.267170]  unevictable:0 dirty:0 writeback:0 unstable:0
[    4.267170]  slab_reclaimable:1151 slab_unreclaimable:871
[    4.267170]  mapped:979 shmem:1473 pagetables:40 bounce:0
[    4.267170]  free:119767 free_pcp:34 free_cma:0
[    4.374089] DMA free:14640kB min:84kB low:104kB high:124kB active_anon:224kB inactive_anon:56kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15060kB managed:14968kB mlocked:0kB dirty:0kB writeback:0kB mapped:56kB shmem:56kB slab_reclaimable:8kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:8kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[    4.446565] lowmem_reserve[]: 0 472 472 472
[    4.482477] DMA32 free:464428kB min:2736kB low:3420kB high:4104kB active_anon:8404kB inactive_anon:4356kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:506312kB managed:487260kB mlocked:0kB dirty:0kB writeback:0kB mapped:3860kB shmem:5836kB slab_reclaimable:4596kB slab_unreclaimable:3452kB kernel_stack:816kB pagetables:152kB unstable:0kB bounce:0kB free_pcp:136kB local_pcp:136kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[    4.604855] lowmem_reserve[]: 0 0 0 0
[    4.644784] DMA: 0*4kB 2*8kB (UE) 2*16kB (UE) 2*32kB (UE) 3*64kB (UME) 2*128kB (UE) 3*256kB (UME) 2*512kB (ME) 2*1024kB (UE) 3*2048kB (ME) 1*4096kB (M) = 14640kB
[    4.697340] DMA32: 1*4kB (M) 3*8kB (UME) 3*16kB (UME) 1*32kB (M) 1*64kB (E) 1*128kB (E) 3*256kB (UME) 1*512kB (U) 4*1024kB (ME) 8*2048kB (UME) 108*4096kB (M) = 464428kB
[    4.751462] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[    4.766217] 1473 total pagecache pages
[    4.779551] 0 pages in swap cache
[    4.817785] Swap cache stats: add 0, delete 0, find 0/1
[    4.831672] Free swap  = 0kB
[    4.870743] Total swap = 0kB
[    4.884034] 130343 pages RAM
[    4.921840] 0 pages HighMem/MovableOnly
[    4.935550] 4786 pages reserved
[    4.986236] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[    5.001391] [   47]     0    47   107252      520       7       4        0             0 ld.lib.so
[    5.058751] [   50]     0    50   107467      634       6       4        0             0 ld.lib.so
[    5.112518] [   51]     0    51   107943      458       9       4        0             0 ld.lib.so
[    5.164227] [   52]     0    52   107241      447       9       4        0             0 ld.lib.so
[    5.214798] Out of memory: Kill process 50 (ld.lib.so) score 4 or sacrifice child
[    5.230051] Killed process 50 (ld.lib.so) total-vm:429868kB, anon-rss:820kB, file-rss:1716kB
[    5.271483] BUG: unable to handle kernel NULL pointer dereference at 0000000000000820
[    5.273376] IP: [<ffffffff8105e8a8>] do_exit+0x3d8/0xaf0
[    5.273376] PGD 1f4fc067 PUD 1f4fd067 PMD 0 
[    5.273376] Oops: 0000 [#1] SMP 
[    5.273376] Modules linked in: hwio(OE)
[    5.273376] CPU: 0 PID: 50 Comm: ld.lib.so Tainted: G        W  OE   4.4.3+ #4
[    5.273376] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    5.273376] task: ffff88001f4044c0 ti: ffff88001f4b4000 task.ti: ffff88001f4b4000
[    5.273376] RIP: 0010:[<ffffffff8105e8a8>]  [<ffffffff8105e8a8>] do_exit+0x3d8/0xaf0
[    5.273376] RSP: 0018:ffff88001f4b7d10  EFLAGS: 00000006
[    5.273376] RAX: 0000000000000000 RBX: ffff88001f4044c0 RCX: 0000000000000000
[    5.273376] RDX: 000000001f1f9f00 RSI: 000000000000000b RDI: ffff88001f4044c0
[    5.273376] RBP: ffff88001f4b7d30 R08: 0000000000000000 R09: 0000000000000000
[    5.273376] R10: ffff88001f4044c0 R11: 0000000000000000 R12: ffff88001f4ab440
[    5.273376] R13: ffff88001f45b780 R14: 0000000000000000 R15: ffff88001f45b7e8
[    5.273376] FS:  0000000000000000(0000) GS:ffff88001e400000(0000) knlGS:0000000000000000
[    5.273376] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    5.273376] CR2: 0000000000000820 CR3: 000000001f4a7000 CR4: 00000000000006f0
[    5.273376] Stack:
[    5.273376]  ffff88001f4b7ed8 0000000000000058 00000000a7e6ec24 0000000000000000
[    5.273376]  ffff88001f4b7d30 ffff88001f4b7d30 00000000a7e6ec24 0000000000000009
[    5.273376]  ffff88001e2a8540 ffff88001f4ab440 ffff88001f4b7f58 0000000000000008
[    5.273376] Call Trace:
[    5.273376]  [<ffffffff8105f039>] ? do_group_exit+0x39/0xb0
[    5.273376]  [<ffffffff81069afe>] ? get_signal+0x2be/0x6b0
[    5.273376]  [<ffffffff81005306>] ? do_signal+0x36/0x630
[    5.273376]  [<ffffffff8142b05d>] ? __sys_recvmsg+0x7d/0x90
[    5.273376]  [<ffffffff810022c5>] ? exit_to_usermode_loop+0x85/0xc0
[    5.273376]  [<ffffffff81539d4c>] ? int_ret_from_sys_call+0x25/0x8f
[    5.273376] Code: 48 89 6c 24 20 48 89 6c 24 28 e8 24 af 4d 00 48 8d 83 40 04 00 00 48 39 83 40 04 00 00 0f 85 bb 05 00 00 48 89 df e8 a8 6d 01 00 <4c> 8b b8 20 08 00 00 48 89 c7 4c 39 fb 0f 84 4b 05 00 00 48 8b 
[    5.273376] RIP  [<ffffffff8105e8a8>] do_exit+0x3d8/0xaf0
[    5.273376]  RSP <ffff88001f4b7d10>
[    5.273376] CR2: 0000000000000820
[    5.273376] ---[ end trace 5725af1922b8510d ]---
[    5.273376] Fixing recursive fault but reboot is needed!

jklmnn commented 6 years ago

Apparently it seems to work now, yet I have no idea why. Only the window of the framebuffer seems to be a bit broken but this could also have to do with Linux and QEMU.

[screenshot: gl_framebuffer2]

jklmnn commented 6 years ago

I have tested the framebuffer on the X260, which first of all "works". Yet it shows the same behaviour as QEMU and only uses a line at the top of the screen.

jklmnn commented 6 years ago

This was fixed by https://github.com/jklmnn/hwiodev/commit/8ead3351a1be96b21bcb8e5e09580892613af85a.

jklmnn commented 6 years ago

The information required for the platform info is not easily available in the user space on Linux. But also on other kernels it is provided by the kernel itself so I decided to create a platform_info kernel module. This isn't semantically wrong and also gives easier access to the required information. For example the screen_info struct already contains all the required information for the boot framebuffer.

senier commented 6 years ago

Agreed. The interesting question now is how to get the information out of the module. Map or read a device? Custom pseudo file system? Sysfs? ...?

jklmnn commented 6 years ago

Well the module would again create a device called platform_info. It would implement either mmap or read (or both) to access the xml depending on what Genode uses to provide its rom sessions.

senier commented 6 years ago

Wicked! So, does that mean the new kernel module only handles the platform info and ACPI table can be done in userspace?

jklmnn commented 6 years ago

The plan is definitely that the kernel module solely provides the required information. Any mapping/operation of this information will be done by Genode. So only the ACPI table addresses are provided, but their mapping and usage should be done by Genode's acpi_drv.

Yet I'm afraid that we will run into a circular problem with ACPI. I suspect that if the Linux kernel knows the ACPI table addresses it will also have mapped the ACPI tables. But if we build it without ACPI support it probably won't have those addresses either. The solution would be to do the ACPI address detection by ourselves in the kernel module. We also have to do it inside the kernel module as we cannot split the platform_info rom.

senier commented 6 years ago

We definitely need to build Linux without ACPI and find the ACPI table addresses manually. How hard can it be? ;-)

jklmnn commented 6 years ago

I now have a working state for the platform_info device. I already tested the same image on QEMU and the X260 and it works on both.

Yet I had to do a really really dirty hack to make it run on Genode. They check the size of a dataspace before they map it and if it is zero they throw an error. Now this check on base-linux is actually the file size check on the inode. This works fine for all their binaries but the platform_info is a special device which always has a file size of zero. The only way to access the inode is the open function (some others would work too, but open is the most convenient).
The solution consists of two parts now:

  1. I implemented a custom open call that checks the length of the platform_info string and sets the file size to that length (https://github.com/jklmnn/gbl-platform_info/commit/f683dd3c6e38a38974cb9c0e88a9eef842442801). This would be sufficient if Genode first opened the device and then checked the size. But since it never calls open beforehand and we can't change the size without this call, the size is initially 0 and the ROM session fails.
  2. Fortunately we have the init binary that loads the module and therefore can "initialize" the module by calling open on the device (https://github.com/jklmnn/base-linux-hw/commit/8ad1cd6cf9965435f520c0fdbb1961a1c9adf62a). This "fixes" the size check without changing Genode.
  3. This has not been done and is just for completeness. It would also have been possible to check if the dataspace is a regular file or a special device and handle the size determination differently. But first I could not find a generic way to determine the size of a special file and secondly I did not want to do any sophisticated changes to Genode's base-linux ROM dataspace handling.

jklmnn commented 6 years ago

Reference platform_info content on qemu (NOVA):

<platform_info>
    <acpi revision="2" rsdt="0x1fe93074" xsdt="0x1fe930e8"/>
    <boot>
        <framebuffer phys="0x80000000" width="800" height="600" bpp="32" type="1" pitch="3200"/>
    </boot>
</platform_info>
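
For reference, a sketch of how the framebuffer node above could be formatted from the boot framebuffer parameters. This is user-space C with snprintf, not the module's code; `struct fb_info` and `format_framebuffer` are made-up names, and in the kernel the values would come from screen_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the boot framebuffer parameters. */
struct fb_info {
    uint64_t phys;
    unsigned width, height, bpp, type, pitch;
};

/* Write the <framebuffer .../> node into `buf`. Returns the number of
 * characters written, or -1 if the buffer is too small. */
int format_framebuffer(char *buf, size_t len, const struct fb_info *fb)
{
    int n = snprintf(buf, len,
                     "<framebuffer phys=\"0x%llx\" width=\"%u\" height=\"%u\" "
                     "bpp=\"%u\" type=\"%u\" pitch=\"%u\"/>",
                     (unsigned long long)fb->phys, fb->width, fb->height,
                     fb->bpp, fb->type, fb->pitch);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

The returned length is also the value the device would have to report as file size for the ROM size check.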

Generally the RSDP that contains the pointers to the Root ACPI tables is located either in the EBDA (Extended BIOS data area) which itself can be found by a pointer located at 0x40e or in the BIOS read only space in 0xe0000 - 0xfffff (see ACPI spec section 5.2.5.1). It is marked by the signature RSD PTR (notice the trailing space).
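
A sketch of the legacy search, written against a plain byte buffer so it can be tried in user space (in the module the buffer would be the mapped EBDA or the 0xe0000 - 0xfffff area). It relies on two facts from the ACPI spec: the signature sits on a 16-byte boundary, and the first 20 bytes of the structure sum to zero. The function name is made up, and the extended checksum of ACPI 2.0+ is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan `len` bytes at `mem` for an RSDP: the 8-byte signature "RSD PTR "
 * on a 16-byte boundary, followed by a valid ACPI 1.0 checksum (the first
 * 20 bytes add up to 0 modulo 256). Returns the byte offset of the first
 * match, or -1 if none is found. */
long find_rsdp(const uint8_t *mem, size_t len)
{
    for (size_t off = 0; off + 20 <= len; off += 16) {
        if (memcmp(mem + off, "RSD PTR ", 8) != 0)
            continue;
        uint8_t sum = 0;
        for (size_t i = 0; i < 20; i++)
            sum = (uint8_t)(sum + mem[off + i]);
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

Checking the checksum as well as the signature avoids false positives from the string appearing elsewhere in the BIOS area.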

On UEFI systems the pointer is located in the EFI System table. The old variant is not guaranteed to work on non-UEFI systems.

I still chose to implement the first variant as it doesn't bind us to UEFI, NOVA also uses it, and we did not encounter problems with it yet. If we encounter problems with it one day, we should maybe file an issue on NOVA too.

senier commented 6 years ago

I guess you meant to say "the old variant is not guaranteed to work on UEFI systems"?

jklmnn commented 6 years ago

You guessed right.

jklmnn commented 6 years ago

I should have read the https://github.com/genodelabs/genode/issues/2242 issue on Genode first. While the ACPI tables get found with legacy boot correctly they're not found with UEFI. I looked through the issue and Alex did also make this finding. On NOVA on UEFI the ACPI tables are available in the Multiboot2 structure which is not accessible on Linux.

There is a UEFI-specific implementation of the ACPI table detection in Linux which I would reimplement in my kernel module. This could lead to a dependency on some EFI kernel configs, but we would need them anyway if we want to boot on UEFI.

jklmnn commented 6 years ago

Implementing the UEFI variant was straightforward and the module now produces the same output as NOVA.

senier commented 6 years ago

Great! Do we have any open issues with the IO_MEM session then? If not, I guess we can close this issue.

jklmnn commented 6 years ago

A little cleanup of the branch in Genode is still needed, but that's not exactly part of this issue, so we can close this.

jklmnn commented 6 years ago

I found a new issue. We said that we simply disable ACPI so it won't collide with us. Yet on UEFI the ACPI tables seem to be available only through the EFI system table, which itself is passed to the kernel on boot, so we can't search the memory. Unfortunately the EFI config in Linux depends on the ACPI config, so I can't build a system that gets the EFI system table without ACPI (yet it does boot on EFI, but I think that's more Grub than Linux). One way could be to get the information from Grub somehow.

For the time being I have enabled Linux to boot with iso so I can continue my testing.

jklmnn commented 6 years ago

Probably the EFI tables could also be acquired via the bootparams, which contain a struct efi_info that could also hold the desired information. Since this is data that should be provided by the boot loader, it could be accessible without using CONFIG_EFI.