klange / toaruos

A completely-from-scratch hobby operating system: bootloader, kernel, drivers, C library, and userspace including a composited graphical UI, dynamic linker, syntax-highlighting text editor, network stack, etc.
https://toaruos.org/
University of Illinois/NCSA Open Source License

Investigate memory corruption when using tarfs #183

Closed klange closed 5 years ago

klange commented 5 years ago

When switching out an ext2 ramdisk for a tar ramdisk, an issue rather consistently shows up when launching the compositor, causing crashes and even complete corruption of the VM environment. Initially, the tarfs driver itself was suspected, but a thorough analysis has cleared it of any wrongdoing - bounds checking and strict limits on copy lengths all checked out.

To start investigating this issue, I built a new set of memory allocation tracking tools. After several revisions and improvements to these tools, I believe the issue has been narrowed down to a corruption of memory used for kernel stacks, as well as corruption of memory used for file descriptor tables (the latter may be caused by the former, it's hard to tell). The addition of guard pages around kernel stacks suggests that something - possibly page directory management - is touching regions it should not be touching.

klange commented 5 years ago

This was a fun one.

After extensive investigation in GDB and the QEMU monitor, it was discovered that the compositor was getting mappings to low physical memory (where the kernel resides) when it called sbrk. Further investigation showed that the physical frame bitmap had been cleared. This eventually led to the discovery of a bug in the ramdisk freeing code, which was freeing frames it did not fully own (and which had been allocated through the placement pointer allocator... to the frame bitmap). ext2 ramdisks were always a multiple of page size. The tarfs ramdisks are only multiples of 512 bytes, so with 4 KiB pages there was only a 1 in 8 chance that a ramdisk would end on a page boundary and everything would be fine.
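As a rough illustration of the class of fix described above, here is a minimal sketch of how a free routine could restrict itself to frames that lie entirely inside the ramdisk. It assumes 4 KiB pages; the names `frame_range_t`, `touched_frames`, and `owned_frames` are hypothetical and are not taken from the toaruos source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */

/* Hypothetical helper type: half-open range of frame numbers [first, end). */
typedef struct {
    uintptr_t first;
    uintptr_t end;
} frame_range_t;

/* Every frame that any byte of [start, start + size) touches.
 * Releasing this range also frees a trailing frame that is only partly ours. */
static frame_range_t touched_frames(uintptr_t start, size_t size) {
    assert(start + size >= start); /* no address wraparound */
    frame_range_t r;
    r.first = start / PAGE_SIZE;                        /* round start down */
    r.end = (start + size + PAGE_SIZE - 1) / PAGE_SIZE; /* round end up */
    return r;
}

/* Only the frames wholly contained in [start, start + size).
 * These are the ones that are safe to hand back to the allocator. */
static frame_range_t owned_frames(uintptr_t start, size_t size) {
    assert(start + size >= start); /* no address wraparound */
    frame_range_t r;
    r.first = (start + PAGE_SIZE - 1) / PAGE_SIZE; /* round start up */
    r.end = (start + size) / PAGE_SIZE;            /* round end down */
    if (r.end < r.first) {
        r.end = r.first; /* region smaller than one frame: nothing to free */
    }
    return r;
}
```

For a ramdisk at 0x100000 with a size of 0x1200 bytes (a multiple of 512 but not of 4096), `touched_frames` covers frames 0x100 and 0x101, while `owned_frames` covers only 0x100. Frame 0x101 is shared with whatever was placed after the ramdisk, so it has to stay allocated.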

klange commented 5 years ago
10:49:35 <... klange> So I do this thing on my CDs where I take a read-only 
                      ramdisk and extract it out into my in-memory read-write 
                      tmpfs.
10:49:57 <... klange> Obviously if I have a ~20MB ramdisk, I want the space 
                      that was used to hold that to be available for the 
                      system after the move to the read-write tmpfs.
10:50:14 <... klange> So naturally I clear out the frames it was using so they 
                      can be reclaimed by the pmm.
10:50:33 <... klange> I recently switched to using tarballs for those 
                      ramdisks, from mini ext2 filesystems.
10:50:42 <... klange> The tarballs are easier to create and have less overhead.
10:51:09 <... klange> An interesting property of the ext2 filesystems is that 
                      they were always a multiple of page size, due to block 
                      requirements in ext2.
10:51:16 <... klange> Tarballs are only multiples of 512 bytes.
10:52:19 <... klange> The tarfs ramdisks are the last thing that gets loaded 
                      by the bootloader, and the early placement pointer 
                      allocator starts immediately afterwards.
10:52:36 <... klange> And the next thing that gets allocated is... the page 
                      frame bitmap.
10:52:44 <... klange> That the pmm uses... for allocations...
10:53:34 <... klange> So everything's fine until the startup says "okay you 
                      can remove the ramdisk now I'm done with it" and the 
                      kernel goes and frees... the frame with the start of the 
                      frame bitmap.
10:54:10 <.. klange> Which then gets reallocated to something stupid like a 
                     bitmap with a font in it.
10:54:29 <.. klange> Which then marks large swaths of the kernel as available 
                     for the PMM.
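
The sequence in the log can be reproduced in a toy model. The sketch below is not toaruos code: it assumes 4 KiB pages, a 64-frame machine, and an unaligned bump-pointer placement allocator that starts right after the ramdisk, and every name in it (`frames`, `placement_alloc`, `naive_free_hits_bitmap`) is made up for illustration. It places a "frame bitmap" immediately after the ramdisk, frees the ramdisk one page-sized step at a time, and reports whether the frame holding the start of the bitmap was released along with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages */
#define NFRAMES   64u     /* toy machine: 64 physical frames */

/* Toy frame bitmap: a set bit means the frame is in use. */
static uint32_t frames[NFRAMES / 32];

static void frame_set(uint32_t f)   { frames[f / 32] |=  (1u << (f % 32)); }
static void frame_clear(uint32_t f) { frames[f / 32] &= ~(1u << (f % 32)); }
static int  frame_test(uint32_t f)  { return (int)((frames[f / 32] >> (f % 32)) & 1u); }

/* Toy placement allocator: a bump pointer with no alignment. */
static uintptr_t placement;

static uintptr_t placement_alloc(size_t size) {
    uintptr_t addr = placement;
    placement += size;
    return addr;
}

/* Returns 1 if freeing the ramdisk also released the frame that holds
 * the start of the bitmap allocated right after it, 0 otherwise. */
static int naive_free_hits_bitmap(uintptr_t ramdisk_start, size_t ramdisk_size) {
    assert(ramdisk_start % PAGE_SIZE == 0);
    assert(ramdisk_start + ramdisk_size + sizeof(frames)
           <= (uintptr_t)NFRAMES * PAGE_SIZE);

    /* Early allocations begin immediately after the ramdisk. */
    placement = ramdisk_start + ramdisk_size;
    uintptr_t bitmap_addr = placement_alloc(sizeof(frames));

    /* Mark everything below the placement pointer as in use. */
    for (uint32_t f = 0; f < NFRAMES; f++) {
        frame_clear(f);
    }
    for (uintptr_t a = 0; a < placement; a += PAGE_SIZE) {
        frame_set((uint32_t)(a / PAGE_SIZE));
    }

    /* Naive free: release every frame the ramdisk touches,
     * including a partially used last frame. */
    for (uintptr_t a = ramdisk_start; a < ramdisk_start + ramdisk_size; a += PAGE_SIZE) {
        frame_clear((uint32_t)(a / PAGE_SIZE));
    }

    return !frame_test((uint32_t)(bitmap_addr / PAGE_SIZE));
}
```

In this model, `naive_free_hits_bitmap(0x10000, 0x1200)` returns 1 because the ramdisk's last frame also holds the start of the bitmap, while `naive_free_hits_bitmap(0x10000, 0x2000)` returns 0 because a page-multiple ramdisk ends exactly on a frame boundary.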