Closed vvaltchev closed 3 days ago
@vvaltchev
In free_process_int()
have the following code, the execution kfree_obj((void *)get_process_task(pi), struct task_and_process)
will cause the value of pi->debug_cmdline
to change to the poisoned value!!! Of course, pi
should not be free before free pi->debug_cmdline
.
if (release_obj(pi) == 0) {
arch_specific_free_proc(pi);
kfree_obj((void *)get_process_task(pi), struct task_and_process); ===========> Execution here will cause the value of `pi->debug_cmdline` to change to the poisoned value!
if (MOD_debugpanel)
kfree2(pi->debug_cmdline, PROCESS_CMDLINE_BUF_SIZE);
}
So the solution is quite simple, Just change it as following, (and I won't submit the PR myself)
if (release_obj(pi) == 0) {
if (MOD_debugpanel)
kfree2(pi->debug_cmdline, PROCESS_CMDLINE_BUF_SIZE);
arch_specific_free_proc(pi);
kfree_obj((void *)get_process_task(pi), struct task_and_process);
}
Thank you @aenrbes. I completely agree that's a bug. I'll try the fix you suggested later today after my work time.
However, I fail to understand how we don't hit that on i386 too all the time: all the CI builds use the free kmalloc poisoning and MOD_debugpanel is compiled in. We should be hitting this 100% of time because free_process_int()
is generic code, while we never did. Something must be different on RISCV64, right? Do you have any thoughts?
@vvaltchev
Since KMALLOC_FREE_MEM_POISONING will only work in main_heaps_kfree()
, free_process_int()
will only execute to small_heaps_kfree()
in most cases, main_heaps_kfree()
is not executed at all. In other words, in most cases, KMALLOC_FREE_MEM_POISONING didn't work at all; I guess because riscv64 is 64-bit, main_heaps_kfree()
will be executed in some cases.
Thank you @aenrbes ! That makes absolutely sense. Two thoughts, FYI:
I should make KMALLOC_FREE_MEM_POISONING work also for the small heaps. I don't know why it works only for the "big" heap allocations.
I didn't realize that the process + task struct is so big on RISCV64 that doesn't fit in 2 KB anymore. I'll check their sizes and see what's the best approach. Either make two separate allocations, or make the small heap size to be 4 KB instead.
The size of riscv64 's process + task struct should be less than 2kb, because main_heaps_free() is only triggered on rare occasions, so the problem should be that the 64bit platform runs out of memory in small heaps on rare occasions
Thanks for the hint! Makes sense to me. I'll research that when I have a bit of extra time. If we run out of small heaps, we should have more small heaps. Certainly with 64-bit the memory usage is different due to the larger data structures.
Hello @aenrbes, another problem I've discovered with RISCV64 is that the kernel crashes when all the tests are run and KMALLOC_FREE_MEM_POISONING is enabled. It's very important to test the kernel with both UBSAN and KMALLOC_FREE_MEM_POISONING and make sure it works with all of those extra checks. It would be create to run tests also with an address sanitizer, but Tilck doesn't have support for that, yet.
Can you take a look? The backtrace is entirely in the generic code, but I've never observed that on i386. At the moment I believe the problem could be related with a subtle bug in the RISCV64 process - specific code.
Steps to reproduce:
./build/st/run_all_tests -c -T sh
Fortunately, both this and the other issue (https://github.com/vvaltchev/tilck/issues/227) can be reproduced locally 100% of the time, not just on Azure. Note: if you run only that test (fs_perf1) the problem cannot be reproduced. Therefore, something gets corrupted when all the tests run.
Vlad