Closed by nashif 8 years ago
by Andrew Boie:
I've sent a patch to disable test_pool until this is fixed: https://gerrit.zephyrproject.org/r/#/c/2867/ However, that is not a workaround: any application using memory pool objects may have memory corruption problems. I think this is a pretty serious bug. I know very little about how the memory pool defrag algorithm works.
by Sharron LIU:
I once hit a defrag issue when I was writing a BAT test on the microkernel. I also disabled my test case, as I didn't have time to determine whether the problem was in the test case itself. I also think this is a serious issue.
by Sharron LIU:
Reporter please verify this.
by Mark Linkmeyer:
Correcting the priority field
Reported by Andrew Boie:
So far I have only reproduced this on Nios II, but read on: I have convinced myself that this isn't specific to that arch and that the memory corruption is happening on all arches, just not in a way that gets test_pool to crash. I only found it after I rearranged how data is organized in my executable and some critical device data got corrupted, resulting in a crash of test_pool.
The creation of memory pools results in sysgen creating some data structures:
So far so good. Notice that the first two of the generated blockstatus arrays are of size 1. Here's the nearby memory:
What I found was a crash in printk(): the device pointer in uart_console_dev was unexpectedly null. I set a watch on it and found this:
P->frag_tab[j].blocktable is the blockstatus_0x00010000_1 array. Since this array is of size 1, an index of 1 is past its boundary and that's why my uart console struct was getting clobbered.
Looking at the code for defrag, it seems that it simply does not correctly handle blockstatus arrays with fewer than 2 entries. Here is the code:
If the number of entries is 1, the inner while loop exits with k being 1, and then the memory corruption happens. In addition, the inner while loop iterates while k < the number of entries and then writes to entry index k+1, which is past the end of the array on the last iteration regardless of the array's size. This whole thing looks horribly wrong.
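To make the index arithmetic concrete, here is a small model of the access pattern as I read it from the description above. It is not the defrag code: it only records which indices a loop of that shape would touch, without writing to any real array. The names `model_inner_loop`, `access_report`, `overruns` and `nr_of_entries` are made up for illustration, and the write to entry k after the loop is an assumption based on the watchpoint finding.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the index pattern described in this report.
 * It is NOT the Zephyr defrag code: it only tracks which indices a
 * loop of the described shape would touch, without touching memory.
 */
struct access_report {
	int max_index;    /* highest entry index that would be written */
	int k_after_loop; /* value of k when the inner loop exits */
};

static struct access_report model_inner_loop(int nr_of_entries)
{
	struct access_report r = { .max_index = -1, .k_after_loop = 0 };
	int k = 0;

	/* Loop shape as described: run while k < nr_of_entries, write k + 1. */
	while (k < nr_of_entries) {
		if (k + 1 > r.max_index) {
			r.max_index = k + 1;
		}
		k++;
	}

	/* Assumption: entry k is written once more after the loop exits. */
	if (k > r.max_index) {
		r.max_index = k;
	}
	r.k_after_loop = k;
	return r;
}

/* Valid indices are 0 .. nr_of_entries - 1; anything beyond is an overrun. */
static bool overruns(int nr_of_entries)
{
	return model_inner_loop(nr_of_entries).max_index >= nr_of_entries;
}
```

Under these assumptions, `model_inner_loop(1)` reports k as 1 on exit and index 1 as the highest entry written, which is one past the end of a size-1 array. `overruns()` is also true for larger sizes, which would fit the suspicion that this is not limited to arrays of size 1.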
This code is very old and there is no git history for it. Is anyone deeply familiar with the algorithm who could provide suggestions? This should be reproducible on any arch.
Andrew
(Imported from Jira ZEP-514)