zephyriot / zephyr-issues


memory corruption in microkernel memory pool defrag() #497

Closed nashif closed 8 years ago

nashif commented 8 years ago

Reported by Andrew Boie:

So far I have only reproduced this on Nios II, but read on: I have convinced myself that this isn't specific to that arch and that the memory corruption happens on all arches, just not in a way that makes test_pool crash. I only found it after I rearranged how data is organized in my executable and some critical device data got corrupted, which crashed test_pool.

The creation of memory pools results in sysgen creating some data structures:

struct block_stat blockstatus_0x00010000_0[1];
struct block_stat blockstatus_0x00010000_1[1];
struct block_stat blockstatus_0x00010000_2[4];
struct block_stat blockstatus_0x00010000_3[16];

struct pool_block fragtab_0x00010000[4] =
{
    { 4096, 1, blockstatus_0x00010000_0},
    { 1024, 1, blockstatus_0x00010000_1},
    { 256, 4, blockstatus_0x00010000_2},
    { 64, 16, blockstatus_0x00010000_3},
};

char __noinit __POOL_ID_buffer[4096];

....

struct pool_struct _k_mem_pool_list[2] =
{
    {4096, 64, 2, 4096, 1, 4, NULL, fragtab_0x00010000, __POOL_ID_buffer},
    {1024, 16, 2, 5120, 5, 4, NULL, fragtab_0x00010001, __SECOND_POOL_ID_buffer},
};

So far so good. Notice that the first two of the generated blockstatus arrays are of size 1. Here's the nearby memory:

00400ebc B blockstatus_0x00010000_0
00400ec4 B blockstatus_0x00010000_1
00400ecc b evidence
00400ed0 b uart_console_dev
00400ed4 b accumulated_cycle_count
00400ed8 b slice_count

What I found was a crash in printk(): the device pointer in uart_console_dev was unexpectedly null. I set a watchpoint on it and found this:

Hardware watchpoint 2: uart_console_dev

Old value = (struct device *) 0x4001d0 <__device_uart_altera_jtag_0>
New value = (struct device *) 0x0
0x004192bc in defrag (P=0x400e38 <_k_mem_pool_list>, ifraglevel_start=3, ifraglevel_stop=0)
    at /home/apboie/projects/zephyr/kernel/microkernel/k_memory_pool.c:165
165                                     P->frag_tab[j].blocktable[k].mem_status = 0;

At this point j and k are both 1.

P->frag_tab[j].blocktable is the blockstatus_0x00010000_1 array. Since this array is of size 1, an index of 1 is past its boundary and that's why my uart console struct was getting clobbered.
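The overlap can be checked with a little address arithmetic. The sketch below is only a model: it assumes struct block_stat is a 32-bit pointer (mem_blocks) followed by a 32-bit int (mem_status), which is consistent with the 8-byte spacing between the two one-entry arrays in the symbol listing but is not taken from the kernel headers. The names block_stat32, mem_blocks_addr and mem_status_addr are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit model of struct block_stat. Field order and widths
 * are assumptions inferred from the symbol addresses, not kernel code. */
struct block_stat32 {
    uint32_t mem_blocks;  /* stands in for a 32-bit char pointer */
    uint32_t mem_status;  /* stands in for a 32-bit int */
};

/* Target address of blocktable[index].mem_blocks for an array at `base`. */
uint32_t mem_blocks_addr(uint32_t base, uint32_t index)
{
    return base + index * (uint32_t)sizeof(struct block_stat32)
                + (uint32_t)offsetof(struct block_stat32, mem_blocks);
}

/* Target address of blocktable[index].mem_status for an array at `base`. */
uint32_t mem_status_addr(uint32_t base, uint32_t index)
{
    return base + index * (uint32_t)sizeof(struct block_stat32)
                + (uint32_t)offsetof(struct block_stat32, mem_status);
}
```

Under those assumptions, index 1 of the array at 0x00400ec4 puts mem_blocks on top of evidence (0x00400ecc) and mem_status on top of uart_console_dev (0x00400ed0), which matches the watchpoint hit.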

Looking at the code for defrag, it seems that it simply does not handle blockstatus arrays with fewer than 2 entries. Here is the code:

/**
 *
 * @brief Defragmentation algorithm for memory pool
 *
 * @return N/A
 */
static void defrag(struct pool_struct *P,
                                  int ifraglevel_start,
                                  int ifraglevel_stop)
{
        int i, j, k;

        j = ifraglevel_start;

        while (j > ifraglevel_stop) {
                i = 0;
                while (P->frag_tab[j].blocktable[i].mem_blocks != NULL) {
                        if ((P->frag_tab[j].blocktable[i].mem_status & 0xF) ==
                            0) { /* blocks for defragmenting */
                                search_bp(
                                        P->frag_tab[j].blocktable[i].mem_blocks,
                                        P,
                                        j - 1);

                                /* remove blocks & compact list */
                                k = i;
                                while ((P->frag_tab[j]
                                                .blocktable[k]
                                                .mem_blocks != NULL) &&
                                       (k <
                                        (P->frag_tab[j].nr_of_entries - 1))) {
                                        P->frag_tab[j]
                                                .blocktable[k]
                                                .mem_blocks =
                                                P->frag_tab[j]
                                                        .blocktable[k + 1]
                                                        .mem_blocks;
                                        P->frag_tab[j]
                                                .blocktable[k]
                                                .mem_status =
                                                P->frag_tab[j]
                                                        .blocktable[k + 1]
                                                        .mem_status;
                                        k++;
                                }
                                P->frag_tab[j].blocktable[k].mem_blocks = NULL;
                                P->frag_tab[j].blocktable[k].mem_status = 0;
                        } else {
                                i++; /* take next block */
                        }
                }
                j--;
        }
}

If the number of entries is 1, the inner while loop exits with k being 1, and then the memory corruption happens. In addition, the inner while loop iterates while k < the number of entries... and then writes to entry index k+1. This whole thing looks horribly wrong.
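To make it easier to experiment off-target, here is a host-side model of one level of the loop. It is a sketch, not the kernel code: search_bp() is left out, the table is padded so the stray write lands in memory the model owns, and level_model, compact_level, demo and the bounded flag are all hypothetical. In this model, k reaches 1 for a one-entry table because the outer scan over i has no upper bound: when entry 0 is in use, i advances to 1 and k starts from there. Whether that is exactly what happens on target is an assumption; the bounded variant is only one possible guard, not a reviewed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct block_stat (assumed layout). */
struct block_stat {
    char *mem_blocks;
    int mem_status;
};

/* One fragmentation level of the pool (hypothetical model type). */
struct level_model {
    struct block_stat *blocktable;
    int nr_of_entries;
};

/* Mirrors the body of defrag()'s outer loop for a single level, without
 * search_bp(). Returns the highest index cleared, or -1 if none was.
 * If bounded is non-zero, the scan over i also stops at nr_of_entries. */
int compact_level(struct level_model *lv, int bounded)
{
    int i = 0, k, max_written = -1;

    assert(lv != NULL && lv->blocktable != NULL);
    while ((!bounded || i < lv->nr_of_entries) &&
           lv->blocktable[i].mem_blocks != NULL) {
        if ((lv->blocktable[i].mem_status & 0xF) == 0) {
            /* remove block and compact the list, as in the original */
            k = i;
            while (lv->blocktable[k].mem_blocks != NULL &&
                   k < lv->nr_of_entries - 1) {
                lv->blocktable[k] = lv->blocktable[k + 1];
                k++;
            }
            lv->blocktable[k].mem_blocks = NULL;
            lv->blocktable[k].mem_status = 0;
            if (k > max_written) {
                max_written = k;
            }
        } else {
            i++; /* take next block */
        }
    }
    return max_written;
}

/* Rebuilds the reported situation: a one-entry table whose block is in use,
 * followed by unrelated data that happens to look like a free entry. */
int demo(int bounded)
{
    static char pool[64];
    static char neighbour; /* plays the role of `evidence` */
    struct block_stat table[3] = {
        { pool, 1 },              /* the only real entry, in use */
        { &neighbour, 0x4001d0 }, /* padding: pointer value with low nibble 0 */
        { NULL, 0 },              /* extra padding so the model stays in bounds */
    };
    struct level_model lv = { table, 1 };

    return compact_level(&lv, bounded);
}
```

In this model demo(0) clears index 1, which is outside the one-entry table, while demo(1) leaves the padding untouched. Note that the value 0x4001d0 from the watchpoint has a low nibble of 0, so it passes the (mem_status & 0xF) == 0 check.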

This code is very old and there is no git history for it. Is anyone deeply familiar with the algorithm who could offer suggestions? This should be reproducible on any arch.

Andrew

(Imported from Jira ZEP-514)

nashif commented 8 years ago

by Andrew Boie:

I've sent a patch to disable test_pool until this is fixed: https://gerrit.zephyrproject.org/r/#/c/2867/

However, that is not a workaround: any application using memory pool objects may have memory corruption problems. I think this is a pretty serious bug. I know very little about how the memory pool defrag algorithm works.

nashif commented 8 years ago

by Sharron LIU:

I once hit a defrag issue while writing a BAT test for the microkernel. I also disabled my test case, as I didn't have time to determine whether the problem was in the test case itself. I also think it's a serious issue.

nashif commented 8 years ago

by Sharron LIU:

Reporter, please verify this.

nashif commented 7 years ago

by Mark Linkmeyer:

Correcting the priority field