micro-ROS / freertos_apps

Sample applications for FreeRTOS + micro-ROS
Apache License 2.0
81 stars 50 forks source link

Linked list of free heap memory blocks gets corrupted on reallocations #98

Closed jcmonteiro closed 2 years ago

jcmonteiro commented 2 years ago

Issue template

Steps to reproduce the issue

Reliably reproducing the issue might not be that easy as it depends on how the memory is allocated and reallocated and the fragmentation happening in the reallocation steps.

I can reliably trigger the issue in my setup by creating a lifecycle node. Depending on the rest of the code, the issue will happen at different parts, but always when reallocating memory while initializing the state machine.

Expected behavior

  1. The state machine is properly created.
  2. The linked list of free blocks, defined in custom_memory_manager:39-43, holds valid free blocks of coherent sizes that point to the next valid free block.

Actual behavior

  1. The state machine is not created correctly and there is "garbage" populating different variables.
  2. The linked list of free blocks is corrupted.

Additional information

I read through the implementation of the memory management functions, and it appears to me that there is a bug in the reallocation function - pvPortRealloc. In particular, the aforementioned line should hold the count to the amount of allocated memory for the object, and not the total free block size. Therefore, it should be changed to

size_t count = (pxLink->xBlockSize & ~xBlockAllocatedBit) - xHeapStructSize;

Suggested changes

I believe the state machine could be refactored to avoid all the reallocation, to begin with. We know the number of transitions and states beforehand, so we could already allocate all the necessary memory space for the maps. Naturally, this would be a PR to ros2/rcl

jcmonteiro commented 2 years ago

By changing the line that defines size_t count I have managed to fix the issue on my side. I will make a PR with the suggested changes. Please, advise if I have missed something here and my assumption about the count is off.

jcmonteiro commented 2 years ago

An extra bit of information here, I have defined the following additional function at custom_memory_allocation.c that helped me narrow down the problem. It is used to print the free blocks of memory by following the linked list.

static void prvDisplayHeapBlocks(void);
static void prvDisplayHeapBlocks(void)
{
    BlockLink_t *pxIterator;

    pxIterator = &xStart;
    pxIterator = pxIterator->pxNextFreeBlock;
    while ( pxIterator != NULL )
    {
        printf("[%6d] @ [%p]\n", pxIterator->xBlockSize, (void*)pxIterator);
        pxIterator = pxIterator->pxNextFreeBlock;
    }
    printf("--------\n");
}

While debugging, I have added this function in three different places inside pvPortRealloc to pinpoint which call was corrupting the memory.

void *pvPortRealloc( void *pv, size_t xWantedSize )
{
    vTaskSuspendAll();

    prvDisplayHeapBlocks();                            // <-------- printing before allocating a new block
    void * newmem = pvPortMalloc(xWantedSize);
    prvDisplayHeapBlocks();                            // <-------- printing after allocating a new block

    uint8_t *puc = ( uint8_t * ) pv;
    BlockLink_t *pxLink;

    puc -= xHeapStructSize;
    pxLink = ( void * ) puc;

    char *in_src = (char*)pv;
    char *in_dest = (char*)newmem;
    size_t count = pxLink->xBlockSize & ~xBlockAllocatedBit;

    while(count--)
        *in_dest++ = *in_src++;
    prvDisplayHeapBlocks();                            // <-------- printing after copying

    vPortFree(pv);

    ( void ) xTaskResumeAll();

    return newmem;
}

After many calls to reallocate memory, I eventually see something like this

calling pvPortRealloc() with xWantedSize = 96
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[  104] @ [0x2000dac8 == ucHeap + 48036]
[ 5848] @ [0x2000df18 == ucHeap + 49140]
[    0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5848] @ [0x2000df18 == ucHeap + 49140]
[    0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5920] @ [0x2000ded0 == ucHeap + 49068]
[    0] @ [0x2000f5f0 == ucHeap + 54988]

calling pvPortRealloc() with xWantedSize = 96
--------
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5920] @ [0x2000ded0 == ucHeap + 49068]
[    0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[ 5816] @ [0x2000df38 == ucHeap + 49172]
[    0] @ [0x2000f5f0 == ucHeap + 54988]
--------
[   16] @ [0x2000d7b0 == ucHeap + 47244]
[   24] @ [0x2000d7f0 == ucHeap + 47308]
[  104] @ [0x2000dac8 == ucHeap + 48036]
[-2147483528] @ [0x2000df38 == ucHeap + 49172]

In this example, I have reserved 55000 bytes of heap memory. You can see in the last printed line that the linked list is corrupted. The program continue executing, but the outcome is non-deterministic. By making that change that I have suggested, the data integrity is preserved.

Acuadros95 commented 2 years ago

Good catch! Let me handle the PRs, as we have this custom allocators on multiples repositories.

jcmonteiro commented 2 years ago

@Acuadros95, thank you for taking a look at this. Additionally, it is worth mentioning that, according to the documentation at ros2/rcutils/include/rcutils/allocator.h:59-66, the implementation of realloc should behave as realloc. Therefore, two things are missing in this implementation.

  1. Check for an unsuccessful allocation. In this case, the pointer should not be freed and a nullptr should be returned.
  2. According to the documentation on realloc - The content of the memory block is preserved up to the lesser of the new and old sizes. Therefore, count should be the minimum of (pxLink->xBlockSize & ~xBlockAllocatedBit) - xHeapStructSize and xWantedSize.
Acuadros95 commented 2 years ago

Here we go: https://github.com/micro-ROS/freertos_apps/pull/99 Please test it out and give feedback.

jcmonteiro commented 2 years ago

Of course, @Acuadros95. I will let you know.

@LiWeiAn, could you test the changes in #99?

Acuadros95 commented 2 years ago

@jcmonteiro the PR is merged! Closing, please reopen if you encounter any other issues.