stclib / STC

A modern, user friendly, generic, type-safe and fast C99 container library: String, Vector, Sorted and Unordered Map and Set, Deque, Forward List, Smart Pointers, Bitset and Random numbers.
MIT License
1.34k stars 73 forks source link

Arc can leak memory if data happens to be allocated directly adjacent to use_count #81

Closed wmww closed 1 month ago

wmww commented 9 months ago

There are two ways to create an arc. The data and the use_count can be allocated separately (as done by arc_X_from_ptr()) or they can be done together (as with arc_X_make()). The problem lies in arc_X_drop():

STC_INLINE void _c_MEMB(_drop)(const i_type* cself) {
    i_type* self = (i_type*)cself;
    if (self->use_count && _i_atomic_dec_and_test(self->use_count)) {
        i_keydrop(self->get);
        if ((char *)self->get != (char *)self->use_count + offsetof(struct _c_MEMB(_rep_), value)) {
            i_free(self->get, c_sizeof *self->get);
            i_free(self->use_count, c_sizeof(long));
        } else { // allocated combined counter+value with _make()
            i_free(self->use_count, c_sizeof(struct _c_MEMB(_rep_)));
        }
    }
}

It dynamically determines how to free the structure based on if ->get points to the position relative to ->use_count that you would expect if they were allocated in a single block. Unfortunately there is no guarantee that this layout isn't the case if the memory blocks were allocated separately. If this happens the memory of the data is leaked.

In practice this is extremely unlikely to cause a serious problem in the wild. I spotted it reading the code, and was only able to reproduce after much experimentation. I have been unable to make glibc malloc on x86-64 Linux reproduce at all, however jemalloc seems more willing to put blocks of memory next to each other.

With jemalloc installed and the following code:

#include <stdio.h>

typedef struct Data {
    char a[1];
} Data;

#define i_key Data
#include "stc/arc.h"

int main() {
    const int ptr_len = 50;
    void* ptrs[ptr_len];
    while (true) {
        int blocks = rand() % ptr_len;
        for (int j = 0; j < blocks; j++) {
            ptrs[j] = malloc(1);
        }
        Data* data = malloc(sizeof(Data));
        arc_Data arc = arc_Data_from_ptr(data);
        for (int j = 0; j < blocks; j++) {
            free(ptrs[j]);
        }
        arc_Data_drop(&arc);
    }
    return 0;
}

Compiled and run with:

gcc test.c && LD_PRELOAD=`jemalloc-config --libdir`/libjemalloc.so.`jemalloc-config --revision` ./a.out

Nothing happens, because very small memory leaks are hard to detect and it doesn't reproduce in Valgrind, however if you add a print statement to the 2nd branch of the condition in arc_X_drop you can see it is taking that branch, which it should never do.

Since it could show up in unexpected places and be very hard to debug, I think this is worth fixing even if the problem is mostly theoretical at this point.

wmww commented 9 months ago

I managed to reproduce with glibc, and got it to leak a noticeable amount of memory:

#include <stdio.h>

#define LEAK

typedef struct Data {
    char a[1024];
} __attribute__((aligned (32))) Data;

#define i_key Data
#include "stc/arc.h"

#define arcs_len 100000
int main() {
    arc_Data arcs[arcs_len] = {0};
    unsigned i = 0;
    while (true) {
        for (int k = 0; k < 100000; k++) {
            void* random_ptr = malloc(1);
            for (int j = 0; j < 50; j++) {
                i = (i + rand()) % arcs_len;
                if (!arcs[i].get) {
#ifdef LEAK
                    Data* data = malloc(sizeof(Data));
                    arcs[i] = arc_Data_from_ptr(data);
#else
                    arcs[i] = arc_Data_make((Data){0});
#endif
                }
            }
            for (int j = 0; j < 100; j++) {
                i = (i + rand()) % arcs_len;
                if (arcs[i].get) {
                    arc_Data_drop(&arcs[i]);
                    arcs[i] = (arc_Data){0};
                }
            }
            free(random_ptr);
        }
        for (int j = 0; j < arcs_len; j++) {
            if (arcs[j].get) {
                arc_Data_drop(&arcs[j]);
                arcs[j] = (arc_Data){0};
            }
        }
        printf("Reset\n");
    }
    return 0;
}

This can be compiled and run simply with

gcc test.c && ./a.out

It's a bit convoluted in order to keep the heap in a state where the bug consistently happens for an extended period of time. Note that the used memory usage of the process slowly creeps up (by about a megabyte every 25 seconds on my machine) when LEAK is defined, and jumps around a little but doesn't consistently rise when it's not.

__attribute__((aligned (32))) is used so that the data arc's value struct is 32-byte aligned. This way it's possible for the dynamic allocation (which seems to always be 32-byte aligned) to line up with what arc is expecting in the single allocation case.