jbush001 / NyuziToolchain

Port of LLVM/Clang C compiler to Nyuzi parallel processor architecture

calloc() can return memory that is not zeroed #104

Open jbush001 opened 6 years ago

jbush001 commented 6 years ago

The following program:

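#include <stdio.h>
#include <stdlib.h>

#define CALLOC_SIZE 1024

int main(void)
{
    void *ptr;
    int i;

    /* Allocate CALLOC_SIZE 4-byte words; calloc must return them zeroed. */
    ptr = calloc(CALLOC_SIZE, 4);
    for (i = 0; i < CALLOC_SIZE; i++)
    {
        if (((unsigned int*) ptr)[i] != 0)
        {
            printf("FAIL: calloc failed to clear memory\n");
            return 0;
        }
    }

    return 0;
}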

Will fail. One byte in the buffer is not zero. It appears the following sequence of events is occurring:

  1. The program first calls malloc with a small size during libc initialization (allocating a data structure to register an atexit function). dlmalloc calls sbrk to allocate 4k of memory. Because it expects sbrk to return zeroed memory, it sets a flag to indicate the region is already zeroed. It puts a tag at the end of the region to indicate the size.
  2. The calloc above is called. It needs to call sbrk again, because the remaining free space is not large enough. The end tag from step 1 is still present.
  3. calloc then skips calling memset:
void* dlcalloc(size_t n_elements, size_t elem_size) {
    void* mem;
    size_t req = 0;
...
    mem = dlmalloc(req);
    if (mem != 0 && calloc_must_clear(mem2chunk(mem)))
        memset(mem, 0, req);
    return mem;
}

jbush001 commented 6 years ago

Setting PINUSE (previous block in-use) should have caused calloc_must_clear to be true, so memset should have been called.

#define PINUSE_BIT          1
#define CINUSE_BIT          2
#define INUSE_BITS          (PINUSE_BIT|CINUSE_BIT)
#define is_mmapped(p)       (((p)->head & INUSE_BITS) == 0)
#define calloc_must_clear(p) (!is_mmapped(p))

The interesting thing is that it doesn't even seem to generate a call to memset:

calloc:
...
    4bbc:   31 f5 ff f9     call -11068 <malloc>
    4bc0:   fe d3 00 a8     load_32 ra, 52(sp)
    4bc4:   3e e3 00 a8     load_32 s25, 56(sp)
    4bc8:   1e f3 00 a8     load_32 s24, 60(sp)
    4bcc:   de 03 01 05     add_i sp, sp, 64
    4bd0:   1f 00 00 f0     ret

I added a printf to output the value of calloc_must_clear in calloc:

    mem = dlmalloc(req);
    printf("calloc_must_clear = %d\n", calloc_must_clear(mem2chunk(mem)));
    if (mem != 0 && calloc_must_clear(mem2chunk(mem)))
        memset(mem, 0, req);

After this, it generated a conditional call to memset after the printf (and the program works correctly):

    4bc0:   30 f5 ff f9     call -11072 <malloc>
 ...
    4be0:   19 f0 ff a1     load_u8 s0, -4(s25)        # read flags of block
    4be4:   00 0c 00 01     and s0, s0, 3              # and with INUSE_BITS
    4be8:   a0 00 00 f2     bz s0, 20 <calloc+0x90>    # if they are clear, skip memset
    4bec:   00 80 fc c0     move s0, s25
    4bf0:   20 00 00 0f     move s1, 0
    4bf4:   40 00 fc c0     move s2, s24
    4bf8:   a9 07 00 f8     call 7844 <memset>

This suggests a code generation issue. It could be a compiler bug, undefined behavior in the source, or a bad set of configuration macros that causes the call to be optimized out.

jbush001 commented 6 years ago

Here is the expanded version of the check to call memset (clang -E):

    if (mem != 0 && (!(((((mchunkptr)((char*)(mem) - ((sizeof(size_t))<<1))))->head & ((((size_t)1))|(((size_t)2)))) == 0)))
        memset(mem, 0, req);