rizinorg / rizin

UNIX-like reverse engineering framework and command-line toolset.
https://rizin.re
GNU Lesser General Public License v3.0

Tcache parsing should be handled on per thread basis #1259

Open MalhotraPulak opened 3 years ago

MalhotraPulak commented 3 years ago

Is your feature request related to a problem? Please describe.

In glibc's heap implementation, each thread gets its own tcache. Rizin, however, locates and parses tcaches by walking the arenas. As a result, Rizin does not display all tcaches (dmht command) when there are more threads than arenas, i.e. when multiple threads share an arena.

Here is an example binary:

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>

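void *thread1(void *vargp)
{
    /* malloc(40) yields 0x30-byte chunks on 64-bit glibc, i.e. tcache bin 1 */
    char* a = (char*) malloc(40);
    char* b = (char*) malloc(40);
    char* c = (char*) malloc(40);
    /* freeing them puts all three chunks into this thread's tcache */
    free(a);
    free(b);
    free(c);
    /* keep the thread alive so its tcache still exists at the trap */
    sleep(100);
    return NULL;
}

int main(void)
{
    pthread_t thread[100];
    for (int i = 0; i < 100; i++){
        pthread_create(&thread[i], NULL, thread1, NULL);
    }
    /* give every thread time to run its malloc/free sequence */
    sleep(1);
    /* stop in the debugger while all 100 threads are still sleeping */
    __builtin_trap();
    exit(0);
}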

The binary above spawns 100 threads (101 including the main thread) and populates the tcache in each spawned thread. Rizin's output:

[0x55c8e85e528c]> dmha~?Thread
47
[0x55c8e85e528c]> dmha~?Main
1
[0x55c8e85e528c]> dmha~?arena
48

Rizin reports 48 arenas in total, which is accurate (6 cores * 8). Now the output of the dmht command:
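For context, the figure of 48 matches glibc's default arena cap, which malloc/arena.c derives from the core count through the NARENAS_FROM_NCORES macro (2 arenas per core on 32-bit, 8 on 64-bit). Below is a minimal sketch of that arithmetic. The helper name default_arena_limit is hypothetical, and the cap can be overridden at runtime (for example with the MALLOC_ARENA_MAX environment variable), so the result should be read as the default only.

```python
def default_arena_limit(n_cores: int, long_size: int = 8) -> int:
    """Default arena cap, mirroring glibc's NARENAS_FROM_NCORES.

    long_size is sizeof(long) on the target: 4 on 32-bit, 8 on 64-bit.
    The function name is made up for this sketch.
    """
    arenas_per_core = 2 if long_size == 4 else 8
    return n_cores * arenas_per_core


# The machine in this report has 6 cores and runs a 64-bit glibc.
limit = default_arena_limit(6)
threads = 101  # 100 spawned threads plus the main thread

print(f"arena cap: {limit}, threads: {threads}")
print(f"threads must share arenas: {threads > limit}")
```

With more threads than arenas, several threads attach to the same arena, which is exactly the case where an arena-based tcache lookup comes up short.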

[0x55c8e85e528c]> dmht
Tcache in Main Arena @  0x7f85823eab80
Tcache in Thread Arena @  0x7f84a4000020
Tcache_bin[01] Items: 3
 -> Chunk(addr=0x7f84a4000bb0, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
 -> Chunk(addr=0x7f84a4000b80, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
 -> Chunk(addr=0x7f84a4000b50, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
Tcache in Thread Arena @  0x7f84ac000020
Tcache_bin[01] Items: 3
 -> Chunk(addr=0x7f84ac000bb0, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
 -> Chunk(addr=0x7f84ac000b80, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
 -> Chunk(addr=0x7f84ac000b50, size=0x30, flags=NON_MAIN_ARENA,PREV_INUSE)
.... goes on for a while ...
[0x55c8e85e528c]> dmht~?Chunk
141
[0x55c8e85e528c]> dmht~?Tcache_bin
47
[0x55c8e85e528c]> dmht~?Tcache in Thread Arena
47

Rizin finds 141 chunks across 47 tcache bins (3 chunks per bin, and 1 populated bin per tcache). This is incorrect: 100 threads were created, and each thread has its own tcache. We can verify this with a dev build of GEF, which recently fixed a similar issue.
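To make the size of the gap explicit, here is the arithmetic using only the numbers from the outputs above (the variable names are just for this sketch):

```python
# Numbers taken from the example binary and the dmht output above.
THREADS_SPAWNED = 100      # pthread_create calls in main()
CHUNKS_PER_THREAD = 3      # a, b and c are freed into each thread's tcache
TCACHES_REPORTED = 47      # dmht~?Tcache_bin
CHUNKS_REPORTED = 141      # dmht~?Chunk

expected_chunks = THREADS_SPAWNED * CHUNKS_PER_THREAD
missing_tcaches = THREADS_SPAWNED - TCACHES_REPORTED
missing_chunks = expected_chunks - CHUNKS_REPORTED

print(f"expected populated tcaches: {THREADS_SPAWNED}, reported: {TCACHES_REPORTED}")
print(f"expected chunks: {expected_chunks}, reported: {CHUNKS_REPORTED}")
print(f"missing: {missing_tcaches} tcaches, {missing_chunks} chunks")
```

The reported chunk count is consistent with one tcache per thread arena (47 * 3), which supports the reading that Rizin shows one tcache per arena rather than one per thread.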

gef➤  heap bins tcache all
─────────────────────────────────────── Tcachebins for thread 1 ───────────────────────────────────────
All tcachebins are empty
─────────────────────────────────────── Tcachebins for thread 2 ───────────────────────────────────────
Tcachebins[idx=1, size=0x30] count=3  ←  Chunk(addr=0x7ffff0000bc0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7ffff0000b90, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7ffff0000b60, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA) 
─────────────────────────────────────── Tcachebins for thread 3 ───────────────────────────────────────
Tcachebins[idx=1, size=0x30] count=3  ←  Chunk(addr=0x7fffe8000bc0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fffe8000b90, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fffe8000b60, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA) 

..........goes on for a while......
────────────────────────────────────── Tcachebins for thread 100 ──────────────────────────────────────
Tcachebins[idx=1, size=0x30] count=3  ←  Chunk(addr=0x7fff2c001200, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fff2c0011d0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fff2c0011a0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA) 
────────────────────────────────────── Tcachebins for thread 101 ──────────────────────────────────────
Tcachebins[idx=1, size=0x30] count=3  ←  Chunk(addr=0x7fff28001200, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fff280011d0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA)  ←  Chunk(addr=0x7fff280011a0, size=0x30, flags=PREV_INUSE|NON_MAIN_ARENA) 

Describe the solution you'd like

GDB-like output (as in the GEF listing above) is expected, where 100 populated tcache bins are found. Printing the thread ID instead of the arena address would also better convey to the user that a tcache belongs to a thread, not to an arena. As I am working on the Cutter Heap Viewer at the moment, I will give this issue a try now and resolve it before I refactor the tcache code and implement tcache support in the Cutter Heap Viewer.

MalhotraPulak commented 3 years ago

Some notes for this issue

Tcache implementation in malloc.c

typedef struct tcache_perthread_struct
{
  char counts[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
 
static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;

The core issue here is obtaining, for each thread, the base address of its tcache struct (of type tcache_perthread_struct). Once you have that address, you can easily print the bins from the struct.
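As an illustration of the "once you have the base address" step, here is a minimal sketch that walks a tcache_perthread_struct read from raw target memory. It assumes the layout quoted above (one-byte counts), a 64-bit little-endian target, TCACHE_MAX_BINS = 64, and plain (unmangled) next pointers. Newer glibc releases widen counts to 16 bits and protect next pointers with safe-linking, so this would need adjusting there. FakeMemory, parse_tcache, bin_chunk_size, and all addresses are made up for the demo and are not Rizin or GEF APIs.

```python
import struct

TCACHE_MAX_BINS = 64   # glibc default; assumed here
PTR_SIZE = 8           # assumes a 64-bit target


def bin_chunk_size(idx):
    """Chunk size served by tcache bin idx on 64-bit (MINSIZE 0x20, alignment 0x10)."""
    return 0x20 + idx * 0x10


def parse_tcache(read_mem, tcache_addr):
    """Return {bin index: [entry addresses]} for every non-empty bin.

    read_mem(addr, size) -> bytes is whatever primitive the debugger
    offers for reading target memory (hypothetical in this sketch).
    """
    counts = read_mem(tcache_addr, TCACHE_MAX_BINS)
    raw = read_mem(tcache_addr + TCACHE_MAX_BINS, TCACHE_MAX_BINS * PTR_SIZE)
    heads = struct.unpack(f"<{TCACHE_MAX_BINS}Q", raw)
    bins = {}
    for idx, (count, head) in enumerate(zip(counts, heads)):
        chain, entry = [], head
        # Bound the walk by the stored count so a corrupted list cannot loop forever.
        while entry and len(chain) < count:
            chain.append(entry)
            (entry,) = struct.unpack("<Q", read_mem(entry, PTR_SIZE))
        if chain:
            bins[idx] = chain
    return bins


class FakeMemory:
    """Stand-in for target memory so the sketch runs without a debugger."""

    def __init__(self, base, size):
        self.base, self.data = base, bytearray(size)

    def write(self, addr, payload):
        off = addr - self.base
        self.data[off:off + len(payload)] = payload

    def read(self, addr, size):
        off = addr - self.base
        return bytes(self.data[off:off + size])


# Build one thread's tcache: bin 1 holds three freed entries (made-up addresses).
mem = FakeMemory(0x7F84A4000000, 0x1000)
tcache_addr = 0x7F84A40008D0
c1, c2, c3 = 0x7F84A4000BB0, 0x7F84A4000B80, 0x7F84A4000B50
mem.write(tcache_addr + 1, bytes([3]))                                      # counts[1] = 3
mem.write(tcache_addr + TCACHE_MAX_BINS + PTR_SIZE, struct.pack("<Q", c1))  # entries[1]
mem.write(c1, struct.pack("<Q", c2))                                        # c1 -> c2
mem.write(c2, struct.pack("<Q", c3))                                        # c2 -> c3
mem.write(c3, struct.pack("<Q", 0))                                         # end of list

bins = parse_tcache(mem.read, tcache_addr)
for idx, chain in bins.items():
    print(f"Tcache_bin[{idx:02d}] size={bin_chunk_size(idx):#x} items={len(chain)}")
    for addr in chain:
        print(f" -> {addr:#x}")
```

The parsing itself does not depend on arenas at all, which is why the remaining work is finding each thread's tcache pointer.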

How does gef solve this issue?

    def find_tcache():
        """Return the location of the current thread's tcache."""
        try:
            # For multithreaded binaries, the tcache symbol (in thread local
            # storage) will give us the correct address.
            tcache_addr = gdb.parse_and_eval("(void *) tcache")
        except gdb.error:
            # In binaries not linked with pthread (and therefore there is only
            # one thread), we can't use the tcache symbol, but we can guess the
            # correct address because the tcache is consistently the first
            # allocation in the main arena.
            heap_base = HeapBaseFunction.heap_base()
            if heap_base is None:
                err("No heap section")
                return 0x0
            tcache_addr = heap_base + 0x10
        return tcache_addr

gef calls find_tcache from each thread to obtain the base address of that thread's tcache. The function evaluates the expression (void *) tcache, which resolves the thread-local tcache symbol to the address we need. A few points to note about the tcache symbol:

We need three main things to solve this issue:

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. Considering a lot has probably changed since its creation, we kindly ask you to check again if the issue you reported is still relevant in the current version of rizin. If it is, update this issue with a comment, otherwise it will be automatically closed if no further activity occurs. Thank you for your contributions.
