yunwei37 / nginx-lua-ebpf-toolkit

profile and tracking tools for lua and nginx using eBPF
https://github.com/apache/apisix-profiler

ebpf-based lua profile tools

Use ebpf to generate lua flamegraphs:

Note:

  • Because of the eBPF verifier's instruction limit in the kernel, the Lua stack trace depth is limited to the top 15 frames. If you need to trace deeper, use SystemTap instead.
  • This project is not finished yet, and some errors may occur.

The docker image can be found in:

docker pull ghcr.io/yunwei37/nginx-lua-profile:latest

probe lua stack in nginx

see: bpftools/profile_nginx_lua/profile.bpf.c

first, we use a uprobe in eBPF to attach to libluajit.so and get the lua_State:

static int probe_entry_lua(struct pt_regs *ctx)
{
    if (!PT_REGS_PARM1(ctx))
        return 0;

    __u64 pid_tgid = bpf_get_current_pid_tgid();
    __u32 pid = pid_tgid >> 32;
    __u32 tid = (__u32)pid_tgid;
    struct lua_stack_event event = {};

    if (targ_pid != -1 && targ_pid != pid)
        return 0;
    // event.time = bpf_ktime_get_ns();
    event.pid = pid;
    // bpf_get_current_comm(&event.comm, sizeof(event.comm));
    event.L = (void *)PT_REGS_PARM1(ctx);
    // bpf_printk("lua_state %p\n", event.L);
    bpf_map_update_elem(&lua_events, &tid, &event, BPF_ANY);
    return 0;
}
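The probe relies on the layout of the value returned by bpf_get_current_pid_tgid(): the process id (tgid) sits in the upper 32 bits and the thread id in the lower 32 bits. A minimal user-space sketch of that split and of the pid filter follows. The helper names tgid_of, tid_of and should_trace are made up for illustration and do not exist in the repository.

```c
#include <stdint.h>

/* Hypothetical helpers: same shifts as probe_entry_lua, in plain C. */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32); /* upper half: process id (tgid) */
}

static inline uint32_t tid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;         /* lower half: thread id */
}

/* Same filter as the probe: a target of -1 means "trace every process". */
static inline int should_trace(int32_t targ_pid, uint32_t pid)
{
    return targ_pid == -1 || (uint32_t)targ_pid == pid;
}
```

The probe stores the event under the thread id, so each thread keeps its own most recent lua_State pointer.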

to get the Lua stack frames, it uses a loop to backtrace the Lua VM stack and collect information about each function:

see the fix_lua_stack function:

    ....
    cTValue *frame, *nextframe, *bot = tvref(BPF_PROBE_READ_USER(L, stack)) + LJ_FR2;
    int i = 0;
    frame = nextframe = BPF_PROBE_READ_USER(L, base) - 1;
    /* Traverse frames backwards. */
    // for the ebpf verifier insns (limit 1000000), we need to limit the max loop times to 15
    for (; i < 15 && frame > bot; i++)
    {
        if (frame_gc(frame) == obj2gco(L))
        {
            level++; /* Skip dummy frames. See lj_err_optype_call(). */
        }
        if (level-- == 0)
        {
            level++;
            // *size = (nextframe - frame);
            /* Level found. */
            if (lua_get_funcdata(ctx, frame, eventp, count) != 0)
            {
                continue;
            }
            count++;
        }
        nextframe = frame;
        if (frame_islua(frame))
        {
            frame = frame_prevl(frame);
        }
        else
        {
            if (frame_isvarg(frame))
                level++; /* Skip vararg pseudo-frame. */
            frame = frame_prevd(frame);
        }
    }
    ....
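The key constraint in the loop above is the constant upper bound: the verifier has to be able to prove that the loop terminates. The following user-space sketch shows only that bounding pattern. It assumes a simplified, hypothetical frame type with a prev pointer in place of the real LuaJIT frame links, and the names struct frame and walk_frames are illustrative only.

```c
#include <stddef.h>

#define MAX_LUA_DEPTH 15 /* same bound as the loop above */

/* Hypothetical simplified frame: real LuaJIT frames are linked differently. */
struct frame {
    const struct frame *prev; /* next outer frame, NULL at the bottom */
    int id;                   /* stand-in for the function data */
};

/* Walk from the innermost frame outward, recording at most
 * MAX_LUA_DEPTH ids (and never more than out_cap). Returns the count. */
static int walk_frames(const struct frame *top, int *out, int out_cap)
{
    int n = 0;
    for (int i = 0; i < MAX_LUA_DEPTH && top != NULL && n < out_cap; i++) {
        out[n++] = top->id;
        top = top->prev;
    }
    return n;
}
```

Frames deeper than the bound are silently dropped, which matches the top-15 limitation mentioned in the note at the top of this README.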

after that, it reads the function data and sends it to user space:

static inline int lua_get_funcdata(struct bpf_perf_event_data *ctx, cTValue *frame, struct lua_stack_event *eventp, int level)
{
    if (!frame)
        return -1;
    GCfunc *fn = frame_func(frame);
    if (!fn)
        return -1;
    if (isluafunc(fn))
    {
        eventp->type = FUNC_TYPE_LUA;
        GCproto *pt = funcproto(fn);
        if (!pt)
            return -1;
        eventp->ffid = BPF_PROBE_READ_USER(pt, firstline);
        GCstr *name = proto_chunkname(pt); /* GCstr *name */
        const char *src = strdata(name);
        if (!src)
            return -1;
        bpf_probe_read_user_str(eventp->name, sizeof(eventp->name), src);
        bpf_printk("level= %d, fn_name=%s\n", level, eventp->name);
    }
    else if (iscfunc(fn))
    {
        eventp->type = FUNC_TYPE_C;
        eventp->funcp = BPF_PROBE_READ_USER(fn, c.f);
    }
    else if (isffunc(fn))
    {
        eventp->type = FUNC_TYPE_F;
        eventp->ffid = BPF_PROBE_READ_USER(fn, c.ffid);
    }
    eventp->level = level;
    bpf_perf_event_output(ctx, &lua_event_output, BPF_F_CURRENT_CPU, eventp, sizeof(*eventp));
    return 0;
}

in user space, it uses the user_stack_id to merge the Lua stack with the original user and kernel stacks:

see bpftools/profile_nginx_lua/profile.c: print_fold_user_stack_with_lua

                ....
                const struct lua_stack_event* eventp = &(lua_bt->stack[count]);
                if (eventp->type == FUNC_TYPE_LUA)
                {
                    if (eventp->ffid) {
                        printf(";L:%s:%d", eventp->name, eventp->ffid);
                    } else {
                        printf(";L:%s", eventp->name);
                    }
                }
                else if (eventp->type == FUNC_TYPE_C)
                {
                    sym = syms__map_addr(syms, (unsigned long)eventp->funcp);
                    if (sym)
                    {
                        printf(";C:%s", sym ? sym->name : "[unknown]");
                    }
                }
                else if (eventp->type == FUNC_TYPE_F)
                {
                    printf(";builtin#%d", eventp->ffid);
                }
                ....
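The excerpt above turns each event into one segment of a folded stack line. Here is a self-contained sketch of the same formatting rules, writing into a buffer instead of stdout. It assumes a reduced, hypothetical struct frame_info in place of lua_stack_event, assumes the C symbol name has already been resolved, and assumes the enum values shown, which may differ from the real header.

```c
#include <stdio.h>
#include <string.h>

enum func_type { FUNC_TYPE_LUA, FUNC_TYPE_C, FUNC_TYPE_F }; /* values assumed */

/* Hypothetical reduced stand-in for struct lua_stack_event. */
struct frame_info {
    enum func_type type;
    int ffid;         /* first line (Lua) or fast-function id (builtin) */
    const char *name; /* chunk name (Lua) or resolved symbol (C), may be NULL for C */
};

/* Format one frame as a folded-stack segment; returns the segment length.
 * Like the excerpt, a C frame without a resolved symbol yields nothing. */
static int format_frame(char *buf, size_t cap, const struct frame_info *f)
{
    switch (f->type) {
    case FUNC_TYPE_LUA:
        if (f->ffid)
            return snprintf(buf, cap, ";L:%s:%d", f->name, f->ffid);
        return snprintf(buf, cap, ";L:%s", f->name);
    case FUNC_TYPE_C:
        if (f->name)
            return snprintf(buf, cap, ";C:%s", f->name);
        break;
    case FUNC_TYPE_F:
        return snprintf(buf, cap, ";builtin#%d", f->ffid);
    }
    if (cap)
        buf[0] = '\0';
    return 0;
}
```

The L:, C: and builtin# prefixes make it easy to tell Lua frames, C functions and LuaJIT fast functions apart in the final flame graph.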

If the user_stack_id in the Lua stack output matches the original user_stack_id, the stack is a Lua stack. We then replace the [unknown] frames whose uip falls inside the LuaJIT VM function range with our Lua stack. This may not be totally correct, but it works for now. After printing the stack, we can use flamegraph.pl to generate the flame graph.

results flamegraph

lua:

[flamegraph image]

lua and c:

[flamegraph image]

reference

for reference, I looked into the debug functions of the Lua VM:

LJ_FUNC void lj_debug_dumpstack(lua_State *L, SBuf *sb, const char *fmt,
                int depth);

in lj_debug.h and lj_debug.c from the LuaJIT source code: openresty-1.21.4.1/build/LuaJIT-2.1-20220411/src/lj_debug.h

for the Lua data structure definitions, see bpftools/profile_nginx_lua/lua_state.h, which is copied from the LuaJIT headers.

the LuaJIT GC32/GC64 mode is selected in:

bpftools/profile_nginx_lua/lua_state.h:9

#define LJ_TARGET_GC64 1
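The reason this switch matters is that it changes the size of every GC reference, so a mismatch with the target LuaJIT build makes the eBPF program read the lua_State fields at the wrong offsets. The sketch below is simplified from how I recall GCRef being defined in LuaJIT's lj_obj.h; treat the exact layout as an assumption and check it against lua_state.h. The gcref_addr macro is a hypothetical name used only here.

```c
#include <stdint.h>

#define LJ_TARGET_GC64 1 /* same switch as in lua_state.h */

#if LJ_TARGET_GC64
typedef struct GCRef { uint64_t gcptr64; } GCRef; /* true 64-bit pointer */
#define gcref_addr(r) ((uint64_t)(r).gcptr64)     /* hypothetical helper */
#else
typedef struct GCRef { uint32_t gcptr32; } GCRef; /* 32-bit pseudo pointer */
#define gcref_addr(r) ((uint64_t)(r).gcptr32)     /* hypothetical helper */
#endif
```

With the flag set to 1, sizeof(GCRef) is 8 bytes. A GC32 build would use 4 bytes per reference, which shifts the fields that follow it in any structure.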

the OpenResty build used for testing is from https://openresty.org/en/benchmark.html, and the APISIX used is https://github.com/apache/apisix

and:

The ebpf program is from: https://github.com/iovisor/bcc/pull/3782

to run the Lua profiler:

tested with luajit-5.1.so gc64:

for example, use the APISIX profile scripts in CI to start APISIX (ci/performance_test.sh):

# get the nginx worker pid
pgrep -P $(cat logs/nginx.pid) -n -f worker
# sample only the user stack and Lua stack, use folded output; replace [pid] with the nginx worker pid from above
cd bpftools/profile_nginx_lua
make
sudo ./profile -f -F 499 -U -p [pid] --lua-user-stacks-only > a.bt
# get flame graph
cat a.bt | ../../tools/FlameGraph/flamegraph.pl > a.svg
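The folded file passed to flamegraph.pl has one stack per line: frames separated by semicolons, then a space and the sample count. As a quick way to sanity-check a.bt before rendering, here is a small sketch of a parser for that line format. The function name parse_folded is hypothetical and is not part of this repository or of FlameGraph.

```c
#include <stdlib.h>
#include <string.h>

/* Parse one folded line of the form "f1;f2;f3 COUNT".
 * On success stores the frame depth and sample count and returns 0;
 * returns -1 if the line has no trailing count. */
static int parse_folded(const char *line, int *depth, long *count)
{
    const char *sp = strrchr(line, ' '); /* count follows the last space */
    if (!sp || sp == line)
        return -1;

    char *end;
    long c = strtol(sp + 1, &end, 10);
    if (end == sp + 1 || (*end != '\0' && *end != '\n'))
        return -1;

    int d = 1; /* one frame plus one per separator */
    for (const char *p = line; p < sp; p++)
        if (*p == ';')
            d++;

    *depth = d;
    *count = c;
    return 0;
}
```

Splitting at the last space rather than the first keeps frame names that contain spaces intact.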

test when running benchmark

basic benchmark:

cd ~/work
PATH=/usr/local/openresty/nginx/sbin:$PATH
export PATH
nginx -p `pwd`/ -c conf/nginx.conf
curl http://localhost:8080/

or using APISIX performance test:

git clone https://github.com/apache/apisix
cd apisix
sudo apt-get install -y libpcre3 libpcre3-dev
sudo apt-get install -y openssl libssl-dev unzip zlib*
sudo ./ci/performance_test.sh install_dependencies
sudo ./ci/performance_test.sh install_wrk2
sudo ./ci/performance_test.sh install_stap_tools
./ci/performance_test.sh run_performance_test

test with containers or missing debug info

This tool can find the APISIX process inside a container and capture some stack trace instruction pointers. However, if debug info is not shipped in the Docker image, the tool cannot use the uprobe to trace the Lua stack, and the C stack will not produce any valid output either. Solutions may be:

for APISIX OSPP

Project deliverables:

Use eBPF to capture and parse the Lua call stack information in Apache APISIX, aggregate it, and generate CPU flame graphs:

TODO: