parca-dev / parca-agent

eBPF based always-on profiler auto-discovering targets in Kubernetes and systemd, zero code changes or restarts needed!
https://parca.dev/
Apache License 2.0

Lua Support #1889

Open kakkoyun opened 11 months ago

kakkoyun commented 11 months ago
Namanl2001 commented 7 months ago

hey @kakkoyun what's the

  1. priority
  2. complexity, of this issue?

    thanks

kakkoyun commented 7 months ago

> hey @kakkoyun what's the
>
>   1. priority

This is actually quite close to the top of our priority list. However, the team needs a couple of months to get to it.

>   2. complexity, of this issue?

This could be quite complex. You can check #1933 and #1984 PRs for the scope of it. But it shouldn't intimidate you.

If you want to give it a shot, go for it!

brancz commented 7 months ago

I actually think it's a little trickier than Ruby/Python: we primarily want to support LuaJIT, so we need something that's a mix of the native unwinder and the Python/Ruby unwinder, that we can switch to. @javierhonduco is already looking at how we could make switching work, but I think it's harder than it looks at first glance.

On a positive note though, there is some prior art suggesting that something similar to the Python/Ruby unwinders, purely for reading the frames, should indeed work: https://github.com/yunwei37/nginx-lua-ebpf-toolkit

kakkoyun commented 7 months ago

Let's wait for @javierhonduco's investigation result. Thus, I'm assigning it to him for now. @Namanl2001, we will keep you in the loop; we would love this to be handled by the community ❤️

sichvoge commented 4 months ago

@kakkoyun Has there been any progress on this ticket?

kakkoyun commented 4 months ago

Hey @sichvoge, it's on our immediate roadmap now. That being said, adding the Lua support will still take 1-2 months.

cc @Sylfrena

sichvoge commented 4 months ago

Awesome thanks Kemal!

gnurizen commented 3 months ago

Notes on how to approach this issue

Lua has a couple of existing profiling solutions:

  1. LuaJIT's built-in signal/timer-based profiler
  2. An eBPF profiler written in C

The ideal goal is to have a Lua/native mixed-frames profiler. The apisix profiler claims to do this, but it's not clear how, since it relies on bpf_get_stackid, which only works with frame-pointer-enabled stacks. Some comments at the end of the developer guide lead me to believe it has issues. I suspect it reliably gets the Lua stack but not the native stack beneath it. Another drawback is that it relies on uprobes attached to lua_pcall/lua_resume to get the "current" Lua state.

The LuaJIT profiler avoids this problem by stashing the Lua global pointer.
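To make the frame pointer limitation concrete, here is a toy Python model of the kind of walk that a frame-pointer-based helper such as bpf_get_stackid depends on. It is not BPF code. The memory contents, addresses, and the function name walk_frame_pointers are made up for illustration, and it assumes the usual x86-64 layout (saved frame pointer at [fp], return address at [fp + 8]).

```python
# Toy model of frame-pointer unwinding. Memory is a dict of address -> word.
# All addresses and values below are invented for illustration.

def walk_frame_pointers(memory, fp, max_depth=16):
    """Follow the saved-frame-pointer chain and collect return addresses."""
    return_addresses = []
    for _ in range(max_depth):
        # Assumed layout: [fp] = caller's fp, [fp + 8] = return address.
        if fp == 0 or fp not in memory or fp + 8 not in memory:
            break
        return_addresses.append(memory[fp + 8])
        fp = memory[fp]
    return return_addresses

# Every function keeps a frame pointer: main -> f -> g (g is innermost).
with_fp = {
    0x1000: 0x1020, 0x1008: 0xF00,  # g's frame: saved fp, return into f
    0x1020: 0x1040, 0x1028: 0xA00,  # f's frame: saved fp, return into main
    0x1040: 0x0,    0x1048: 0x500,  # main's frame: end of the chain
}
full = walk_frame_pointers(with_fp, 0x1000)

# Same stack, but g's caller used the frame-pointer register as a scratch
# register (as JIT-generated code may), so the value g saved is garbage.
broken = dict(with_fp)
broken[0x1000] = 0xDEADBEEF
truncated = walk_frame_pointers(broken, 0x1000)
```

In the second case the walk stops after a single return address, which matches the suspicion above: frames beneath code that does not maintain frame pointers are simply lost.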

Questions:

Can we get the lua state w/o a uprobe?

W/o the ability to walk the stack this is dubious. If we could walk stack frames and identify the C/JIT call boundary, it's probably relatively straightforward to pluck the Lua state pointer as the first arg. Since our unwind tables tell us which is a C frame and which is a JIT frame, this seems doable. The Lua global isn't stored anywhere like a thread local, so we can't do what works for Python et al. It's possible that we could do something container-specific, i.e. peek into openresty's module code and try to find the Lua context pointers, but then we'd have to manage that for everything that uses LuaJIT.
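A rough sketch of the "identify the C/JIT call boundary" step, in Python rather than BPF. It assumes the stack has already been walked and the PCs are available (which is the hard part), and it treats any PC not covered by an unwind-table range as JIT code. The ranges, addresses, and the names frame_kind and find_jit_entry_frame are hypothetical. Recovering the first argument from the frame it finds is not modeled here.

```python
from bisect import bisect_right

# Hypothetical PC ranges covered by unwind tables (file-backed C code).
NATIVE_RANGES = [(0x400000, 0x480000), (0x7F0000000000, 0x7F0000200000)]
_STARTS = [start for start, _ in NATIVE_RANGES]

def frame_kind(pc):
    """Return 'native' if an unwind-table range covers pc, else 'jit'."""
    i = bisect_right(_STARTS, pc) - 1
    if i >= 0 and pc < NATIVE_RANGES[i][1]:
        return "native"
    return "jit"

def find_jit_entry_frame(pcs):
    """pcs is ordered innermost-first.

    Return the index of the first native frame that sits directly outside
    a JIT frame, i.e. the C function that called into JIT code, or None.
    """
    for i in range(1, len(pcs)):
        if frame_kind(pcs[i - 1]) == "jit" and frame_kind(pcs[i]) == "native":
            return i
    return None

# Two JIT frames on top of two C frames (made-up addresses).
sample = [0x7F1000001000, 0x7F1000002000, 0x401234, 0x400500]
kinds = [frame_kind(pc) for pc in sample]
entry = find_jit_entry_frame(sample)

# A purely native stack has no boundary to find.
all_native = find_jit_entry_frame([0x401234, 0x400500])
```

The frame at the returned index is the candidate whose first argument would be inspected for the Lua state pointer, under the assumptions stated above.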

Are uprobes a problem?

This needs to be analyzed but my guess is the uprobe overhead is much smaller than the average Lua program execution time in most contexts.
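One way to frame that analysis is a back-of-the-envelope ratio: the time spent in the probe divided by the total time per probed call. The sketch below is only that arithmetic. The function name is hypothetical and the inputs are placeholders, not measurements, so real values would have to be measured on the target system.

```python
def uprobe_overhead_fraction(probe_cost_us, work_between_hits_us):
    """Fraction of wall time spent in the probe rather than in the program."""
    return probe_cost_us / (probe_cost_us + work_between_hits_us)

# Placeholder inputs, NOT measurements: replace with measured values.
# Case 1: very short Lua calls, so the probe fires often relative to work.
short_calls = uprobe_overhead_fraction(probe_cost_us=2.0, work_between_hits_us=8.0)
# Case 2: long-running Lua calls, so the probe cost is amortized.
long_calls = uprobe_overhead_fraction(probe_cost_us=2.0, work_between_hits_us=1998.0)
```

The guess above amounts to saying that typical workloads look like the second case rather than the first.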

Can Lua DWARF tables save us?

It was suggested on the Lua mailing list that we can walk Lua frames using the DWARF information Lua emits for the stack unwinding machinery needed for C++ exceptions. This doesn't work, however. Lua stack unwinding only works when the Lua frame is at a boundary/exit point (i.e. another function call); it doesn't work for unwinding at an arbitrary instruction. This was the best description of the issue I could find: https://news.ycombinator.com/item?id=37926172. This explains why perf and gdb can't unwind Lua stacks if the starting context is some random instruction in a Lua JIT'd frame. Even using the latest libunwind to unwind the Lua stack in-process doesn't work: older versions crash, and newer versions don't crash but still can't walk through a LuaJIT frame.
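The difference between unwind information that is only valid at call sites and information that is valid at every instruction can be shown with a toy table lookup. This is a generic illustration, not LuaJIT's actual unwind data: the instruction offsets, CFA offsets, and table contents are invented, and the tables are reduced to a single "CFA = rsp + offset" rule.

```python
# Ground truth for a toy function, CFA = rsp + offset before each instruction:
#   pc 0:  push rbp      -> CFA = rsp + 8
#   pc 1:  sub rsp, 32   -> CFA = rsp + 16
#   pc 5:  call callee   -> CFA = rsp + 48
#   pc 10: (after call)  -> CFA = rsp + 48
TRUE_CFA_OFFSET = {0: 8, 1: 16, 5: 48, 10: 48}

# Asynchronous table: a row for every PC where the rule changes.
ASYNC_ROWS = [(0, 8), (1, 16), (5, 48)]
# Synchronous table: only describes the post-prologue state seen at call sites.
SYNC_ROWS = [(0, 48)]

def cfa_offset(rows, pc):
    """Return the offset from the last row whose start PC is <= pc."""
    offset = None
    for start, value in rows:
        if start <= pc:
            offset = value
    return offset

# The asynchronous table is right at every sampled PC.
async_ok = all(
    cfa_offset(ASYNC_ROWS, pc) == off for pc, off in TRUE_CFA_OFFSET.items()
)
# The synchronous table is wrong wherever the sample lands in the prologue.
sync_wrong_pcs = [
    pc for pc, off in TRUE_CFA_OFFSET.items() if cfa_offset(SYNC_ROWS, pc) != off
]
```

A sampling profiler interrupts at arbitrary PCs, so it needs the first kind of table. Exception unwinding only ever starts from a call site, which is why the second kind is enough for it.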

So our choices seem to be:

  1. Use the apisix approach and have unreliable native frames
  2. Patch luajit to support frame pointers
  3. Patch luajit to generate asynchronous unwind tables

I think the best course of action is to start with option 1 and, in parallel, request/advocate/contribute option 2. I don't know what the state of LuaJIT development is, but this issue is encouraging: https://github.com/LuaJIT/LuaJIT/issues/1092. Especially the bit about debugging.