Allow attaching to existing macOS process

vvuk commented 2 months ago

Implement attaching to existing mac processes. This needs samply to be signed with the debugger entitlement:

Create a file ent.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.debugger</key>
<true/>
</dict>
</plist>

cargo build
codesign --force --options runtime --sign - --entitlements ent.xml target/debug/samply

The basic functionality works, but there is still work to do:

This doesn't capture jitdump or markers. It's possible to just watch /tmp/ for appropriately-named jitdump files, but a real solution would need cooperation with the existing process to notify it to dump all existing jit compiled info to jitdump files.
This won't capture child processes. That relies on the dyld preload to send the task of every new process back, and there's no preload here because there's no env vars set.

For 2, this might be possible to sort out if can set the dyld preload env var in the target process. I don't know if there's a simpler way to do this, but one idea is to load the preload shared library into the target (I think this is doable?) and then create a new thread in that process that calls an init function from the preload lib which will set the process env.

vvuk commented 2 months ago

Hrm. Being able to attach to newly created children seems very complicated, borderline not doable. In order to write to the environment, the magic seems to be the _NSGetEnviron symbol which returns a char*** (pointer to the location where the array of char* that form the environment is stored). There are many problems with this path:

We need to figure out the address of _NSGetEnviron from libSystem. This is maybe possible, since we have the library info for the target task, and we can load debug info... we should be able to just forward-resolve the symbol ourselves.
Once we got the address of this symbol, in theory we should be able to read task memory of the relocated instructions and.. maybe.. compute the actual address of the environment pointer. I'm not sure if we'll have enough info.
Assuming we get the environment pointer, we can allocate and write task memory to create a new env block
... but that will blow up the next time the process tries to setenv or similar because setenv and friends use realloc, so the memory needs to come from malloc

Spawning a thread in the target and executing code seems like maybe more possible, especially if that code can be dlopen()/dlsym()/call. Reading through this stuff it pointed me to this threadexec library for which the source is pretty damn complex. But one interesting bit is that it takes the address of a function in the local process, and uses that same address in the target process... so maybe the dyld cache is not mapped at different locations in different processes? I can't believe that wouldn't be the case for security, maybe it wasn't 6 years ago.

Given all this, I'm inclined to just not support attaching to new children when profiling a target process. It should be possible to attach to existing children, though.

mstange commented 2 months ago

For profiling child processes, we could poll proc_listchildpids, if it doesn't have too much overhead. There's a Rust wrapper in remoteprocess::Process::child_processes. We'd miss the very beginning of new processes, but it's probably good enough for most use cases.

mstange commented 2 months ago

And for profiling system-wide, if it's even possible with acceptable overhead, we could list all processes using the KERN_PROC_ALL sysctl, like lldb does in Host::FindProcessesImpl.

mstange commented 2 months ago

Thanks for investigating this! I was imagining using a sudo subprocess for this, but self-signing samply is a good idea too. We could even have both. And we could have a samply setup command that does the self-signing for the user, with some kind of interactive wizard.

mstange commented 2 months ago

Spawning a thread in the target and executing code seems like maybe more possible, especially if that code can be dlopen()/dlsym()/call. Reading through this stuff it pointed me to this threadexec library for which the source is pretty damn complex.

I wanted to point you at Listing 12-9 from the *OS Internals Book 1, but then I found this post which has further improved on it, and one of the comments on the gist with the full code links to this implementation which is arm64 compatible.

vvuk commented 2 months ago

Thanks for investigating this! I was imagining using a sudo subprocess for this, but self-signing samply is a good idea too. We could even have both. And we could have a samply setup command that does the self-signing for the user, with some kind of interactive wizard.

Yep, I was thinking samply setup too. Followup PR though. For root -- I think self-signing via setup and/or just calling via sudo directly should be sufficient, I don't think we need a separate subprocess. I swear I read somewhere that at some point Apple will disallow task_for_pid even for root processes if they don't have the entitlement, but I can't find it. It works right now; same code, just running sudo samply without code signing.

I wanted to point you at Listing 12-9 from the *OS Internals Book 1, but then I found this post which has further improved on it, and one of the comments on the gist with the full code links to this implementation which is arm64 compatible.

Awesome, thanks! Good to know, though I probably won't go down this route any time soon.

mstange commented 2 months ago

and/or just calling via sudo directly should be sufficient

I want to discourage sudo samply record because this would also run the webserver as root. I think it would also leave a profile.json file that's not writable without sudo.

mstange / samply

Allow attaching to existing macOS process #190