mstange / samply

Command-line sampling profiler for macOS and Linux
Apache License 2.0

Allow attaching to existing macOS process #190

Closed vvuk closed 1 month ago

vvuk commented 2 months ago

Implement attaching to existing macOS processes. This requires samply to be signed with the debugger entitlement:

  1. Create a file ent.xml:
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
    <key>com.apple.security.cs.debugger</key>
    <true/>
    </dict>
    </plist>
  2. cargo build
  3. codesign --force --options runtime --sign - --entitlements ent.xml target/debug/samply

The basic functionality works, but there is still work to do:

  1. This doesn't capture jitdump or markers. It's possible to just watch /tmp/ for appropriately-named jitdump files, but a real solution would need cooperation from the existing process, so that it can be told to dump all existing JIT-compiled info to jitdump files.
  2. This won't capture child processes. That relies on the dyld preload to send the task of every new process back, and there's no preload here because no env vars are set.
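
As a rough sketch of the "watch /tmp/" idea from item 1: the function below scans a directory for a jitdump file belonging to a given pid. It assumes the `jit-<pid>.dump` naming used by the perf jitdump convention. The function names are hypothetical and this is not samply's actual code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Extracts the pid from a file name of the form `jit-<pid>.dump`.
/// The naming pattern is an assumption (perf jitdump convention).
fn jitdump_pid(file_name: &str) -> Option<u32> {
    file_name
        .strip_prefix("jit-")?
        .strip_suffix(".dump")?
        .parse()
        .ok()
}

/// Scans `dir` once and returns the jitdump file for `pid`, if one exists.
/// A caller would invoke this periodically while the target is being profiled.
fn find_jitdump_file(dir: &Path, pid: u32) -> io::Result<Option<PathBuf>> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        if name.to_str().and_then(jitdump_pid) == Some(pid) {
            return Ok(Some(entry.path()));
        }
    }
    Ok(None)
}
```

This only finds files the target has already created. It does nothing about JIT code that was compiled before the attach and never written out, which is the part that needs cooperation from the target.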

For 2, this might be possible to sort out if we can set the dyld preload env var in the target process. I don't know if there's a simpler way to do this, but one idea is to load the preload shared library into the target (I think this is doable?) and then create a new thread in that process that calls an init function from the preload lib, which would set the process env.

vvuk commented 2 months ago

Hrm. Being able to attach to newly created children seems very complicated, borderline not doable. In order to write to the environment, the magic seems to be the _NSGetEnviron symbol which returns a char*** (pointer to the location where the array of char* that form the environment is stored). There are many problems with this path:

Spawning a thread in the target and executing code seems like maybe more possible, especially if that code can be dlopen()/dlsym()/call. Reading through this stuff it pointed me to this threadexec library for which the source is pretty damn complex. But one interesting bit is that it takes the address of a function in the local process, and uses that same address in the target process... so maybe the dyld cache is not mapped at different locations in different processes? I can't believe that wouldn't be the case for security, maybe it wasn't 6 years ago.

Given all this, I'm inclined to just not support attaching to new children when profiling a target process. It should be possible to attach to existing children, though.

mstange commented 2 months ago

For profiling child processes, we could poll proc_listchildpids, if it doesn't have too much overhead. There's a Rust wrapper in remoteprocess::Process::child_processes. We'd miss the very beginning of new processes, but it's probably good enough for most use cases.
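
A minimal sketch of the bookkeeping such polling would need, assuming the caller obtains a snapshot of child pids on each tick (via proc_listchildpids or the remoteprocess wrapper; that call is deliberately left out here). `ChildTracker` is a hypothetical name, not something that exists in samply.

```rust
use std::collections::HashSet;

/// Remembers which child pids have been seen across polls.
struct ChildTracker {
    known: HashSet<u32>,
}

impl ChildTracker {
    fn new() -> Self {
        Self { known: HashSet::new() }
    }

    /// Takes the current snapshot of child pids and returns those that did
    /// not appear in any earlier snapshot, sorted ascending.
    fn newly_seen(&mut self, current: &[u32]) -> Vec<u32> {
        let mut new_pids: Vec<u32> = current
            .iter()
            .copied()
            // `insert` returns true only the first time a pid is added.
            .filter(|pid| self.known.insert(*pid))
            .collect();
        new_pids.sort_unstable();
        new_pids
    }
}
```

Each pid returned by `newly_seen` is a process the profiler would then attach to. The sketch never forgets pids, so a real version would also need to handle exited children and pid reuse.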

mstange commented 2 months ago

And for profiling system-wide, if it's even possible with acceptable overhead, we could list all processes using the KERN_PROC_ALL sysctl, like lldb does in Host::FindProcessesImpl.

mstange commented 2 months ago

Thanks for investigating this! I was imagining using a sudo subprocess for this, but self-signing samply is a good idea too. We could even have both. And we could have a samply setup command that does the self-signing for the user, with some kind of interactive wizard.
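
For illustration, the non-interactive core of such a setup command might look roughly like this. It writes the entitlements plist from the PR description and invokes codesign with the same flags as step 3 there. The function names, and the idea of passing the binary path in explicitly, are assumptions for this sketch.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Entitlements plist, copied from the PR description.
const ENTITLEMENTS_XML: &str = r#"<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.debugger</key>
<true/>
</dict>
</plist>
"#;

/// Builds the codesign argument list used in step 3 of the PR description.
fn codesign_args(entitlements: &Path, binary: &Path) -> Vec<OsString> {
    let mut args: Vec<OsString> = ["--force", "--options", "runtime", "--sign", "-", "--entitlements"]
        .iter()
        .map(OsString::from)
        .collect();
    args.push(entitlements.as_os_str().to_os_string());
    args.push(binary.as_os_str().to_os_string());
    args
}

/// Writes the entitlements file, then ad-hoc signs `binary` with it.
/// Returns Ok(true) if codesign exited successfully.
fn self_sign(binary: &Path, entitlements: &Path) -> io::Result<bool> {
    fs::write(entitlements, ENTITLEMENTS_XML)?;
    let status = Command::new("codesign")
        .args(codesign_args(entitlements, binary))
        .status()?;
    Ok(status.success())
}
```

An interactive wizard would wrap `self_sign` with a prompt and presumably sign the currently running executable; that part is not sketched here.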

mstange commented 2 months ago

> Spawning a thread in the target and executing code seems like maybe more possible, especially if that code can be dlopen()/dlsym()/call. Reading through this stuff it pointed me to this threadexec library for which the source is pretty damn complex.

I wanted to point you at Listing 12-9 from the *OS Internals Book 1, but then I found this post which has further improved on it, and one of the comments on the gist with the full code links to this implementation which is arm64 compatible.

vvuk commented 2 months ago

> Thanks for investigating this! I was imagining using a sudo subprocess for this, but self-signing samply is a good idea too. We could even have both. And we could have a samply setup command that does the self-signing for the user, with some kind of interactive wizard.

Yep, I was thinking samply setup too. Followup PR though. For root -- I think self-signing via setup and/or just calling via sudo directly should be sufficient, I don't think we need a separate subprocess. I swear I read somewhere that at some point Apple will disallow task_for_pid even for root processes if they don't have the entitlement, but I can't find it. It works right now; same code, just running sudo samply without code signing.

> I wanted to point you at Listing 12-9 from the *OS Internals Book 1, but then I found this post which has further improved on it, and one of the comments on the gist with the full code links to this implementation which is arm64 compatible.

Awesome, thanks! Good to know, though I probably won't go down this route any time soon.

mstange commented 2 months ago

> and/or just calling via sudo directly should be sufficient

I want to discourage sudo samply record because this would also run the webserver as root. I think it would also leave a profile.json file that's not writable without sudo.