Open trungnt2910 opened 1 year ago
Doing this might be problematic on macOS because of entitlements needed to make the required mach system calls. I actually tried to implement it on Noah on macOS Catalina.
Noah had every process running its own virtual processor with its own virtual address space, using similar page translation methods as blink. So basically each process manages its own Linux process on a micro VM. The Linux syscall interrupts are captured and routed to macOS. The VM state is also copied when a process is forked.
My best approach was to manage every process's debugging state using a central debugging helper. Mach's vm_read/vm_write and mig would be used to tell the tracee to go into an infinite loop, to wait for messages and continue. MIG would be used to read the process's virtual memory.
That's just the Mac side of things. I imagine a cross platform solution would be extremely complicated.
entitlements needed to make the required mach system calls.
All the operations in the documentation attached in my original post involve only sending a runtime signal, so the only call required is `kill`.
The debuggee then sends the required information to the debugger through some kind of connection.
I think DarlingHQ uses a kernel module to achieve some of its ptrace functionality. IIRC, Darling also emulates Darwin's bsd threads in the lkm. I don't know if @jart plans to touch any kernel APIs.
I think DarlingHQ uses a kernel module to achieve some of its ptrace functionality.
Wrong, at least since early 2022.
See: https://github.com/darlinghq/darling/issues/1093
Darling also emulates Darwin's bsd threads in the lkm.
Half-true, DarlingHQ emulates BSD threads in its `darlingserver`, a kernel emulator server running wholly in userspace.
touch any kernel APIs.
Their `ptrace` emulation does not involve any kernel API. It is a technique that only involves signaling, inter-process communication, and co-operation of the debuggee (so using this technique a debugger running on `blink` cannot `ptrace` any non-`blink` processes on the host).
Wrong, at least since early 2022.
Half-true, DarlingHQ emulates BSD threads in its `darlingserver`, a kernel emulator server running wholly in userspace.
Well that's pretty impressive stuff, I never knew they moved threads to userspace.
I understand that many `darling` components were moved to userspace, but my guess is that Blink wouldn't want to rely on kernel APIs completely. If that's the case then 'Half-true' wouldn't be good enough. I'm not a core developer, so that guess is just an educated guess.
All the operations in the documentation attached in my original post involve only sending a runtime signal, so the only call required is `kill`.
According to DarlingHQ docs:
kill(SIGSTOP):
Send a RT signal to the debuggee that it should act as if SIGSTOP were sent to the process. We cannot send a real SIGSTOP, because then the debuggee couldn't provide/update register state to the debugger etc.
Firstly an RT signal is a realtime signal if I'm not wrong. You can't really send IPC/mach messages to a halted process.
Regardless, Blink could use `CFMessagePortRef`, `NSPort`, or `MIG` (the Mach interface generator) for macOS IPC (I haven't gone through blink code). The first two are the best options and won't require entitlements. I'm not entirely sure if `MIG` needs entitlements; however, any `ptrace` syscall invocation will need entitlements.
According to DarlingHQ docs:
kill(SIGSTOP):
Send a RT signal to the debuggee that it should act as if SIGSTOP were sent to the process. We cannot send a real SIGSTOP, because then the debuggee couldn't provide/update register state to the debugger etc.
(Emphasis mine)
You can't really send IPC/mach messages to a halted process.
The process is not actually halted (from the host machine's perspective), but the emulated code (in DarlingHQ's case, macOS code, in Blink's case, the Linux binary) should stop executing. Therefore, while the emulated binary seems to have stopped, background emulator tasks should still continue working.
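To make the "stopped guest, live emulator" distinction concrete, here is a minimal single-process sketch. It is written in Python purely for brevity (Blink itself is C), and every name in it (`Emulator`, `tick`, the use of `SIGUSR2` as the stand-in signal) is hypothetical rather than anything taken from Blink or Darling.

```python
import signal


class Emulator:
    """Toy emulator: guest execution can be paused while housekeeping continues."""

    def __init__(self):
        self.guest_steps = 0    # how many guest "instructions" have executed
        self.housekeeping = 0   # how many times background emulator work has run
        self.stopped = False    # set when the pretend-SIGSTOP signal arrives

    def on_stop_signal(self, signum, frame):
        # Act as if SIGSTOP had been delivered to the guest, without
        # actually suspending the host process.
        self.stopped = True

    def tick(self):
        if not self.stopped:
            self.guest_steps += 1   # run one guest instruction
        self.housekeeping += 1      # emulator-side work runs either way


def demo():
    emu = Emulator()
    previous = signal.signal(signal.SIGUSR2, emu.on_stop_signal)
    try:
        for _ in range(3):
            emu.tick()
        signal.raise_signal(signal.SIGUSR2)  # the "act as if stopped" request
        for _ in range(3):
            emu.tick()
    finally:
        signal.signal(signal.SIGUSR2, previous)
    return emu
```

After `demo()`, the guest step counter stays where it was when the signal arrived, while the housekeeping counter keeps increasing. In a real emulator the housekeeping branch is where debugger requests would be answered.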
however, any ptrace syscall invocation will need entitlements.
Like I said, we're not actually invoking `ptrace` of any kind on the host while using this technique. So `blink` should not need any entitlement.
Like I said, we're not actually invoking `ptrace` of any kind on the host while using this technique. So `blink` should not need any entitlement.
Let me rephrase. On macOS you can halt a process without `ptrace` using `task_suspend`. However, any application that modifies another process's state at runtime is considered a debugger by Apple. This includes `task_*` syscalls, `vm_*` syscalls and (of course) `ptrace`, hence entitlements being a concern. I don't make the rules, Apple does.
Secondly:
The process is not actually halted (from the host machine's perspective), but the emulated code (in DarlingHQ's case, macOS code, in Blink's case, the Linux binary) should stop executing. Therefore, while the emulated binary seems to have stopped, background emulator tasks should still continue working.
Wine and DarlingHQ (probably blink too) are not emulators, they're compatibility layers. I like to refer to them as userspace hypervisors, even though that's probably wrong. They're like execution coordinators, kinda like `ld-linux` on Linux or `dyld` on macOS.
Listen dude, I'm not here to argue. I'm here to list the points and possibilities in this issue (help solve the issue) and learn new things. This back and forth is counterproductive.
Wine and DarlingHQ [...] are not emulators, they're compatibility layers.
I am aware of WINE's name as well as the fact that DarlingHQ loads a macOS binary directly on a Linux process's address space and executes instructions directly on the host's CPU. However, to simplify things I (and the DarlingHQ project itself) use the word "emulation" and its other forms. The term is commonly understood as "instruction set emulation" but depending on the context it can also mean "system call emulation" or "<insert something that needs compatibility> emulation".
probably blink too
FYI, `blink` is an emulator. Unlike "compatibility layers", which run code directly on the host's CPU, `blink` interprets binaries that use a few supported x86_64 instruction sets.
any application that modifies another process's state at runtime is considered a debugger by Apple.
Again, quoting the documentation:
Debugging support in Darling makes use of what we call "cooperative debugging". It means the code in the debuggee is aware it's being debugged and actively assists the process.
So, imagine a scenario where `blink` has `ptrace` implemented using this "cooperative debugging" technique. This hypothetical future version of `blink` is emulating a Linux debugger (for example, `lldb`), debugging another `blink`-emulated process, on a macOS host. This, in order, is what happens:
1. The `blink`-emulated debugger issues a syscall to `ptrace` (Linux syscall 101), with the request `PTRACE_ATTACH`.
2. `blink` detects this syscall through its JIT/interpreter (just like any other Linux syscall), and transfers control to the ptrace syscall emulation function.
3. `blink`'s syscall emulation function (hypothetically, `SysPtrace`) calls the POSIX function `kill` in macOS's `libSystem` to send a `blink`-reserved signal (according to the current README, it's `SIGSYS`) to the other `blink` process that is emulating the desired debuggee. `SysPtrace` also opens a UNIX socket (or any better IPC channel) for communication.
4. The debuggee's `blink` process receives this `SIGSYS` and stops the process emulation. It parses the additional data sent along with the signal, somehow realizes that there is a request for it to be debugged and connects to the IPC channel.
5. The debugger's `blink` process notices that the debuggee has connected and returns from `SysPtrace`.
6. The emulated debugger issues further `ptrace` requests.
7. `SysPtrace` sends these `ptrace` requests through the IPC channel.
8. The debuggee's `blink` process handles all these `ptrace` messages by reading and/or modifying its own state. The modification should be restricted to `blink`-emulated memory, registers, and other program states managed by `blink`, and may not touch any macOS system internals.

Therefore, the hypothetical future `blink` binary should only need to be able to use `kill` and UNIX sockets. The debuggee's process state is cooperatively modified/reported by the second `blink` process.
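The flow described above can be sketched end to end with nothing but `kill` and a UNIX socket. This is only an illustration under stated assumptions, not how Blink or Darling implement it: it is in Python for brevity (Blink is C), `SIGUSR1` stands in for the emulator-reserved signal, the socket path is derived from the debuggee's PID instead of being passed along with the signal, and the `PEEK`/`POKE`/`DETACH` text protocol and the `rip`/`halt` "registers" are made up.

```python
import os
import signal
import socket
import tempfile
import time

DEBUG_SIGNAL = signal.SIGUSR1  # assumption: stand-in for the emulator-reserved signal


def sock_path(directory, pid):
    # Hypothetical convention: the rendezvous socket is named after the debuggee's PID.
    return os.path.join(directory, f"dbg-{pid}.sock")


def debuggee(directory, ready_fd):
    regs = {"rip": 0, "halt": 0}  # emulated state, owned by this process only

    def on_debug_signal(signum, frame):
        # The guest loop below makes no progress while this handler runs.
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.connect(sock_path(directory, os.getpid()))
            rf, wf = s.makefile("r"), s.makefile("w")
            while True:
                words = rf.readline().split()
                if not words:  # debugger went away
                    return
                op, args = words[0], words[1:]
                if op == "POKE":
                    regs[args[0]] = int(args[1])  # the debuggee edits its own state
                    reply = "OK"
                elif op == "PEEK":
                    reply = str(regs[args[0]])
                else:  # DETACH (or unknown): acknowledge, then resume the guest
                    reply = "OK"
                wf.write(reply + "\n")
                wf.flush()
                if op not in ("PEEK", "POKE"):
                    return

    signal.signal(DEBUG_SIGNAL, on_debug_signal)
    os.write(ready_fd, b"1")  # handler is installed; safe to signal us now
    while not regs["halt"]:   # the "guest program": count until halted
        regs["rip"] += 1
        time.sleep(0.001)


def attach(directory, pid):
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(sock_path(directory, pid))
    listener.listen(1)
    os.kill(pid, DEBUG_SIGNAL)   # the only call aimed at the other process
    conn, _ = listener.accept()  # returns once the debuggee has connected
    listener.close()
    conn.settimeout(5)
    return conn


def request(conn, text):
    conn.sendall((text + "\n").encode())
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(64)
        if not chunk:
            break
        data += chunk
    return data.decode().strip()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        ready_r, ready_w = os.pipe()
        pid = os.fork()
        if pid == 0:  # child: plays the debuggee's emulator process
            try:
                debuggee(directory, ready_w)
            finally:
                os._exit(0)
        os.read(ready_r, 1)  # wait until the debuggee can take the signal
        os.close(ready_r)
        os.close(ready_w)
        time.sleep(0.05)     # let the guest run for a moment
        conn = attach(directory, pid)
        first = int(request(conn, "PEEK rip"))
        time.sleep(0.05)
        second = int(request(conn, "PEEK rip"))  # guest is paused, so unchanged
        request(conn, "POKE rip 1000")
        poked = int(request(conn, "PEEK rip"))
        request(conn, "POKE halt 1")             # let the guest loop end on resume
        request(conn, "DETACH")
        conn.close()
        os.waitpid(pid, 0)
        return first, second, poked
```

Calling `demo()` returns two reads of `rip` taken while the debuggee is paused, plus a read after the poke. The debugger side never touches the other process's memory directly; every read and write is performed by the debuggee on itself, which is also why this approach cannot be pointed at a process that is not cooperating.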
This "cooperative debugging" carries one limitation that I mentioned before:
(so using this technique a debugger running on blink cannot ptrace any non-blink processes on the host).
Listen dude, I'm not here to argue. I'm here to list the points and possibilities in this issue
I'm not here to spark a debate either. I'm just clarifying my points for you, jart, other people who might help implement this in the future, and anyone who visits this discussion.
I understand how cooperative debugging works. I understand what could be done to emulate the syscall ptrace.
Like I said I've tried it before; more specifically tried and failed. What I don't understand is how we ended up here. I mentioned a possibility of something and you strike it down as impossible. I state the reason why I say that, and you maintain your stance. And you bring in examples from `darling` on Linux when I'm talking about macOS.
Look, none of us knows everything here. However, I find your responses (or rebuttals) rather arrogant. I have a hard time believing you don't intend to spark a debate.
Regardless I'll maintain that there MAY BE A POSSIBILITY that entitlements will be needed. What if @jart decides to just halt the process and read its memory, because that'd be easier? We wouldn't know, would we? It's all conjecture what is required because we don't know what's going to be done.
There might have been a misunderstanding in this conversation.
Sure, I do acknowledge that if you directly use macOS APIs like your attempt described in the first comment, there may be a possibility that entitlements are needed.
I also acknowledge that the approach I mentioned in the original post is just a suggestion. jart may use it, use your approach on macOS, or use a totally different one.
The misunderstanding here might have been that I was suggesting a POSIX-only approach for `blink`, and that the project could implement something similar to what DarlingHQ docs said. When you mentioned "Doing this", you might have referred to your own attempt, which involves using macOS-specific privileged APIs, while I misunderstood it as the "cooperative debugging" approach itself (which only uses POSIX `kill`).
Apart from some off-topic comments about `darling`'s kernel server here, all of my comments below compare `blink` to `darling` in a few aspects to note that it's possible for `blink` to follow the "cooperative debugging" approach, and do not focus on how `darling` works on Linux.
Again, I do acknowledge that translating `ptrace` to macOS-specific APIs is a possible approach on the macOS side, and using the approach you attempted will have limitations that you stated. Sorry for having misunderstood your previous comments.
I'm still catching up on this thread. Wow a lot of lively discussion here. I'm so happy to see how passionate folks are about their visions for Blink's future. Please remember that we're all friends here, and that nothing is impossible. Please keep that in mind when writing your communications. Blink is very POSIX focused, but I don't want to be limited to POSIX. I'd be happy to see integration with Mach APIs happen if that makes Blink better. ptrace() is a powerful and cool Linux specific API that'd be great to have available on MacOS. The one Linux API that I think is even cooler is SECCOMP BPF, since I have real need of that, since it's what redbean uses to provide sandboxing on Linux. Debugging processes external to Blink has been less of a focus until now, since the motivation has been more geared towards improving the debuggability of things that run inside Blink.
@trungnt2910 I never saw your reply. I also apologize for the misunderstanding on my part. Stooping low (argumentatively) to prove a point was and will always be counterproductive.
One thing we discussed recently on Discord is that having ptrace could potentially pave the way for running GDB inside Blink. Obviously Blinkenlights is the native debugger experience that fully works inside and outside of Blink. But having GDB would be nice too. The canonical way to do GDB though would be to implement the GDB server protocol into Blink, so that GDB can connect via a TCP socket similar to what Qemu does.
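For a sense of what "implementing the GDB server protocol" starts with, here is a small sketch of the packet framing used by GDB's Remote Serial Protocol: each packet is `$`, the payload, `#`, and a two-digit hex checksum equal to the sum of the payload bytes modulo 256. The helper names are made up, the code is Python for brevity rather than Blink's C, and it deliberately ignores the protocol's escaping and run-length encoding rules, so it only handles payloads that do not contain `$`, `#`, `}` or `*`.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of the payload bytes, modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits>."""
    return b"$" + payload + b"#" + b"%02x" % rsp_checksum(payload)


def rsp_unframe(packet: bytes) -> bytes:
    """Return the payload of a framed packet, or raise ValueError if malformed."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    payload = packet[1:-3]
    if int(packet[-2:], 16) != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload
```

For example, `rsp_frame(b"OK")` gives `b"$OK#9a"`. A real stub would sit on top of this with acknowledgements (`+`/`-`) and handlers for the individual commands, such as reading registers and memory.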
The canonical way to do GDB though would be to implement the GDB server protocol into Blink
This is a very ad-hoc approach to provide debugging support for Blink. First, we have 1000 lines of code to bring `strace` support natively on Blink. Then, we would implement an entire protocol specific to GDB.
If someone wants to use, for example, LLDB instead, or use some framework-specific debuggers like the .NET Core Debugger (`vsdbg`), they would have to implement the same thing over and over again.
This syscall is crucial for debugging support on blink, which may aid development in many scenarios.
While Linux `ptrace` is quite a unique call, it can be emulated using a technique called "cooperative debugging", used by the Darling project to emulate macOS `ptrace` without having to actually rely on the host's `ptrace`. The approach uses an internal signal, which I believe blink already does.