tikv / pprof-rs

A Rust CPU profiler implemented with the help of backtrace-rs
Apache License 2.0
1.32k stars 104 forks source link

Windows support #9

Open cretz opened 4 years ago

cretz commented 4 years ago

Just tossing this issue out here for others that may reach this lib. Build fails since pthread_self and pthread_getname_np are, predictably, not present. Can std::thread::current and std::thread::Thread.name be used instead? Would it even matter with how signals work here?

YangKeao commented 4 years ago

We cannot use std::thread::current here. Because a thread_local RefCell is used in std::thread::current to get current THREAD_INFO, it's not safe to use in signal handler.

For compiling and using this lib in windows, I will remove the usage of thread name (when it is used in windows).

I know little about windows programming. Do you have any suggestion on how to identify threads on windows? Do it have thread name or something similar?

YangKeao commented 4 years ago

I will remove the usage of thread name (when it is used in windows).

This has been done in #15 . Can you help me test on windows?

cretz commented 4 years ago

Awesome. I may not be able to test for a while (moved off of Rust project for a bit), but will try to soon...

YangKeao commented 4 years ago

@cretz Sorry, it still cannot work on windows, because we use setitimer syscall to send signal, interrupt and sample. (And the signal handler is also set by nix crate which definitely cannot work on windows)

LK4D4 commented 4 years ago

Hello, I can help with testing (and maybe development) on Windows and interested in decent rust profiler :) Right now from what I see one major problem is nix crate and second problem is lack of pthread stuff in libc crate:

error[E0412]: cannot find type `pthread_t` in crate `libc`
  --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:79:53
   |
79 | fn write_thread_name_fallback(current_thread: libc::pthread_t, name: &mut [libc::c_char]) {
   |                                                     ^^^^^^^^^ not found in `libc`

error[E0412]: cannot find type `pthread_t` in crate `libc`
   --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:105:44
    |
105 | fn write_thread_name(current_thread: libc::pthread_t, name: &mut [libc::c_char]) {
    |                                            ^^^^^^^^^ not found in `libc`

error[E0425]: cannot find function `pthread_self` in crate `libc`
   --> C:\Users\lk4d4\.cargo\git\checkouts\pprof-rs-e0fcf6c5ee7a42e1\de4c986\src\profiler.rs:139:49
    |
139 |             let current_thread = unsafe { libc::pthread_self() };
    |                                                 ^^^^^^^^^^^^ not found in `libc`

Let me know if I can help. Thanks!

LK4D4 commented 4 years ago

So, I did a little research and looks like it isn't that easy, but doable. Gonna leave it here so I won't forget or someone has more time than me. Windows doesn't support SIGPROF and I didn't find any analog. There is go implementation of a similar thing https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L1016-L1112 It just starts a separate thread that wakes up from time to time and iterates all other threads in the process one by one, suspends them, gets context, process, resumes. In theory, we could implement the same thing with https://docs.rs/winapi/0.3.9/winapi/um/synchapi/fn.SetWaitableTimer.html and https://docs.rs/winproc/0.6.4/winproc/struct.Process.html#method.thread_ids. Not sure about the details of getting backtraces, it probably won't be supported by backtrace-rs, but rather will have a custom implementation.

YangKeao commented 4 years ago

@LK4D4 Cool! Thanks for your research :beers: .

It's a good start for someone (like me) to consider about the implementation on windows :smiley_cat: .

sunng87 commented 4 years ago

Before we can get it work on Windows, is it possible to at least compile on Windows? I received report that my tests doesn't compile on Windows because I use pprof in my benchmark code.

LK4D4 commented 4 years ago

FWIW I think compilation on windows should be easy, I actually got it to compile, but not working. I'll try to find my code and send PR.

YangKeao commented 4 years ago

@sunng87 I don't think "compile but not work" is a good choice. You could use conditional compilation and platform specific dependencies to avoid pprof-rs on windows target.

sunng87 commented 4 years ago

@YangKeao Thank you! That's better solution.

Jardynq commented 2 years ago

Bump. I have tried some of implementing @LK4D4 ideas, and it seems to work? I haven't tried using the SetWaitableTimer, but instead i create a worker thread that enumerates all thread ids and uses a modified backtrace-rs, that allows backtracing of threads other than the calling thread, by passing a handle.

Though I don't know if the backtrace is correct.

The following code generates the following backtrace on windows 11 build 22489.1000 Built with nigthly-x86_64-pc-windows-msvc v. 1.61.0

fn main() {
    let guard = pprof::ProfilerGuardBuilder::default().frequency(1000).build().unwrap();

    for i in 0..1000000 {
        if i % 100 == 0 {
            println!("abebebb");
        }
    }

    if let Ok(report) = guard.report().build() {
        let file = std::fs::File::create("flamegraph.svg").unwrap();
        report.flamegraph(file).unwrap();
    };
}

flamegraph flamegraph.svg.txt

I haven't bothered with naming the threads, so the name is just their id. 43dc is the main thread. The worker thread filters itself away, so it's not shown. Can anyone verify that this stacktrace makes sense?

aminya commented 2 years ago

@Jardynq Could you send a PR for us to try?

Jardynq commented 2 years ago

Sorry for the late reply. Got a throat infection that knocked me out.

After tidying up the code for a PR, i encountered some bugs that made me experiment with a few other methods:

SetWaitableTimer From my understanding this is effectively a wrapper around QueueUserAPC.

QueueUserAPC This works by queuing a callback to be called from a specific thread by the scheduler when the thread is in an alertable state. We can have a callback that simply reads the backtrace, similar to how it's currently done on linux. The issue with this, is that the alertable state is only achieved in specific circumstances. Such as waiting for synchronous file IO. This is not good enough.

QueueUserAPC2 Luckily a second method exists that will try to execute as soon as possible. A small pitfall with this, is that it does not work on WoW64. I tried making an implementation, but the QueueUserAPC2 would add a couple of functions to the callstack. Of course, these could be filtered out, but sometimes the function would completely corrupt the callstack. This happened about 33% of the time (counted by eye). It has another issue in not being synchronized. This is the same issue that suspend thread has.

Manually: SuspendThread, GetThreadContext, ResumeThread This was the original implementation as described in my first comment. The reason i went back and forth on this was for a couple of reasons. Firstly the issue with suspend thread. If a thread has access to a synchronization object (which windows does a lot under the hood) while being suspended. Then if the main thread tries to access that object, it will end in a deadlock. This happened rarely (maybe 1%), but often enough that it was an issue. Luckily this seems to be fixed by simply ensuring that the main thread doesn't do anything that would require synchronization. Currently the forked implementation of backtrace calls the three functions named above, directly after eachother. This means that the child thread is only suspended while main is calling GetThreadContext and ResumeThread. These should hopefully not end in a deadlock. Another issue which can actually be seen in my comment above, is that a thread's backtrace does not always start at the same place. In other words, all backtraces should start with RtlUserThreadStart since that is the entry point. But sometimes the callback does not contain this. This happens a minority of the time and therefore the widest callstack is the correct one while the thin ones are "corrupted"?. But this is still a slight annoyance and could confuse someone trying to decipher the data. A possible solution would be, to filter away all the backtraces that don't end in RtlUserThreadStart since something must've gone wrong with them. At the same time notify the user how many samples from a given thread where bad, so they can use it in their evaluation.

From my testing it seems that the manual method is the best and most consistent.

forked backtrace-rs PR-issue

aminya commented 2 years ago

I opened a pull request for backtrace-rs. This should make it possible to rework #122. https://github.com/rust-lang/backtrace-rs/pull/473

maksymsur commented 2 years ago

support this initiative as well. having win support would be great! at the bare minimum to be able to compile at least... which is super useful for a cross-platform development

aminya commented 2 years ago

See https://github.com/rust-lang/backtrace-rs/pull/473 The Rust developers have asked me to copy-paste code to support profiling on Windows. I think that's against software reuse and maintainability fundamentals. Maybe you can review that pull request and add your opinion.

Jardynq commented 2 years ago

@aminya I think the correct solution is to fork backtrace-rs. Luckily, it isn't updated often and the changes we need are so minor, that should be able to keep the fork up to date with upstream without any merge conflicts. I have updated my fork to fix an issue pointed out by alexcrichton.

YangKeao commented 2 years ago

I think the correct solution is to fork backtrace-rs

I think so. The backtrace-rs has a lot of helpful things which are not exported. Actually I can understand the developers of backtrace-rs don't want to export / abstract these logic, as it will increase the complexity and make it harder to keep compatibility. We can reuse these logic through forking backtrace-rs.

You'll also need to give the forked backtrace-rs a new name, and publish it to the crates.io, as it's not allowed to use a git dependency for a published crate (pprof-rs)

aminya commented 2 years ago

Yeah, I think forking is a better solution than copy-pasting code. I can call the fork something like backtrace-thread.

@Jardynq I will merge the changes into my branch and publish it. I made some changes to your original fork as it was not ready for a PR and had some issues.

maksymsur commented 2 years ago

also support a forking solution 👍