iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.36k stars 3.86k forks source link

offwaketime does not show wakers of Java threads #2487

Open yoheiueda opened 5 years ago

yoheiueda commented 5 years ago

When I profile a Java workload with offwaketime, wakers of Java threads become empty.

Component Version
BCC tools 0.10.0-1 (Upstream Ubuntu packages from repo.iovisor.org)
OS Ubuntu 18.04.2 (Linux 4.15.0-43-generic)
Java openjdk 12.0.1 2019-04-16
OpenJDK Runtime Environment AdoptOpenJDK (build 12.0.1+12)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 12.0.1+12, mixed mode, sharing)
perf-map-agent Latest version (d9843a0)
https://github.com/jvm-profiling-tools/perf-map-agent

Below is the example pair of target and waker. The target contains frames of JIT-compiled code, but the waker is empty. Even frames in kernel are not included in the waker.

    waker:            10904
    --               --
    finish_task_switch
    schedule
    exit_to_usermode_loop
    prepare_exit_to_usermode
    swapgs_restore_regs_and_return_to_usermode
    ObjectSynchronizer::inflate(Thread*, oopDesc*, ObjectSynchronizer::InflateCause)
    SharedRuntime::complete_monitor_locking_C(oopDesc*, BasicLock*, JavaThread*)
    _complete_monitor_locking_Java
    Lcom/ibm/websphere/jaxrs/server/IBMRestServlet;::doPost
    Lcom/ibm/ws/webcontainer/servlet/ServletWrapper;::service
    Lcom/ibm/ws/webcontainer/servlet/ServletWrapper;::handleRequest
    Lcom/ibm/ws/webcontainer/filter/WebAppFilterChain;::invokeTarget
    Lcom/ibm/ws/webcontainer/filter/WebAppFilterChain;::doFilter
    Lcom/ibm/ws/webcontainer/filter/FilterInstanceWrapper;::doFilter
    Lcom/ibm/ws/webcontainer/filter/WebAppFilterChain;::doFilter
    Lcom/ibm/ws/webcontainer/filter/WebAppFilterManager;::doFilter
    Lcom/ibm/ws/webcontainer/filter/WebAppFilterManager;::invokeFilters
    Lcom/ibm/ws/webcontainer/servlet/CacheServletWrapper;::handleRequest
    Lcom/ibm/ws/webcontainer/WebContainer;::handleRequest
    Lcom/ibm/ws/webcontainer/osgi/DynamicVirtualHost$2;::run
    Lcom/ibm/ws/http/dispatcher/internal/channel/HttpDispatcherLink;::ready
    Lcom/ibm/ws/http/channel/internal/inbound/HttpInboundLink;::handleDiscrimination
    Lcom/ibm/ws/http/channel/internal/inbound/HttpICLReadCallback;::complete
    Lcom/ibm/ws/tcpchannel/internal/WorkQueueManager;::attemptIO
    Lcom/ibm/ws/tcpchannel/internal/WorkQueueManager;::workerRun
    Ljava/util/concurrent/ThreadPoolExecutor;::runWorker
    Ljava/util/concurrent/ThreadPoolExecutor$Worker;::run
    Ljava/lang/Thread;::run
    call_stub
    JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)
    JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)
    thread_entry(JavaThread*, Thread*)
    JavaThread::run()
    Thread::call_run()
    thread_native_entry(Thread*)
    start_thread
    target:          Default Executo 0
        20

Below is another pair of waker and target in the same profile result. In this case, both waker and target frames are recorded, but the waker is not a Java thread. It looks like target only contains kernel frames.

    waker:           VM Thread 11644
    start_thread
    thread_native_entry(Thread*)
    Thread::call_run()
    VMThread::run()
    VMThread::loop()
    SafepointSynchronize::begin()
    SafepointSynchronize::do_cleanup_tasks()
    WorkGang::run_task(AbstractGangTask*)
    SemaphoreGangTaskDispatcher::coordinator_execute_on_workers(AbstractGangTask*, unsigned int)
    entry_SYSCALL_64_after_hwframe
    do_syscall_64
    sys_futex
    do_futex
    futex_wake
    --               --
    finish_task_switch
    schedule
    futex_wait_queue_me
    futex_wait
    do_futex
    sys_futex
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    do_futex_wait.constprop.1
    target:          GC Thread#6 11619
        2618955

I am not sure this issue is specific to Java or not. Could anyone please advise how to diagnose this issue?

yonghong-song commented 5 years ago

You can look at the source to find what is going on :-). For the first one,

    waker:            10904
    --               --
    finish_task_switch

It looks like we do get a non-zero waker pid, but not stack at all. We did not print out "[Missed User Stack]" or "[Missed Kernel Stack]". That means stack id is a valid one. Maybe somehow kernel did get the stack? Could you help print out the stack id in this case?

For your second example missing user stack, again, looks like stack id is valid. Could you help confirm?

Could you check whether the user space applications (i.e., you java app.) has frame pointer support? If it does not the kernel may have a hard time to find a user stack.

yoheiueda commented 5 years ago

Thank you for your comment.

I noticed that offwaketime shows target and waker PIDs inversely.

https://github.com/iovisor/bcc/blob/v0.10.0/tools/offwaketime.py#L353

        print("    %-16s %s %s" % ("waker:", k.waker.decode('utf-8', 'replace'), k.t_pid))

I think k.t_pid should be k.w_pid.

Likewise, this line should be corrected. https://github.com/iovisor/bcc/blob/v0.10.0/tools/offwaketime.py#L386

So, waker PID in the first example was actually zero. Waker is not recorded for some reason.

My next question is what causes this situation. What kind of reasons are possible to explain this situation?

I think involuntary context switches due to preemption may be one of the reasons. Are involuntary context switches recorded as waker PID=0?

yonghong-song commented 5 years ago

Thanks for the fix. For missing waker stack, I do have something like

    waker:            0
    --               --
    finish_task_switch
    __sched_text_start
    schedule_idle
    do_idle
    cpu_startup_entry
    start_secondary
    secondary_startup_64
    target:          swapper/12 0
        36893488147420673

For non idle thread, looks like I am able to get waker information.

The waker info is gathered by kprobe to try_to_wake_up which is a static funciton. Could you check in your environment that try_to_wake_up is inlined in some places?

yonghong-song commented 5 years ago

I think involuntary context switches due to preemption may be one of the reasons. Are involuntary context switches recorded as waker PID=0

Why do you think involuntary context switch could be one of reasons? Any reference? pid=0 is reserved for kernel scheduler thread. It can be the waker, e.g.

    waker:           swapper/49 0
    secondary_startup_64
    start_secondary
    cpu_startup_entry
    do_idle
    cpuidle_enter
    cpuidle_enter_state
    apic_timer_interrupt
    smp_apic_timer_interrupt
    hrtimer_interrupt
    __hrtimer_run_queues
    hrtimer_wakeup
    --               --
    finish_task_switch
    __sched_text_start
    schedule
    schedule_hrtimeout_range_clock
    poll_schedule_timeout.constprop.14
    do_select
    core_sys_select
    kern_select
    __x64_sys_select
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    [Missed User Stack]
    target:          offwaketime.py 937576
yoheiueda commented 5 years ago

@yonghong-song Thank you very much for clarifying the meaning of PID=0.

Could you please tell me how to check that try_to_wake_up is inlined or not?

Should I rebuild the kernel to disable inlining of try_to_wake_up by adding noinline attribute?

yonghong-song commented 5 years ago

You need to look at assembly code. Find where try_to_wake_up is called in the assembly code. Compare its call site to the place in the source code. And then you can figure out whether it is inlined or not in a particular call site.

yoheiueda commented 5 years ago

I rebuilt the kernel with inlining of try_to_wake_up disabled as follows.

static noinline int
try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)

However, the situation did not change.

Is there any other possibility that causes empty kernel stack for PID=0?

yonghong-song commented 5 years ago

stack id 0 is valid in the kernel, but is ignored in offwaketime.py. Maybe you can try the following patch solves your problem?

diff --git a/tools/offwaketime.py b/tools/offwaketime.py
index b46e9e1b..6a8eb6c8 100755
--- a/tools/offwaketime.py
+++ b/tools/offwaketime.py
@@ -210,6 +210,10 @@ int oncpu(struct pt_regs *ctx, struct task_struct *p) {
         key.w_tgid = woke->w_tgid;
         __builtin_memcpy(&key.waker, woke->name, TASK_COMM_LEN);
         wokeby.delete(&pid);
+    } else {
+        /* waker stack not available, stack id 0 is a valid stack id in the kernel */
+        key.w_k_stack_id = -EFAULT;
+        key.w_u_stack_id = -EFAULT;
     }

     counts.increment(key, delta);
@@ -307,13 +311,13 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
         has_enomem = has_enomem or (k.w_u_stack_id == -errno.ENOMEM) or \
                      (k.t_u_stack_id == -errno.ENOMEM)

-    waker_user_stack = [] if k.w_u_stack_id < 1 else \
+    waker_user_stack = [] if k.w_u_stack_id < 0 else \
         reversed(list(stack_traces.walk(k.w_u_stack_id))[1:])
-    waker_kernel_stack = [] if k.w_k_stack_id < 1 else \
+    waker_kernel_stack = [] if k.w_k_stack_id < 0 else \
         reversed(list(stack_traces.walk(k.w_k_stack_id))[1:])
-    target_user_stack = [] if k.t_u_stack_id < 1 else \
+    target_user_stack = [] if k.t_u_stack_id < 0 else \
         stack_traces.walk(k.t_u_stack_id)
-    target_kernel_stack = [] if k.t_k_stack_id < 1 else \
+    target_kernel_stack = [] if k.t_k_stack_id < 0 else \
         stack_traces.walk(k.t_k_stack_id)

     if folded:
@@ -326,7 +330,7 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
                 line.extend([b.sym(addr, k.t_tgid).decode('utf-8', 'replace')
                     for addr in reversed(list(target_user_stack)[1:])])
         if not args.user_stacks_only:
-            line.extend(["-"] if (need_delimiter and k.t_k_stack_id > 0 and k.t_u_stack_id > 0) else [])
+            line.extend(["-"] if (need_delimiter and k.t_k_stack_id >= 0 and k.t_u_stack_id >= 0) else [])
             if stack_id_err(k.t_k_stack_id):
                 line.append("[Missed Kernel Stack]")
             else:
@@ -340,7 +344,7 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
                 line.extend([b.ksym(addr).decode('utf-8', 'replace')
                     for addr in reversed(list(waker_kernel_stack))])
         if not args.kernel_stacks_only:
-            line.extend(["-"] if (need_delimiter and k.w_u_stack_id > 0 and k.w_k_stack_id > 0) else [])
+            line.extend(["-"] if (need_delimiter and k.w_u_stack_id >= 0 and k.w_k_stack_id >= 0) else [])
             if stack_id_err(k.w_u_stack_id):
                 line.append("[Missed User Stack]")
             else:
@@ -358,7 +362,7 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
                 for addr in waker_user_stack:
                     print("    %s" % b.sym(addr, k.w_tgid))
         if not args.user_stacks_only:
-            if need_delimiter and k.w_u_stack_id > 0 and k.w_k_stack_id > 0:
+            if need_delimiter and k.w_u_stack_id >= 0 and k.w_k_stack_id >= 0:
                 print("    -")
             if stack_id_err(k.w_k_stack_id):
                 print("    [Missed Kernel Stack]")
@@ -376,7 +380,7 @@ for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
                 for addr in target_kernel_stack:
                     print("    %s" % b.ksym(addr))
         if not args.kernel_stacks_only:
-            if need_delimiter and k.t_u_stack_id > 0 and k.t_k_stack_id > 0:
+            if need_delimiter and k.t_u_stack_id >= 0 and k.t_k_stack_id >= 0:
                 print("    -")
             if stack_id_err(k.t_u_stack_id):
                 print("    [Missed User Stack]")

If with the above patch, you still have problems, could you post a new stack trace missing waker stack and the way how to reproduce it? Maybe we should probe other kernel functions as well to get complete sleep/wake pictures.