iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

add workqueue latency observation tool #4897

Closed: jackygam2001 closed this 7 months ago

jackygam2001 commented 8 months ago

Add a tool to observe how long work items wait on the kernel's workqueues.
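A rough sketch of the measurement idea, assuming the waiting time is taken from the moment a work item is queued to the moment it starts executing, with the two events paired by the `work_struct` address. In the kernel these would plausibly correspond to the `workqueue:workqueue_queue_work` and `workqueue:workqueue_execute_start` tracepoints, but the probe points the PR actually uses are not shown here. The real tool would do this bookkeeping in BPF; the Python below only models it over a hypothetical event list.

```python
# Hypothetical event stream: (timestamp_ns, kind, work_addr).
# "queue" models a queue_work event, "start" models an execute_start event.
def wait_latencies(events):
    """Return the wait latency of each matched work item, in microseconds."""
    queued = {}  # work_addr -> timestamp (ns) at which it was queued
    latencies = []
    for ts, kind, work in events:
        if kind == "queue":
            queued[work] = ts
        elif kind == "start":
            start = queued.pop(work, None)
            if start is not None:  # skip works queued before tracing began
                latencies.append((ts - start) // 1000)
    return latencies

events = [
    (1_000, "queue", 0xffff0001),
    (4_000, "queue", 0xffff0002),
    (9_000, "start", 0xffff0001),
    (10_000, "start", 0xffff0003),  # no matching queue event, ignored
    (25_000, "start", 0xffff0002),
]
print(wait_latencies(events))  # [8, 21]
```

Keying the pending-timestamp map by the work address is what lets the start event find its own queue event even when many works are in flight at once.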

jackygam2001 commented 7 months ago

> I think this is a useful tool. In production we sometimes wonder whether workqueue latency is too long, which widens the window for race conditions, etc.
>
> Two things. First, could you add the tool with a brief description to README.md (under CPU and Scheduler Tools)? Second, sometimes we want to identify which original task enqueued the work that saw long latency, so it might be useful to add another distribution key: tid (together with its comm).
>
> What do you think?

Thanks for your suggestions. I don't fully understand the second one, though: do you mean adding a '-T TID' option to filter on the ID of the thread that queued the work?

yonghong-song commented 7 months ago


Currently the '-W' option prints a separate histogram for each workqueue, so we have:

   workqueue1:
      <histogram>
   workqueue2:
      <histogram>
   ...
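One way to picture the '-W' output is a power-of-two histogram per workqueue name, similar in spirit to what BCC tools print with `print_log2_hist`. The sketch below does the bucketing in plain Python rather than BPF; the workqueue names, latency values, and helper names are made up for illustration, and the bucket boundaries are this sketch's own choice rather than a copy of BCC's.

```python
from collections import defaultdict

def log2_slot(value):
    # Slot s holds values in [2**(s-1), 2**s - 1]; slot 0 holds only 0.
    return value.bit_length()

def build_hists(samples):
    """samples: iterable of (workqueue_name, latency_us)."""
    hists = defaultdict(lambda: defaultdict(int))
    for wq, lat in samples:
        hists[wq][log2_slot(lat)] += 1
    return hists

def print_hists(hists):
    for wq in sorted(hists):
        print(f"{wq}:")
        for slot in sorted(hists[wq]):
            low = 0 if slot == 0 else 1 << (slot - 1)
            high = 0 if slot == 0 else (1 << slot) - 1
            print(f"   {low:>6} -> {high:<6} : {hists[wq][slot]}")

samples = [("events", 3), ("events", 2), ("events", 9), ("kblockd", 120)]
print_hists(build_hists(samples))
```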

What I mean is adding a '-P' option to print a histogram per PID (I think process-ID granularity is good enough):

   pid1:
      <histogram>
   pid2:
      <histogram>
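On the BPF side, the PID-versus-TID distinction in this thread maps onto `bpf_get_current_pid_tgid()`, which returns the thread group ID (what user space calls the PID) in the upper 32 bits and the kernel's per-thread pid (the TID) in the lower 32 bits. A minimal Python sketch of splitting such a value; the function name and the sample numbers are hypothetical.

```python
def split_pid_tgid(pid_tgid):
    """Split a bpf_get_current_pid_tgid()-style u64 into (pid, tid).

    Upper 32 bits: tgid, i.e. the process ID user space calls the PID.
    Lower 32 bits: the kernel's per-thread pid, i.e. the TID.
    """
    pid = pid_tgid >> 32
    tid = pid_tgid & 0xFFFFFFFF
    return pid, tid

# Hypothetical value for thread 4242 of process 4200.
value = (4200 << 32) | 4242
print(split_pid_tgid(value))  # (4200, 4242)
```

Keying a per-PID histogram on the upper half groups all threads of a process together, matching the process-ID granularity suggested above; keying on the lower half would give the per-TID view from the original suggestion.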

If both -W and -P are specified, we could have:

   workqueue1:
      pid1:
         <histogram>
      pid2:
         <histogram>
   workqueue2:
      pid3:
         <histogram>
      pid4:
         <histogram>
   ...
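With both -W and -P, the histogram key effectively becomes the pair (workqueue, pid), and the output nests PIDs under each workqueue. A Python sketch of that grouping, using hypothetical sample tuples and the same power-of-two bucketing idea (`latency.bit_length()` as the slot); the real tool would more likely use a single BPF map keyed by a struct and regroup in user space.

```python
from collections import defaultdict

def nested_hists(samples):
    """samples: iterable of (workqueue, pid, latency_us).

    Returns {workqueue: {pid: {log2_slot: count}}}.
    """
    out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for wq, pid, lat in samples:
        out[wq][pid][lat.bit_length()] += 1  # power-of-two bucket
    return out

samples = [
    ("workqueue1", 101, 5), ("workqueue1", 101, 6),
    ("workqueue1", 102, 40), ("workqueue2", 103, 1),
]
hists = nested_hists(samples)
for wq in sorted(hists):
    print(f"{wq}:")
    for pid in sorted(hists[wq]):
        print(f"   pid {pid}: {dict(hists[wq][pid])}")
```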

As with workqueues ('-W' and '-w <workqueue>'), a '-p <pid>' option would also be supported, so we can restrict the output to one particular PID. That gives the following supported combinations:

   . -W && -P
   . -W && -p <pid>
   . -w <workqueue> && -P
   . -w <workqueue> && -p <pid>
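The four combinations above can be expressed as two independent choices: one workqueue option (-W or -w) and one PID option (-P or -p). A sketch using Python's `argparse` with two mutually exclusive groups; treating -W/-w and -P/-p as mutually exclusive pairs is an assumption drawn from the list, and this is a hypothetical layout of the proposed flags, not the tool's actual command line.

```python
import argparse

def make_parser():
    # Hypothetical option layout for the combinations proposed above;
    # the actual tool's flags may differ.
    parser = argparse.ArgumentParser(description="workqueue latency sketch")
    wq = parser.add_mutually_exclusive_group()
    wq.add_argument("-W", action="store_true",
                    help="print a histogram per workqueue")
    wq.add_argument("-w", metavar="WORKQUEUE",
                    help="only trace this workqueue")
    pid = parser.add_mutually_exclusive_group()
    pid.add_argument("-P", action="store_true",
                     help="print a histogram per PID")
    pid.add_argument("-p", metavar="PID", type=int,
                     help="only trace works queued by this PID")
    return parser

args = make_parser().parse_args(["-w", "kblockd", "-P"])
print(args.W, args.w, args.P, args.p)  # False kblockd True None
```

Passing, say, both -W and -w makes `argparse` exit with a usage error, which keeps the supported combinations to exactly the four listed.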

What do you think?

jackygam2001 commented 7 months ago


The -P option you mentioned doesn't make sense in some cases: the kernel may queue work from interrupt context, and then a histogram keyed by PID would not reflect the process that actually queued the work, right?
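The concern can be modeled simply: when work is queued from hardirq or softirq context, the task that is current on that CPU is just whatever was interrupted, so charging the work to its PID is misleading. A sketch over hypothetical events that carry a context tag and only key by PID in process context; how the real tool would detect the context in BPF is left open here, and the tag names are made up.

```python
from collections import Counter

def per_pid_counts(events):
    """events: iterable of (context, pid).

    context is one of "process", "softirq", "hardirq" (hypothetical tags).
    Works queued outside process context are counted under None instead of
    being charged to the interrupted task's PID.
    """
    counts = Counter()
    for context, pid in events:
        counts[pid if context == "process" else None] += 1
    return counts

events = [("process", 101), ("hardirq", 101), ("softirq", 202), ("process", 303)]
print(per_pid_counts(events))
```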

yonghong-song commented 7 months ago


Good point. Let's not do the PID thing for now. I will investigate what percentage of work is queued from process context, softirq context, or elsewhere. I agree that if the majority is not from process context, a PID histogram is probably not useful. Even if a fair amount of work is queued from process context, we might still want to filter out the items that are not.
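The investigation described here boils down to a percentage breakdown of queueing contexts. A small Python sketch, assuming each traced work item has already been tagged with a hypothetical context label:

```python
from collections import Counter

def context_breakdown(contexts):
    """Return {context: percentage of all work items} for a list of tags."""
    counts = Counter(contexts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ctx: 100.0 * n / total for ctx, n in counts.items()}

sample = ["process"] * 6 + ["softirq"] * 3 + ["hardirq"]
print(context_breakdown(sample))  # {'process': 60.0, 'softirq': 30.0, 'hardirq': 10.0}
```

A small "process" share in such a breakdown would support leaving the per-PID histogram out.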