sharkcz / collectl

git mirror of collectl releases
Other
5 stars 4 forks source link

New issue reported regarding interrupts, reported by Vishal Agrawal. I will review as soon as I get a chance #3

Open loberman opened 1 month ago

loberman commented 1 month ago
# cat /proc/cmdline
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-513.9.1.el8_9.x86_64 root=/dev/mapper/rhel_dhcp182--241-root ro crashkernel=auto resume=/dev/mapper/rhel_dhcp182--241-swap [rd.lvm.lv](http://rd.lvm.lv/)=rhel_dhcp182-241/root [rd.lvm.lv](http://rd.lvm.lv/)=rhel_dhcp182-241/swap skew_tick=1 tsc=reliable rcupdate.rcu_normal_after_boot=1 nohz=on nohz_full=2-7,10-15,18-23 rcu_nocbs=2-7,10-15,18-23 tuned.non_isolcpus=00030303 intel_pstate=disable nosoftlockup console=tty0 console=ttyS0,115200

# collectl -sjJ -i 2 --intfilt LOC
waiting for 2 second sample...

### RECORD    1 >>> dell-r430-11 <<< (1706859189.001) (Fri Feb  2 02:33:09 2024) ###

# INTERRUPT SUMMARY
#    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23
     2003     23      0      0      0      0      0      0     50      7      0      0      0      0      0      0     31      2      0      0      0      0      0      0

# INTERRUPT DETAILS
# Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23   Type            Device(s)
  LOC    1001     11      0      0      0      0      0      0     25      3      0      0      0      0      0      0     15      1      0      0      0      0      0      0   Localtimer

### RECORD    2 >>> dell-r430-11 <<< (1706859191.001) (Fri Feb  2 02:33:11 2024) ###

# INTERRUPT SUMMARY
#    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23
     2002     24      0      0      0      0      0      0     68      9      0      0      0      0      0      0     18     31      0      0      0      0      0      0

# INTERRUPT DETAILS
# Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23   Type            Device(s)
  LOC    1001     12      0      0      0      0      0      0     34      4      0      0      0      0      0      0      9     15      0      0      0      0      0      0   Localtimer
~~~

In above command I filtered using LOC just to keep my query simple. I kept interval as 2sec, but what I see is in Detailed interrupt output, LOC shows delta of 1 sec only
while summary shows delta of 2sec. By default 1000 interrupts are generated for LOC per sec due to CONFIG_HZ=1000, so summary value around 2000 for cpu0 is correct, but
DETAIL interrupts show LOC of 1000 only, this is with every irq. Let me show sample without --intfilt.

~~~
# collectl -sjJ -i 5
waiting for 5 second sample...

### RECORD    1 >>> dell-r430-11 <<< (1706859453.001) (Fri Feb  2 02:37:33 2024) ###

# INTERRUPT SUMMARY
#    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23
     5046     74      7      7      7      7      7      7    161     20      7      7      7      7      7      7     73     36      7      7      7      7      7      7

# INTERRUPT DETAILS
# Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23   Type            Device(s)
  080       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572864-edgeeno3-tx-0
  081       7      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572865-edgeeno3-rx-1
  082       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572866-edgeeno3-rx-2
  083       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572867-edgeeno3-rx-3
  084       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572868-edgeeno3-rx-4
  CAL       1      1      1      1      1      1      1      1      1      1      1      1      1      1      1      1      0      1      1      1      1      1      1      1   Functioncall
  LOC    1000     13      0      0      0      0      0      0     29      2      0      0      0      0      0      0     13      5      0      0      0      0      0      0   Localtimer

### RECORD    2 >>> dell-r430-11 <<< (1706859458.001) (Fri Feb  2 02:37:38 2024) ###

# INTERRUPT SUMMARY
#    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23
     5077     81     13     13     13     13     13     13    209     44     13     13     13     13     13     13    153     20     13     13     13     13     13     13

# INTERRUPT DETAILS
# Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19  Cpu20  Cpu21  Cpu22  Cpu23   Type            Device(s)
  040       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      524293-edgemegasas0-msix5
  080       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      2      0      0      0      0      0      0      0   IR-PCI-MSI      1572864-edgeeno3-tx-0
  081      12      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572865-edgeeno3-rx-1
  082       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572866-edgeeno3-rx-2
  083       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572867-edgeeno3-rx-3
  084       0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0      0   IR-PCI-MSI      1572868-edgeeno3-rx-4
  CAL       2      3      2      2      2      2      2      2      2      2      2      2      2      2      2      2      0      3      2      2      2      2      2      2   Functioncall
  LOC    1000     13      0      0      0      0      0      0     37      5      0      0      0      0      0      0     27      1      0      0      0      0      0      0   Localtimer
~~~

Is it expected behavior with DETAILED output that it shows delta of 1sec only even when interval is greater than 1sec?

- Let me know your thoughts.

- Vishal A.
loberman commented 1 month ago
I reproduced this and I am looking into it.

If I use 
collectl -sjJ -oT -i 1 --intfilt LOC

# INTERRUPT SUMMARY
#Time         Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19
10:50:46      4093      7    999    116    513   1256   1301      1   2037     72    488      2    108    118     27      2     16     20     14    978

# INTERRUPT DETAILS
#Time      Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19   Type            Device(s)
10:50:46   LOC    4093      7    999    116    513   1256   1301      1   2037     72    488      2    108    118     27      2     16     20     14    978   Localtimer

If however I use a 2s interval I only see half so it sampling just Like Vishal said at 1/2   the time (1s)

# INTERRUPT SUMMARY
#Time         Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19
10:52:10       810      2    292    291    223      3    211      2   1951      0    154      4     66     51     47     13      6      3     19     15

# INTERRUPT DETAILS
#Time      Int    Cpu0   Cpu1   Cpu2   Cpu3   Cpu4   Cpu5   Cpu6   Cpu7   Cpu8   Cpu9  Cpu10  Cpu11  Cpu12  Cpu13  Cpu14  Cpu15  Cpu16  Cpu17  Cpu18  Cpu19   Type            Device(s)
10:52:10   LOC     405      1    146    145    111      1    105      1    975      0     77      2     33     25     23      6      3      1      9      7   Localtimer

Will see why it is doing this.

Again, Vishal G, thanks for reporting this
loberman commented 3 weeks ago

I am still working this as I get time, its not a show stopper so customer facing work gets done first