performancecopilot / xsos-pcp

Performance Co-Pilot xsos-inspired script for fast sosreport diagnostics
2 stars 3 forks source link

pcp-xsos shows a lot of errors when --all is used with -O option. #12

Closed Aniruddh94 closed 1 month ago

Aniruddh94 commented 3 months ago

Hello Team,

The tool fails for --all when used with -O, with a lot of "command not found" errors.

Archive details :
[Ani@rhel8 xsos-pcp]$ pcp -a /var/log/pcp/pmlogger/rhel8/20240606.0.xz
Performance Co-Pilot configuration on rhel8:
  archive: /var/log/pcp/pmlogger/rhel8/20240606.0.xz
 platform: Linux rhel8 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Mar 14 14:20:09 EDT 2024 x86_64
 hardware: 4 cpus, 1 disk, 1 node, 3653MB RAM
 timezone: EDT+4
 services: pmcd
     pmcd: Version 5.3.7-20, 9 agents, 4 clients
     pmda: root pmcd proc pmproxy xfs linux mmv kvm jbd2
 pmlogger: rhel8: /var/log/pcp/pmlogger/rhel8/20240606.00.58
     pmie: primary engine: /var/log/pcp/pmie/rhel8/pmie.log

[Ani@rhel8 xsos-pcp]$ pmdumplog -L -z /var/log/pcp/pmlogger/rhel8/20240606.0.xz
Note: timezone set to local timezone of host "rhel8" from archive
Log Label (Log Format Version 2)
Performance metrics from host rhel8
    commencing Thu Jun  6 00:58:30.625194 2024
    ending     Sun Jun  9 22:59:35.499956 2024
Archive timezone: EDT+4
PID for pmlogger: 3489

[Ani@rhel8 xsos-pcp]$ git pull
Already up to date.

[Ani@rhel8 xsos-pcp]$ pwd
/home/Ani/xsos-pcp

[Ani@rhel8 xsos-pcp]$ ./pcp-xsos -O 'Jun 8 10:00:00' -a /var/log/pcp/pmlogger/rhel8/20240606.0.xz --all
./pcp-xsos: line 343: kernel_uname_release:=kernel.uname.version: pmFetch: End of PCP archive log: command not found
OS
  Hostname: rhel8
  Distro:   unknown
  Arch:     platform=unknown
  Kernel:
    Hertz:         \033[0;33munknown
    Pagesize:      \033[0;33munknown
    Build version:
      unknown version unknown
      unknown build version2
unknown
    - - - - - - - - - - - - - - - - - - -
  Boot time: Wed Dec 31 07:00:00 pm EST 1969
  Time Zone: unknown [EDT+4]
  Uptime:    0 min,  0 users
  LoadAvg:  [0 CPU]./pcp-xsos: line 493: printf: unknown: invalid number
 0.00 (0%),./pcp-xsos: line 493: printf: unknown: invalid number
 0.00 (0%),./pcp-xsos: line 493: printf: unknown: invalid number
 0.00 (0%)
  Processes: 
    running: \033[0;33munknown    runnable: \033[0;33munknown    blocked: \033[0;33munknown    count: \033[0;33munknown
  Processors: 
    cpu [Utilization since boot]: 
      gawk: cmd. line:3: fatal: division by zero attempted
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10: kernel_uname_release:="kernel.uname.version: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11: kernel_uname_sysname:="kernel.uname.machine: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12: kernel_uname_nodename:="kernel.uname.distro: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12:                      ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ unterminated string
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10: kernel_uname_release:="kernel.uname.version: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11: kernel_uname_sysname:="kernel.uname.machine: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12: kernel_uname_nodename:="kernel.uname.distro: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12:                      ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ unterminated string
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10: kernel_uname_release:="kernel.uname.version: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11: kernel_uname_sysname:="kernel.uname.machine: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12: kernel_uname_nodename:="kernel.uname.distro: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12:                      ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ unterminated string
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ syntax error
SOCKSTAT
      sockets: used \033[0;33munknown
      TCP: inuse \033[0;33munknown orphan \033[0;33munknown tw \033[0;33munknown alloc \033[0;33munknown mem \033[0;33munknown
      UDP: inuse \033[0;33munknown mem \033[0;33munknown
      UDPLITE: inuse \033[0;33munknown
      RAW: inuse \033[0;33munknown
      FRAG: inuse \033[0;33munknown memory \033[0;33munknown
      TCP6: inuse \033[0;33munknown
      UDP6: inuse \033[0;33munknown
      UDPLITE6: inuse \033[0;33munknown
      RAW6: inuse \033[0;33munknown
      FRAG6: inuse \033[0;33munknown memory \033[0;33munknown
NET STATS
  Icmp.InErrors:              \033[0;33munknown
  Icmp6.InErrors:             \033[0;33munknown
  Tcp.AttemptFails:           \033[0;33munknown
  Tcp.EstabResets:            \033[0;33munknown
  Tcp.InErrs:                 \033[0;33munknown
  Tcp.OutRsts:                \033[0;33munknown
  TcpExt.DelayedACKLocked:    \033[0;33munknown
  TcpExt.DelayedACKLost:      \033[0;33munknown
  TcpExt.DelayedACKs:         \033[0;33munknown
  TcpExt.PAWSEstab:           \033[0;33munknown
  TcpExt.TCPAbortOnTimeout:   \033[0;33munknown
  TcpExt.TCPLossProbeRecovery: \033[0;33munknown
  TcpExt.TCPLossProbes:       \033[0;33munknown
  TcpExt.TCPTimeouts:         \033[0;33munknown
  TcpExt.TcpTimeoutRehash:    \033[0;33munknown
  Ip.InAddrErrors:            \033[0;33munknown
  Ip6.InAddrErrors:           \033[0;33munknown
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10: kernel_uname_release:="kernel.uname.version: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:10:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11: kernel_uname_sysname:="kernel.uname.machine: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:11:                     ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12: kernel_uname_nodename:="kernel.uname.distro: pmFetch: End of PCP archive log"
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:12:                      ^ syntax error
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ unterminated string
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482: proc_psinfo_psargs_value[10288]="/usr/bin/updatedb -f sysfs
gawk: /var/tmp/pcp-xsos.MAxSxz0fZ/metrics:1482:                                 ^ syntax error

It works fine without -O :

[Ani@rhel8 xsos-pcp]$ ./pcp-xsos -a /var/log/pcp/pmlogger/rhel8/20240606.0.xz --all
OS
  Hostname: rhel8
  Distro:   Red Hat Enterprise Linux release 8.9 (Ootpa)
  Arch:     platform=x86_64
  Kernel:
    Hertz:         100
    Pagesize:      4096
    Build version:
      Linux version 4.18.0-513.24.1.el8_9.x86_64
      #1 SMP Thu Mar 14 14:20:09 EDT 2024
    - - - - - - - - - - - - - - - - - - -
  Boot time: Thu Jun 06 12:27:31 am EDT 2024
  Time Zone: unknown [EDT+4]
  Uptime:    30 min,   1 user
  LoadAvg:  [4 CPU] 0.34 (1%), 0.08 (0%), 0.03 (0%)
  Processes: 
    running: 9    runnable: 9    blocked: 0    count: 197
  Processors: 
    cpu [Utilization since boot]: 
      us 0%, ni 0%, sys 0%, idle 99%, iowait 0%, irq 0%, sftirq 0%, steal 0%
MEMORY
  Stats graphed as percent of MemTotal:
    MemUsed    ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊.................................  33.9%
    Buffers    ..................................................   0.1%
    Cached     ▊▊▊▊▊▊▊▊▊.........................................  18.4%
    HugePages  ..................................................   0.0%
    Dirty      ..................................................   0.6%
    Available  ▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊▊..........  79.4%
  RAM:
    3.6 GiB total ram
    1.2 GiB (34%) used
    0.6 GiB (15%) used excluding Buffers/Cached
    0 GiB (1%) dirty
    2.8 GiB (79%) available
  HugePages:
    No ram pre-allocated to HugePages
  TransparentHugePages:
    0 GiB allocated to THP
    0 GiB (0%) of LowMem in-use
    0.11 GiB (3%) of total ram used for Slab
    0.01 GiB (0%) of total ram used for PageTables
    0.01 GiB (0%) of total ram used for Shmem
  Virtual Machine Balloon:
    No VM balloon memory target
  Swap:
    0 GiB (0%) used of 2 GiB total
STORAGE
  Whole Disks from /proc/partitions:
     disks, totalling 0 GiB (0 TiB)
    - - - - - - - - - - - - - - - - - - - - -
    Disk     Size in GiB
    ----     -----------
  Filesystem usage from df:
    Filesystem     1K-blocks     Used Available Use% Mounted on
    /dev/mapper/rhel-root       17811456                     69% /
    /dev/vda1        1038336                     30% /boot
NETDEV
  Interface  RxMiBytes  RxPackets  RxErrs  RxDrop  RxFifo  RxComp  RxFrame  RxMultCast
  =========  =========  =========  ======  ======  ======  ======  =======  ==========
  enp1s0     142        18 k       0       932(5%)  0       0       0        0         
  - - - - - - - - - - - - - - - - -
  Interface  TxMiBytes  TxPackets  TxErrs  TxDrop  TxFifo  TxComp  TxColls  TxCarrier
  =========  =========  =========  ======  ======  ======  ======  =======  ==========
  enp1s0     1          11 k       0       0(5%)   0       0       0        0         
SOCKSTAT
      sockets: used 170
      TCP: inuse 4 orphan 0 tw 0 alloc 7 mem 0
      UDP: inuse 2 mem 1
      UDPLITE: inuse 0
      RAW: inuse 0
      FRAG: inuse 0 memory 0
      TCP6: inuse 3
      UDP6: inuse 1
      UDPLITE6: inuse 0
      RAW6: inuse 1
      FRAG6: inuse 0 memory 0
NET STATS
  Icmp.InErrors:                              0
  Icmp6.InErrors:                             0
  Tcp.AttemptFails:                           2
  Tcp.EstabResets:                           10
  Tcp.InErrs:                                 0
  Tcp.OutRsts:                               47
  TcpExt.DelayedACKLocked:                    0
  TcpExt.DelayedACKLost:                     11
  TcpExt.DelayedACKs:                         3
  TcpExt.PAWSEstab:                           0
  TcpExt.TCPAbortOnTimeout:                   0
  TcpExt.TCPLossProbeRecovery:                0
  TcpExt.TCPLossProbes:                       1
  TcpExt.TCPTimeouts:                         0
  TcpExt.TcpTimeoutRehash:                    0
  Ip.InAddrErrors:                            1
  Ip6.InAddrErrors:                           0
PROCESSES
  Total number of threads/processes:
    353/166
  Top users of CPU & MEM:
    USER         %CPU   %MEM   RSS
    pcp         68.9%   1.1%   0.04 GiB
    root         0.6%  15.5%   0.55 GiB
    dbus           0%   0.2%   0.01 GiB
    polkitd        0%   0.7%   0.02 GiB
    chrony         0%   0.1%      0 GiB
  Uninterruptible sleep threads/processes (0/0)
    [None]
  Defunct zombie threads/processes (0/1)
    [None]
  Top CPU-using processes:
    USER      PID      %CPU   %MEM   VSZ-MiB  RSS-MiB  COMMAND
    pcp       3489     70.3%  0.2%   98       8        /usr/libexec/pcp/bin/pmlogger -N -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check %%Y%%m%%d.%%H.%%M
    pcp       2809     23.2%  0.2%   105      7        /usr/libexec/pcp/bin/pmcd -A
    root      1776     21.7%  8%     843      291      /usr/libexec/platform-python /usr/bin/dnf install pcp-zeroconf
    pcp       3446     2.3%   0.2%   97       7        /usr/bin/pmie -b -F -P -l /var/log/pcp/pmie/rhel8/pmie.log -c config.default
    root      2817     1.5%   0.2%   98       8        /var/lib/pcp/pmdas/linux/pmdalinux
    pcp       2814     0.8%   0.2%   98       7        /var/lib/pcp/pmdas/proc/pmdaproc -d 3
    root      858      0.1%   0.8%   401      29       /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
    root      1769     0.1%   0%     0        0        (kworker/0:0-events)
    root      1        0.1%   0.4%   173      15       /usr/lib/systemd/systemd --switched-root --system --deserialize 17
    root      1548     0%     0%     0        0        (kworker/0:3-events_power_efficient)
  Top MEM-using processes:
    USER      PID      %CPU   %MEM   VSZ-MiB  RSS-MiB  COMMAND
    root      1776     21.7%  8%     843      291      /usr/libexec/platform-python /usr/bin/dnf install pcp-zeroconf
    root      814      0%     1.4%   299      51       /usr/libexec/platform-python -s /usr/sbin/firewalld --nofork --nopid
    root      858      0.1%   0.8%   401      29       /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
    polkitd   900      0%     0.7%   1706     25       /usr/lib/polkit-1/polkitd --no-debug
    root      850      0%     0.5%   372      18       /usr/sbin/NetworkManager --no-daemon
    root      1        0.1%   0.4%   173      15       /usr/lib/systemd/systemd --switched-root --system --deserialize 17
    root      696      0%     0.3%   97       10       /usr/lib/systemd/systemd-udevd
    root      1481     0%     0.3%   134      10       sshd: root [priv]
    root      815      0%     0.3%   81       9        /usr/lib/systemd/systemd-logind
    root      1487     0%     0.2%   87       9        /usr/lib/systemd/systemd --user
  Top thread-spawning processes:
    #   USER      PID      %CPU   %MEM   VSZ-MiB  RSS-MiB  COMMAND
    8   polkitd   900      0%     0.7%   1706     25       /usr/lib/polkit-1/polkitd --no-debug
    4   root      858      0.1%   0.8%   401      29       /usr/libexec/platform-python -Es /usr/sbin/tuned -l -P
    3   root      781      0%     0.1%   128      2        /sbin/auditd
    3   root      850      0%     0.5%   372      18       /usr/sbin/NetworkManager --no-daemon
    3   root      1020     0%     0.2%   211      6        /usr/sbin/rsyslogd -n
    2   root      808      0%     0.1%   132      5        /usr/bin/qemu-ga --method=virtio-serial --path=/dev/virtio-ports/org.qemu.guest_agent.0 --blacklist=guest-file-open,guest-file-close,guest-file-read,guest-file-write,guest-file-seek,guest-file-flush,guest-exec,guest-exec-status -F/etc/qemu-ga/fsfreeze-hook
    2   dbus      813      0%     0.2%   63       6        /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
    2   root      814      0%     1.4%   299      51       /usr/libexec/platform-python -s /usr/sbin/firewalld --nofork --nopid
    2   root      865      0%     0.1%   113      2        /usr/bin/rhsmcertd
    2   root      941      0%     0.1%   122      5        /usr/sbin/irqbalance --foreground
natoscott commented 2 months ago

@Aniruddh94 there is a bug here in that we are not handling very well the case of catastrophic failure.

However, I think what you're wanting to do is run with "-O @Jun 8 10:00:00" which means 'use the absolute date', otherwise its taken as an offset from the start of the archive. Which in this case (without '@') becomes a long way past the archive end. See pcpintro(1) for details.