pafernanr / sarcharts

Generates dynamic charts from sar files
GNU General Public License v3.0

embedded linux kernel option results in missing activities => IndexError #40

Closed sarnold closed 6 months ago

sarnold commented 6 months ago

Sarcharts works perfectly on newer kernels, and appears much more likely to work with a "distro" config on older kernels. With a more targeted embedded config, the CPU activities are missing 3 "header" sections from the normal output.

With a yocto-ish config of linux-raspberrypi, the default sa1 data file is missing some CPU-related activities:

$ grep hostname data/sa09.csv 
# hostname;interval;timestamp;CPU;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%gnice;%idle
# hostname;interval;timestamp;proc/s;cswch/s
# hostname;interval;timestamp;pswpin/s;pswpout/s
# hostname;interval;timestamp;pgpgin/s;pgpgout/s;fault/s;majflt/s;pgfree/s;pgscank/s;pgscand/s;pgsteal/s;%vmeff
# hostname;interval;timestamp;tps;rtps;wtps;dtps;bread/s;bwrtn/s;bdscd/s
# hostname;interval;timestamp;kbmemfree;kbavail;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty;kbanonpg;kbslab;kbkstack;kbpgtbl;kbvmused
# hostname;interval;timestamp;kbswpfree;kbswpused;%swpused;kbswpcad;%swpcad
# hostname;interval;timestamp;kbhugfree;kbhugused;%hugused;kbhugrsvd;kbhugsurp
# hostname;interval;timestamp;dentunusd;file-nr;inode-nr;pty-nr
# hostname;interval;timestamp;runq-sz;plist-sz;ldavg-1;ldavg-5;ldavg-15;blocked
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
# hostname;interval;timestamp;IFACE;rxerr/s;txerr/s;coll/s;rxdrop/s;txdrop/s;txcarr/s;rxfram/s;rxfifo/s;txfifo/s
# hostname;interval;timestamp;call/s;retrans/s;read/s;write/s;access/s;getatt/s
# hostname;interval;timestamp;scall/s;badcall/s;packet/s;udp/s;tcp/s;hit/s;miss/s;sread/s;swrite/s;saccess/s;sgetatt/s
# hostname;interval;timestamp;totsck;tcpsck;udpsck;rawsck;ip-frag;tcp-tw
# hostname;interval;timestamp;CPU;total/s;dropd/s;squeezd/s;rx_rps/s;flw_lim/s;blg_len

With a 6.x kernel and distro config you get three more activity sections:

...
# hostname;interval;timestamp;%scpu-10;%scpu-60;%scpu-300;%scpu
# hostname;interval;timestamp;%sio-10;%sio-60;%sio-300;%sio;%fio-10;%fio-60;%fio-300;%fio
# hostname;interval;timestamp;%smem-10;%smem-60;%smem-300;%smem;%fmem-10;%fmem-60;%fmem-300;%fmem
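
One way to spot which activity sections are absent on a given device is to compare the header lines of two outputs. The following is a minimal sketch, assuming both inputs are semicolon-separated text like the listings above; `activity_headers` and `missing_activities` are hypothetical helper names, not part of sarcharts.

```python
PREFIX = "# hostname;interval;timestamp;"


def activity_headers(text):
    """Return the field tuple of every activity header line, in file order."""
    headers = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(PREFIX):
            # Keep only the activity-specific fields after the common prefix.
            headers.append(tuple(line[len(PREFIX):].split(";")))
    return headers


def missing_activities(reference, candidate):
    """Headers present in the reference text but absent from the candidate."""
    seen = set(activity_headers(candidate))
    return [fields for fields in activity_headers(reference) if fields not in seen]
```

Feeding it the contents of a "distro" capture as `reference` and the embedded capture as `candidate` would list the three extra `%scpu`, `%sio` and `%smem` sections shown above.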

Note there are many active yocto/oe branches still using 5.x (or even 4.x) kernels, which is where I hit this issue. E.g., rpi-64 on the kirkstone branch has 5.15.92-v8, where I tried setting CONFIG_EMBEDDED=n with no change in the result.

Sarcharts does work as expected on several devices running Gentoo 6.x dist-kernel, eg, Rockchip nanopi-r5c, Lenovo x13s, Compulab x86_64 goldmont, etc. The problem occurs on (some) embedded devices running a non-distro config; running sarcharts on files collected from such a device yields:

$ sarcharts data/sa09
  Get data.                                                                    
Traceback (most recent call last): %  Set data for cpu Chart.                  
  File "/home/nerdboy/src/sar-graph-artifacts/.venv/bin/sarcharts", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/nerdboy/src/sar-graph-artifacts/.venv/lib/python3.11/site-packages/sarcharts/bin/sarcharts.py", line 13, in main
    SarCharts().main()
  File "/home/nerdboy/src/sar-graph-artifacts/.venv/lib/python3.11/site-packages/sarcharts/__init__.py", line 96, in main
    chartinfo = Sadf().sar_to_chartjs(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/nerdboy/src/sar-graph-artifacts/.venv/lib/python3.11/site-packages/sarcharts/lib/sadf.py", line 115, in sar_to_chartjs
    charts[k]['datasets'][
IndexError: list index out of range

sarnold commented 6 months ago

Verified it is possible to generate the above error on a very recent kernel (using chromebook test config based on kernel arm64_defconfig).

arm64 chromebook kevin, custom chromebook kernel (defconfig+) 6.8.0-rc5
---
# hostname;interval;timestamp;CPU;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%gnice;%idle
# hostname;interval;timestamp;proc/s;cswch/s
# hostname;interval;timestamp;pswpin/s;pswpout/s
# hostname;interval;timestamp;pgpgin/s;pgpgout/s;fault/s;majflt/s;pgfree/s;pgscank/s;pgscand/s;pgsteal/s;%vmeff
# hostname;interval;timestamp;tps;rtps;wtps;dtps;bread/s;bwrtn/s;bdscd/s
# hostname;interval;timestamp;kbmemfree;kbavail;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty;kbanonpg;kbslab;kbkstack;kbpgtbl;kbvmused
# hostname;interval;timestamp;kbswpfree;kbswpused;%swpused;kbswpcad;%swpcad
# hostname;interval;timestamp;kbhugfree;kbhugused;%hugused;kbhugrsvd;kbhugsurp
# hostname;interval;timestamp;dentunusd;file-nr;inode-nr;pty-nr
# hostname;interval;timestamp;runq-sz;plist-sz;ldavg-1;ldavg-5;ldavg-15;blocked
# hostname;interval;timestamp;TTY;rcvin/s;xmtin/s;framerr/s;prtyerr/s;brk/s;ovrun/s
# hostname;interval;timestamp;DEV;tps;rkB/s;wkB/s;dkB/s;areq-sz;aqu-sz;await;%util
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
# hostname;interval;timestamp;IFACE;rxerr/s;txerr/s;coll/s;rxdrop/s;txdrop/s;txcarr/s;rxfram/s;rxfifo/s;txfifo/s
# hostname;interval;timestamp;call/s;retrans/s;read/s;write/s;access/s;getatt/s
# hostname;interval;timestamp;scall/s;badcall/s;packet/s;udp/s;tcp/s;hit/s;miss/s;sread/s;swrite/s;saccess/s;sgetatt/s
# hostname;interval;timestamp;totsck;tcpsck;udpsck;rawsck;ip-frag;tcp-tw
# hostname;interval;timestamp;CPU;total/s;dropd/s;squeezd/s;rx_rps/s;flw_lim/s;blg_len

The above is reproducible using the build config here.

pafernanr commented 6 months ago

Hi Steve,

Thanks for your feedback. I think this issue is related to the output generated by different sar versions/arguments. sarcharts expects sadf to generate a single CSV-like output, but the provided examples contain multiple headers with different numbers of fields, so the data becomes unparseable, hence the IndexError. (I might catch that exception and show an appropriate message.)
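
The check mentioned in parentheses could look roughly like the sketch below. It is not the sarcharts implementation: `parse_single_activity` and `MultipleActivitiesError` are hypothetical names, and the sketch assumes semicolon-separated input with `#`-prefixed header lines. Real sadf output may contain other record types that it does not handle.

```python
class MultipleActivitiesError(ValueError):
    """Raised when the input contains more than one distinct header."""


def parse_single_activity(text, sep=";"):
    """Parse CSV-like text that must describe exactly one activity."""
    header = None
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            fields = line.lstrip("# ").split(sep)
            # An identical repeated header is tolerated; a different one is not.
            if header is not None and fields != header:
                raise MultipleActivitiesError(
                    f"line {lineno}: second header found; "
                    "the input mixes several activities"
                )
            header = fields
            continue
        if header is None:
            raise ValueError(f"line {lineno}: data before any header")
        values = line.split(sep)
        if len(values) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(values)}"
            )
        rows.append(dict(zip(header, values)))
    return header, rows
```

With a guard like this, a caller can report a readable message instead of letting a bare IndexError escape from the chart-building loop.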

The last example looks similar to the output produced with the -A argument, which the man page describes as: "This is equivalent to specifying -bBdFHISvwWy -m ALL -n ALL -q ALL -r ALL -u ALL. This option also implies specifying -I ALL -P ALL unless these options are explicitly set on the command line."

$ sadf -td sa01 -- -A | grep hostname
# hostname;interval;timestamp;CPU;%usr;%nice;%sys;%iowait;%steal;%irq;%soft;%guest;%gnice;%idle
# hostname;interval;timestamp;proc/s;cswch/s
# hostname;interval;timestamp;pswpin/s;pswpout/s
# hostname;interval;timestamp;pgpgin/s;pgpgout/s;fault/s;majflt/s;pgfree/s;pgscank/s;pgscand/s;pgsteal/s;pgprom/s;pgdem/s
# hostname;interval;timestamp;tps;rtps;wtps;dtps;bread/s;bwrtn/s;bdscd/s
# hostname;interval;timestamp;kbmemfree;kbavail;kbmemused;%memused;kbbuffers;kbcached;kbcommit;%commit;kbactive;kbinact;kbdirty;kbanonpg;kbslab;kbkstack;kbpgtbl;kbvmused
# hostname;interval;timestamp;kbswpfree;kbswpused;%swpused;kbswpcad;%swpcad
# hostname;interval;timestamp;kbhugfree;kbhugused;%hugused;kbhugrsvd;kbhugsurp
# hostname;interval;timestamp;dentunusd;file-nr;inode-nr;pty-nr
# hostname;interval;timestamp;runq-sz;plist-sz;ldavg-1;ldavg-5;ldavg-15;blocked
# hostname;interval;timestamp;DEV;tps;rkB/s;wkB/s;dkB/s;areq-sz;aqu-sz;await;%util
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
# hostname;interval;timestamp;IFACE;rxerr/s;txerr/s;coll/s;rxdrop/s;txdrop/s;txcarr/s;rxfram/s;rxfifo/s;txfifo/s
# hostname;interval;timestamp;call/s;retrans/s;read/s;write/s;access/s;getatt/s
# hostname;interval;timestamp;scall/s;badcall/s;packet/s;udp/s;tcp/s;hit/s;miss/s;sread/s;swrite/s;saccess/s;sgetatt/s
# hostname;interval;timestamp;totsck;tcpsck;udpsck;rawsck;ip-frag;tcp-tw
# hostname;interval;timestamp;CPU;total/s;dropd/s;squeezd/s;rx_rps/s;flw_lim/s;blg_len

A possible solution could be to let sarcharts use a configuration file that specifies different output for different devices. E.g., since the parameter -n ALL also fails, create an alternative config file ~/.sarcharts-chromebook and list the desired network activities:

...
"netdevices": {
            "arg": "-n DEV"
},
"netip": {
            "arg": "-n IP"
},
"nettcp": {
            "arg": "-n TCP"
}
...

sarcharts could then be executed using "sarcharts sa01 -c ~/.sarcharts-chromebook"
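
As a rough illustration of how such a per-device file could be merged with built-in defaults, here is a minimal sketch. It assumes the file is a complete JSON object shaped like the fragment above. `DEFAULT_CHARTS`, `merge_charts` and `load_charts` are hypothetical names, and treating a `null` entry as "disable this chart" is an assumption of the sketch, not a documented sarcharts behaviour.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real chart definitions in sarcharts may differ.
DEFAULT_CHARTS = {
    "cpu": {"arg": "-P ALL"},
    "network": {"arg": "-n ALL"},
}


def merge_charts(defaults, overrides):
    """Return a new chart mapping with per-device overrides applied."""
    charts = {name: dict(opts) for name, opts in defaults.items()}
    for name, opts in overrides.items():
        if opts is None:
            # Assumption: null removes a chart that fails on this device.
            charts.pop(name, None)
        else:
            charts.setdefault(name, {}).update(opts)
    return charts


def load_charts(config_path=None):
    """Load overrides from a JSON file, or fall back to the defaults."""
    if config_path is None:
        return merge_charts(DEFAULT_CHARTS, {})
    path = Path(config_path).expanduser()
    with path.open(encoding="utf-8") as handle:
        return merge_charts(DEFAULT_CHARTS, json.load(handle))
```

Copying the defaults before merging keeps one device's overrides from leaking into later runs within the same process.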

This is just an idea; allow me some time to think about the best approach. As you probably know, there are multiple sar data file versions, and parsing all of them transparently doesn't look easy at first glance. Any suggestion is welcome :) I would also appreciate it if you could share some different failing sa?? files so I can test different scenarios/solutions.

Regards

pafernanr commented 6 months ago

I found that sadf provides a -j argument to produce JSON output. It is easy to parse and could help with handling multiple activities. Could you confirm whether that argument is also available in the sadf command on all your devices?

sadf -j sa01 -- -n ALL
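
To illustrate why the JSON form is easier to handle, the sketch below lists which activities each host actually reports. It assumes the document nests samples under `sysstat` → `hosts` → `statistics` with a `timestamp` key per sample, which should be verified against the sysstat version in use. `activities_by_host` is a hypothetical helper, not a sarcharts function.

```python
import json


def activities_by_host(sadf_json):
    """Map each host's nodename to the sorted activity keys it reports."""
    doc = json.loads(sadf_json)
    result = {}
    # Assumed layout: {"sysstat": {"hosts": [{"nodename": ..., "statistics": [...]}]}}
    for host in doc.get("sysstat", {}).get("hosts", []):
        names = set()
        for sample in host.get("statistics", []):
            # Every key except the timestamp is treated as an activity.
            names.update(key for key in sample if key != "timestamp")
        result[host.get("nodename", "unknown")] = sorted(names)
    return result
```

The input string could come from `subprocess.run(["sadf", "-j", "sa01", "--", "-n", "ALL"], capture_output=True, text=True, check=True).stdout`. Because each activity is keyed by name, an absent section simply does not appear in the result rather than shifting list indexes.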

sarnold commented 6 months ago

Based on my recent experiments the json output might be a workaround? but those same observations show the man page (for sar anyway) claims some things that aren't true. I've tested with both sar and sa1 collection and the latter is the only way to get close to the sar man page claim that "only cpu data is collected by default". Sadly I haven't narrowed it more than "distro" config vs kernel defconfig but I need to get some other work done too... Anyway, I did push some sample data to github https://github.com/sarnold/sar-graph-artifacts but it's not on the main branch yet.

sarnold commented 6 months ago

I just pushed some minimal pass/fail data sets from arm64 (collected with sa1) along with the headers. In the minimal sa1 case, the only difference I can see is that complete activities are missing; I don't see any field differences.

sarnold commented 6 months ago

Also I stopped using the version on the yocto build branches and bumped to the current version in portage, which is not the very latest but all my recent tests are using 12.6.2 except kevin (where jammy has 12.5.2).

sarnold commented 6 months ago

Your fields in the paging headers are slightly different, yet that does not help me understand why kevin fails but thinkpad works. Back in your court... (Sorry, when I say thinkpad I mean [this](https://wiki.gentoo.org/wiki/Lenovo_ThinkPad_X13s).) I assume your example headers come from x86_64? Which kernel/sysstat version is it?

pafernanr commented 6 months ago

There is a new commit on the devel branch that seems to fix this issue; it uses the JSON output, which provides more details about host and activities. Please note it still needs some additional work to deal with the newly collected details and provide the right links on the output HTML pages. I tested it with the files you shared at https://github.com/sarnold/sar-graph-artifacts ("sa03 sa09 sa11 sa12") and it generated the charts without issues.

pafernanr commented 6 months ago

New release v0.2.0 published. It should solve most compatibility issues.

sarnold commented 6 months ago

Sorry, that ^^ means I tested on all my data sets with no errors/failures. Thanks!

pafernanr commented 6 months ago

Great! :) Thank you too