performancecopilot / pcp

Performance Co-Pilot
https://pcp.io

pmlogcheck taking more time in pass3 check #2082

Closed orasagar closed 1 month ago

orasagar commented 2 months ago

There is a performance issue observed in pmlogcheck, which has spent more than 24 hours checking a 24-hour compressed archive.

To observe the performance changes in pmlogcheck, I have created several hour-long extracted archives from a 24-hour archive using pmlogextract.

Following are my observations:

$ time pmlogcheck 20240825_2hr_data.0
    real    2m28.350s
    user    2m27.789s
    sys     0m0.406s

$ time pmlogcheck 20240825_4hr_data.0
    real    10m57.905s
    user    10m39.889s
    sys     0m0.930s

$ time pmlogcheck 20240825_8hr_data.0
    real    57m51.486s
    user    57m42.385s
    sys     0m2.622s

It has been observed that although the extraction interval only doubles each time, the time taken by pmlogcheck grows much faster than proportionally. During its run, pmlogcheck consumes an entire CPU core, often reaching 100% utilization.
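The scaling can be quantified from the three timings above. A quick sketch (converting the reported real times to seconds) shows the runtime growing by more than 4x for each doubling of the archive length, which is what you would expect from an O(N*N) algorithm rather than a linear pass:

```python
# Wall-clock ("real") times reported above, in seconds, keyed by archive hours.
times = {
    2: 2 * 60 + 28.350,    # 20240825_2hr_data.0
    4: 10 * 60 + 57.905,   # 20240825_4hr_data.0
    8: 57 * 60 + 51.486,   # 20240825_8hr_data.0
}

# Runtime ratio for each doubling of archive length: an O(N) pass would
# give roughly 2x per doubling, an O(N*N) pass roughly 4x.
r42 = times[4] / times[2]
r84 = times[8] / times[4]

print(f"4hr/2hr: {r42:.1f}x, 8hr/4hr: {r84:.1f}x")  # prints 4.4x and 5.3x
```

Both ratios are well above 2x, pointing at superlinear behaviour somewhere in the checker.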

I have checked the pstack of the pmlogcheck process; most of the time it is in the pass3 phase.

natoscott commented 2 months ago

@orasagar you may obtain more precise time/location information using "perf top" - this should show where exactly within pass3 we're using all that CPU time.

kmcdonell commented 1 month ago

@orasagar OK, I can see an O(N*N) loop in pass3() ... let me fix that. When that's done, are you able to run a rebuilt binary against the offending archives?
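The general shape of this kind of fix can be sketched generically. The following is only an illustrative Python sketch of the technique, not the actual pass3() code (pmlogcheck is written in C, and the real fix's details are not shown here): replacing a nested rescan of everything seen so far with a hash-based membership test.

```python
def find_duplicates_quadratic(records):
    """O(N*N): every record is compared against all earlier records."""
    dups = []
    for i, rec in enumerate(records):
        for earlier in records[:i]:      # rescans the whole prefix each time
            if earlier == rec:
                dups.append(rec)
                break
    return dups

def find_duplicates_linear(records):
    """O(N) overall: seen records are kept in a set for O(1) lookups."""
    seen, dups = set(), []
    for rec in records:
        if rec in seen:                  # average-case O(1) membership test
            dups.append(rec)
        else:
            seen.add(rec)
    return dups

data = [1, 2, 3, 2, 4, 1]
assert find_duplicates_quadratic(data) == find_duplicates_linear(data) == [2, 1]
```

The two functions return the same answer; only the cost differs, which is why this class of change gives the large constant-factor-free speedups reported later in this thread.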

orasagar commented 1 month ago

@kmcdonell yeah, I would love to do that.

kmcdonell commented 1 month ago

@orasagar What version of PCP do you have there, and what's the base RH (I assume) distro, and can I assume x86_64 architecture?

I can make a statically linked executable (to dodge libpcp version issues), but that will still be exposed to the versions of libraries I can't control, as in ...

$ ldd pmlogcheck.static 
    linux-vdso.so.1 (0x00007ffebb424000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f48ccae1000)
    libssl.so.3 => /lib/x86_64-linux-gnu/libssl.so.3 (0x00007f48cca3d000)
    libcrypto.so.3 => /lib/x86_64-linux-gnu/libcrypto.so.3 (0x00007f48cc5f9000)
    libsasl2.so.2 => /lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f48cc5de000)
    liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f48cc5b3000)
    libsystemd.so.0 => /lib/x86_64-linux-gnu/libsystemd.so.0 (0x00007f48cc4ea000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f48cc2c1000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f48cccb1000)
    libzstd.so.1 => /lib/x86_64-linux-gnu/libzstd.so.1 (0x00007f48cc1f2000)
    liblz4.so.1 => /lib/x86_64-linux-gnu/liblz4.so.1 (0x00007f48cc1d2000)
    libcap.so.2 => /lib/x86_64-linux-gnu/libcap.so.2 (0x00007f48cc1c7000)
    libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f48cc089000)
    libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f48cc061000)

so I'd rather build in an environment that matches yours in terms of DSOs.

Also, removing the O(N*N) part of pass3() has just exposed (repeated) lzma on-the-fly decompression as the next most likely CPU hog.

In my local experiments, decompressing the data (.0.xz, .1.xz, ...) first made a huge difference to the pmlogcheck running time. I really don't want to make that the default in pmlogcheck, because we may run out of disk space in whatever filesystem I try to decompress to, and it seriously complicates the logic in pmlogcheck.

If your experiments also show that prior decompression produces a net faster time, I'd consider adding command line args and logic changes to allow that as an option.
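The decompress-once idea can be sketched as follows. This is not pmlogcheck's code, just a minimal Python illustration (using the stdlib lzma module, with a throwaway file standing in for an archive volume) of decompressing one .xz volume up front so that every later read avoids repeated on-the-fly decompression, at the cost of disk space in the working directory:

```python
import lzma
import os
import shutil
import tempfile

def decompress_once(xz_path, workdir):
    """Decompress one .xz volume into workdir, returning the new path.

    Trades disk space in workdir for avoiding repeated on-the-fly lzma
    decompression on every subsequent read of the volume.
    """
    out_path = os.path.join(workdir, os.path.basename(xz_path)[:-len(".xz")])
    with lzma.open(xz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path

# Round-trip demo with a temporary file standing in for an archive volume.
with tempfile.TemporaryDirectory() as tmp:
    xz_file = os.path.join(tmp, "archive.0.xz")
    with lzma.open(xz_file, "wb") as f:
        f.write(b"pretend archive volume contents")
    plain = decompress_once(xz_file, tmp)
    with open(plain, "rb") as f:
        assert f.read() == b"pretend archive volume contents"
```

A real implementation would also need the disk-space and cleanup handling discussed above, which is exactly the complication being weighed here.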

orasagar commented 1 month ago

@kmcdonell I'm using the latest upstream code of PCP. Yes, it is x86_64 architecture.

$ ldd /usr/local/bin/pmlogcheck
    linux-vdso.so.1 (0x00007fffe69b8000)
    libpcp.so.3 => /usr/local/lib/libpcp.so.3 (0x00007f07e731b000)
    libm.so.6 => /lib64/libm.so.6 (0x00007f07e6f99000)
    libc.so.6 => /lib64/libc.so.6 (0x00007f07e6bc3000)
    libssl.so.1.1 => /lib64/libssl.so.1.1 (0x00007f07e692f000)
    libcrypto.so.1.1 => /lib64/libcrypto.so.1.1 (0x00007f07e6444000)
    libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f07e6226000)
    liblzma.so.5 => /lib64/liblzma.so.5 (0x00007f07e5fff000)
    libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007f07e5cb3000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f07e5a93000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00007f07e588f000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f07e75da000)
    libz.so.1 => /lib64/libz.so.1 (0x00007f07e5677000)
    libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f07e545f000)
    libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f07e5236000)
    libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f07e4fe1000)
    libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f07e4cf6000)
    libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f07e4adf000)
    libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f07e48db000)
    librt.so.1 => /lib64/librt.so.1 (0x00007f07e46d3000)
    liblz4.so.1 => /lib64/liblz4.so.1 (0x00007f07e44b6000)
    libcap.so.2 => /lib64/libcap.so.2 (0x00007f07e42ae000)
    libmount.so.1 => /lib64/libmount.so.1 (0x00007f07e4054000)
    libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f07e3d36000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f07e3b1d000)
    libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f07e390c000)
    libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f07e3707000)
    libblkid.so.1 => /lib64/libblkid.so.1 (0x00007f07e34b4000)
    libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f07e32ac000)
    libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f07e3081000)
    libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007f07e2e60000)
    libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x00007f07e2bdc000)

Yes, it is indeed true that decompressing the files beforehand makes pmlogcheck a lot faster. All the timings I have shared are on decompressed archives.

orasagar commented 1 month ago

I do my testing on Oracle Linux 8, which is based on RHEL 8.

kmcdonell commented 1 month ago

OK @orasagar ... if you have access to current PCP source, commit 441b804de is the first part of the speed-up. I'll send you via email an executable built for RHEL which may work. I'm working on part 2, which will allow one-shot decompression; but since you're already using decompressed archives, that won't help your timings. If it is still unacceptably slow, I'll need to dig deeper with more profiling, and I may need a pmlogredact'd copy of one or more of your archives if I cannot reproduce the slowness you're seeing.

orasagar commented 1 month ago

@kmcdonell great work. With the above fix I see a significant improvement in pmlogcheck.

Following are the test results:

$ time pmlogcheck 20240825_2hr_data.0
    real    0m7.440s
    user    0m7.101s
    sys     0m0.338s

$ time pmlogcheck 20240825_4hr_data.0
    real    0m15.404s
    user    0m14.613s
    sys     0m0.786s

$ time pmlogcheck 20240825_8hr_data.0
    real    0m31.216s
    user    0m29.655s
    sys     0m1.551s

kmcdonell commented 1 month ago

Excellent news @orasagar.

A little bit of analysis based on elapsed time (your tests are 100% CPU bound, so this is OK) ...

    20240825_2hr_data.0: 7.44 / (2*60+28.35) = 0.050 => 20 times faster
    20240825_4hr_data.0: 15.404 / (10*60+57.905) = 0.023 => 43 times faster, and 15.404 / 7.44 = 2.07 x the time for 20240825_2hr_data.0
    20240825_8hr_data.0: 31.216 / (57*60+51.486) = 0.009 => 111 times faster, and 31.216 / 7.44 = 4.20 x the time for 20240825_2hr_data.0

So not only is the improvement better the bigger the archive, but (more importantly) the time is now close to proportional to the size of the archive (assuming 8hr_data ~= 4 x 2hr_data).
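Those before/after ratios are easy to re-check mechanically; a small sketch using the real times posted earlier in this thread:

```python
# (archive, before-fix seconds, after-fix seconds) from the timings above.
runs = [
    ("2hr", 2 * 60 + 28.350, 7.440),
    ("4hr", 10 * 60 + 57.905, 15.404),
    ("8hr", 57 * 60 + 51.486, 31.216),
]

for name, before, after in runs:
    print(f"{name}: {before / after:.0f}x faster")   # prints 20x, 43x, 111x

# After the fix, runtime roughly doubles when the archive length doubles,
# i.e. the checker is now close to linear in archive size.
assert 1.8 < runs[1][2] / runs[0][2] < 2.3   # 4hr vs 2hr: ~2.07x
assert 1.8 < runs[2][2] / runs[1][2] < 2.3   # 8hr vs 4hr: ~2.03x
```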

I'll leave it to you to close the issue if you think it is done.

orasagar commented 1 month ago

@kmcdonell thanks for the help here. I'm happy with the results and agree with your observations. Closing out this issue.