google / autofdo

AutoFDO
https://groups.google.com/forum/#!forum/autofdo
Apache License 2.0
516 stars 109 forks source link

Cannot generate gcov data with `create_gcov` #196

Open sergiodj opened 2 months ago

sergiodj commented 2 months ago

Hi,

I've been profiling QEMU and would like to use autofdo to generate a gcov file to test some PGO scenarios. I'm currently in Ubuntu Noble 24.04 with autofdo 0.19-3 installed, and after generating a sizeable profile (around 800 Mb) with the following command:

sudo perf record \
     -e br_inst_retired.near_taken \
     --branch-any \
     -p "${qemupid}" \
     --all-cpus \
     -F 99 \
     --output="${perfoutput}" \
     --call-graph fp &

I'm now attempting to convert it to gcov by invoking:

sudo create_gcov -profiler perf -profile perf-hello-1.data -gcov perf-hello-1.gcov -gcov_version 2 -binary /usr/bin/qemu-system-riscv64 -use_lbr 0

What I get is the following:

E20240628 22:18:15.648012 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648159 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648185 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648203 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648221 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648238 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648255 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648276 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648315 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648331 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648344 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648360 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648375 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648392 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648408 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648427 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648443 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648459 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648473 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648485 382826 perf_serializer.cc:234] Unknown event type: 18                                                                                                                                                                              
E20240628 22:18:15.648496 382826 perf_serializer.cc:234] Unknown event type: 17                                                                                                                                                                              
E20240628 22:18:15.648509 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648521 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.648538 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648556 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.648572 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648589 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.648607 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648624 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.648641 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648658 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.648674 382826 perf_serializer.cc:234] Unknown event type: 18
E20240628 22:18:15.648692 382826 perf_serializer.cc:234] Unknown event type: 17
E20240628 22:18:15.649129 382826 perf_serializer.cc:247] Unknown event type: 82
E20240628 22:18:28.603147 382826 perf_parser.cc:248] Unknown event type: 17

After that, I see perf-hello-1.gcov with only 1.6Kb, and nothing useful in it. dump_gcov isn't helpful, as it doesn't display anything.

My QEMU binary isn't stripped and still contains all debug symbols.

Any tips on how to proceed here?

Thanks.

sergiodj commented 2 months ago

FWIW, I'm using GCC 13.2.0.

snehasish commented 2 months ago

What is the host system that you are profiling on? v0.19 is depends on an older version of perf_data_converter. I suspect the platform isn't supported by it. Can you try the newer v0.20 release?

enr0n commented 1 month ago

I am hitting the same problem. I have tried building v0.20 from source on Ubuntu 24.10 (development series) but I get the same problem. For a simple reproducer, I have tried to just follow the AutoFDO section of this tutorial, except that I use perf record directly rather than the ocperf.py tool mentioned.

sergiodj commented 1 month ago

@snehasish Thanks for the reply, and apologies for the delay. Let me explain what I did so far and maybe you can help me with some other problems I'm facing.

After your reply, I worked on upgrading the autofdo package on Debian/Ubuntu to v0.20. It took me a while because some of the dependencies changed and had to be bundled into the package, but eventually I was able to get it working. FWIW, I have a repository on salsa and a PPA in Ubuntu where you can find the updated package. @enr0n is using the package from this PPA, btw.

Now, my use case is profiling an emulated riscv64 QEMU on top of an amd64 machine performing a package build. My initial attempt was to perf record QEMU and build the hello package inside the VM, which then generated a 780MB perf data file. Using create_gcov on the perf file gave me a valid GCOV file (with around 700KB), which I could then use to rebuild QEMU while enabling PGO.

After having verified that this initial test worked, I decided to go for a more realistic use case. I picked gdb as the next package I wanted to build inside the VM, and profiled QEMU again. This time, perf record generated a 30+ GB file. When I tried to process it with create_gcov, the process was OOM killed. The host machine I was using for this test had 32GB of RAM, so I decided to go for a bigger machine, and was able to get access to a 128-core, 512GB RAM Intel Xeon. Running the same profile with perf record there now gave a 50GB file, but I thought that would be OK because of the amount of RAM I had. I was wrong: surprisingly, create_gcov was OOM killed again.

Long story short, I could not make create_gcov work for any profiling scenario. It was either OOM killed, or generated a useless 1.6KB GCOV file.

Here's how I'm running perf record:

perf record \
-e br_inst_retired.near_taken \
--branch-any \
-p QEMU_PID \
--all-cpus
-F 99 \
--call-graph fp

I tried many variations of this command, including removing everything except -e br_inst_retired.near_taken and --branch-any, but nothing worked.

Here's how I'm running create_gcov:

create_gcov -profiler perf -profile perf-openssl-1.perfdata -gcov perf-openssl-1.gcov -gcov_version 2 -binary /usr/bin/qemu-system-riscv64

The QEMU binary I'm using is unstripped and has debuginfo.

Any help is appreciated. Thanks.

sergiodj commented 1 month ago

I am hitting the same problem. I have tried building v0.20 from source on Ubuntu 24.10 (development series) but I get the same problem. For a simple reproducer, I have tried to just follow the AutoFDO section of this tutorial, except that I use perf record directly rather than the ocperf.py tool mentioned.

FWIW, I did try using ocperf.py but it gave me the same results.

snehasish commented 1 month ago

I don't have direct experience with using create_gcov for such large inputs , @erozenfeld may be more helpful here.

That being said, I think the perf data converter is the biggest contributor to increased memory usage. One way to mitigate this is to decrease the rate of sampling. Your current invocation collects 99 samples every second for the entire execution (building gdb inside QEMU). Perhaps you can sample in bursts and merge the profiles?

For example, try

perf record \
-e br_inst_retired.near_taken \
--branch-any \
-p QEMU_PID \
--all-cpus
-c 100003 \
-- sleep 10

The most impactful change is to limit sampling to 10s. Sampling branches every "prime number" also usually yields better results. You can do this collection step a few times while VM is busy with your workload. Then you can use the profile_merger binary to merge the individual .afdo profiles generated for each collection. Let me know if that works. I'm curious to see what speedup you observe for qemu.

sergiodj commented 1 month ago

Thanks for the quick reply, @snehasish .

Interesting, I hadn't thought about doing the collection in bursts. In fact, I didn't know that it was possible to specify a PID and a different executable to perf record.

What's a frequency you think is best to collect these samples? Maybe one run (i.e., 10 seconds) every 10 minutes, for example? These builds tend to take hours to complete, so I'd like to have a good number of samples collected while at the same time not having a giant perf file at the end.

snehasish commented 1 month ago

possible to specify a PID and a different executable to perf record.

I looked at Brendan Gregg's one-liners to verify before responding: https://www.brendangregg.com/perf.html#OneLiners. That seems to confirm that it should work.

What's a frequency you think is best to collect these samples?

This really depends on the workload and whether there are distinct phases. For qemu emulation, I would suspect that there aren't distinguishable phases when running a build (just executing emulated instructions and the disk drivers?). So one gap I can think of is network emulation. When running a build as a workload to profile, you won't exercise any of the network code in qemu.

Maybe one run (i.e., 10 seconds) every 10 minutes, for example?

Sounds like a reasonable place to start - a combined afdo file with a couple of minutes worth of coverage should be good enough.

sergiodj commented 1 month ago

possible to specify a PID and a different executable to perf record.

I looked at Brendan Gregg's one-liners to verify before responding: https://www.brendangregg.com/perf.html#OneLiners. That seems to confirm that it should work.

Perfect, thank you.

What's a frequency you think is best to collect these samples?

This really depends on the workload and whether there are distinct phases. For qemu emulation, I would suspect that there aren't distinguishable phases when running a build (just executing emulated instructions and the disk drivers?). So one gap I can think of is network emulation. When running a build as a workload to profile, you won't exercise any of the network code in qemu.

There's actually some networking involved because the first step is to download the build dependencies, so I believe we're OK in that front, which is great.

Maybe one run (i.e., 10 seconds) every 10 minutes, for example?

Sounds like a reasonable place to start - a combined afdo file with a couple of minutes worth of coverage should be good enough.

Great. I'm starting with 10 seconds of capture every 10 minutes, then.

After I have all the perf data available, I'll have to learn how to use profile_merger. I'm assuming the order of operations is: perf data -> create_gcov (for each one) -> profile_merger.

snehasish commented 1 month ago

I'm assuming the order of operations is: perf data -> create_gcov (for each one) -> profile_merger.

Yes. Can you also confirm that the host system you are using is an Intel machine which supports LBR? I asked about the host system earlier and a later comment referenced an Intel Xeon but I wanted to confirm.

sergiodj commented 1 month ago

Yeah, /proc/cpuinfo has arch_lbr.

sergiodj commented 1 month ago

Here's the output I get with create_gcov when processing the first perf data generated in the build:

$ create_gcov -profiler perf -profile perf-openssl-1.perfdata -gcov perf-openssl-1.gcov -gcov_version 2 -binary /usr/bin/qemu-system-riscv64
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1322] Skipping 1036 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_EVENT_UPDATE
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1069] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:./third_party/perf_data_converter/src/quipper/perf_reader.cc:1060] Number of events stored: 47482
[INFO:./third_party/perf_data_converter/src/quipper/perf_parser.cc:272] Parser processed: 330 MMAP/MMAP2 events, 26 COMM events, 0 FORK events, 0 EXIT events, 46871 SAMPLE events, 46871 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20240723 20:23:39.241485 783580 symbol_map.cc:477] Adding loadable exec segment: offset=457000 vaddr=457000
I20240723 20:23:39.241858 783580 dwarf2reader.cc:395] Only full compilation unit and skeleton unit types are supported.
W20240723 20:23:39.241865 783580 legacy_addr2line.cc:153] File '/usr/bin/qemu-system-riscv64' has mangled .debug_info section.
W20240723 20:23:39.293187 783580 profile.cc:156] use_lbr was enabled but range_count_map was empty!

After that, the gcov file created has 1.6KB and seems useless. So I believe there's something else at play here.

My qemu-system-riscv64 binary is:

$ file /usr/bin/qemu-system-riscv64 
/usr/bin/qemu-system-riscv64: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9b775b7e3e2282b1e36c609fc6625acd54e559d1, for GNU/Linux 3.2.0, with debug_info, not stripped
snehasish commented 1 month ago

I think this is the same issue as https://github.com/google/autofdo/issues/195. " File '/usr/bin/qemu-system-riscv64' has mangled .debug_info section." is a little concerning. Looking at the code I think the root cause may be that the debug info reader is outdated and does not understand the version of dwarf emitted by the compiler. Perhaps try an older version of dwarf?

From https://gcc.gnu.org/gcc-11/changes.html

To make GCC 11 generate an older DWARF version use -g together with -gdwarf-2, -gdwarf-3 or -gdwarf-4

@erozenfeld Have you come across this before?

erozenfeld commented 1 month ago

dwarf 5 support was added in https://github.com/google/autofdo/pull/142 sergiodj If you can share the perf data file and the binary file you are passing to create_gcov I can try to debug this.

sergiodj commented 1 month ago

@snehasish Thanks for the pointers.

@erozenfeld Sure thing. You can find the perf data + the QEMU binary here:

http://people.ubuntu.com/~sergiodj/sergiodj-qemu-profiling-data.tar.gz

Thanks a lot.

sergiodj commented 1 month ago

Hi @erozenfeld, just checking in to see if you were able to find anything interesting. Thanks!

erozenfeld commented 1 month ago

@sergiodj Sorry about the delay. One issue is that the tool chokes on DW_UT_partial compilation unit but that probably is just an omission. I added it to CompilationUnit::ReadHeader but then ran into an unrecognized DW_FORM_GNU_ref_alt form. It looks like it's emitted by dwz tool that optimizes DWARF. Can you confirm that your binary was processed by dwz?

sergiodj commented 1 month ago

Thanks for the feedback, @erozenfeld.

Yes, the binary was processed by dwz. When I was building QEMU I noticed that I had to prevent strip from running, but I kept the other build steps intact and one of them invokes dwz. Should I also disable it?

erozenfeld commented 1 month ago

Yes, please disable it and share the binary. I'll see how far create_gcov can get with it.

sergiodj commented 1 month ago

I'm about to finish my day now, but I will try disabling it tomorrow and see how it goes. I'll keep you posted. Thanks.

erozenfeld commented 1 month ago

If you want to test it yourself, make this change and rebuild the tools:

diff --git a/util/symbolize/dwarf2reader.cc b/util/symbolize/dwarf2reader.cc
index fb82d36..a07bec5 100644
--- a/util/symbolize/dwarf2reader.cc
+++ b/util/symbolize/dwarf2reader.cc
@@ -389,6 +389,7 @@ void CompilationUnit::ReadHeader() {
       }
       header_.unit_type = reader_->ReadOneByte(headerptr);
       if (header_.unit_type != DwarfUnitType::DW_UT_compile &&
+          header_.unit_type != DwarfUnitType::DW_UT_partial &&
           header_.unit_type != DwarfUnitType::DW_UT_skeleton &&
           header_.unit_type != DwarfUnitType::DW_UT_split_compile)
       {