google / autofdo

AutoFDO
https://groups.google.com/forum/#!forum/autofdo
Apache License 2.0
526 stars 111 forks source link

Completely broken, will not compile on Clear Linux or other distros I've tried #177

Open GabeAl opened 1 year ago

GabeAl commented 1 year ago

This code is messy. Very, unbelievably messy.

I don't understand why GCC references it in its documentation, when GCC itself compiles brilliantly across dozens of very different environments I've tried it on, and this doesn't compile anywhere for me...

GCC mentions the need to use one little utility function: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fauto-profile ("Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC.")

I don't know why they ever merged support of this less-than-research grade codebase, and why it persists into GCC13.

Why did you merge this into GCC in a partial frankenstenian way? Why not merge the ability to support the perf output directly? Nobody I've ever talked to whose heard of AutoFDO has gotten this hot mess to compile.

GCC 13.1. No LLVM. I am not using the LLVM compiler, and am only interested in using auto-profile with GCC. I just need that one utility, "create_gcov", and nothing else in this 1.7GB code morass that gets pulled in from git clone. (Why so huge and messy?).

garbage.log

Here's the log showing the extent of the failure. It's not pretty.

erozenfeld commented 1 year ago

@rlavaee @shenhanc78 The sync to internal version last week broke create_gcov build. It also overridden the changes that were made in this repo, e.g., https://github.com/google/autofdo/pull/172, https://github.com/google/autofdo/pull/156 and probably others. Can this be fixed? Why wasn't a proper merge done?

erozenfeld commented 1 year ago

@GabAI If you checkout the commit before the last one (e.g., https://github.com/google/autofdo/commit/1f9133a659eb483cd9596bb61ab156b06b48973a) you will be able to build create_gcov.

GabeAl commented 1 year ago

No gravy:

git reset --hard 1f9133a
rm -r build; mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=. ../  
-- The C compiler identification is GNU 13.2.1
-- The CXX compiler identification is GNU 13.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17
-- Performing Test ABSL_INTERNAL_AT_LEAST_CXX17 - Success
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Error at CMakeLists.txt:9 (add_subdirectory):
  The source directory

    /home/gabe/build/autofdo/third_party/glog

  does not contain a CMakeLists.txt file.
Call Stack (most recent call first):
  CMakeLists.txt:596 (config_without_llvm)

-- Configuring incomplete, errors occurred!
GabeAl commented 1 year ago

(Yes I also tried with git checkout --recurse-submodules 1f9133a) -- no dice.

erozenfeld commented 1 year ago

Ok, please use the previous commit (-2 from HEAD). That one definitely builds.

GabeAl commented 1 year ago

Welp, same issue so I figured out it was a problem with glog being removed and git not wanting to recurse if it was removed since HEAD.

Fixed with git submodule init explicitly at the old branch followed by git submodule update --recursive

Have you considered submitting just the converter to gcc themselves to integrate into gcc to "natively" read the prof file? Would eliminate a lot of this extraneous maintenance and compilation and dependencies etc.

GabeAl commented 1 year ago

Not sure if due to failed build, but that one is crashing with segfault on a simple profile from a simple program:

~/build/autofdo/build/create_gcov --binary myProg --profile perf.data --gcov profile.afdo
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1320] Skipping 1676 bytes of metadata: HEADER_CPU_TOPOLOGY
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event PERF_RECORD_ID_INDEX
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event PERF_RECORD_CPU_MAP
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_17
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_18
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1067] Skipping unsupported event UNKNOWN_EVENT_82
[INFO:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1058] Number of events stored: 89489
[INFO:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:274] Parser processed: 47 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 89408 SAMPLE events, 89408 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
F20231010 20:54:03.864609 193833 dwarf2reader.cc:835] Unhandled form type
*** Check failure stack trace: ***
Aborted (core dumped)

Should I open a new ticket or is this just a build failure?

erozenfeld commented 1 year ago

@GabeAl If you can share myProg and perf.data, I can debug the failure you are seeing.

erozenfeld commented 1 year ago

Have you considered submitting just the converter to gcc themselves to integrate into gcc to "natively" read the prof file? >> Would eliminate a lot of this extraneous maintenance and compilation and dependencies etc.

Yes, separating the stuff that's needed for create_gcov and moving it to gcc repo would be good and it has been suggested in the past. It's a matter of finding a contributor who has the time to do the work for that.

GabeAl commented 1 year ago

Sure I can share -- but first, anything amiss with what I'm using to invoke things?

AMD Zen 4 processor (so no intel-locked "br_inst_retired:near_taken" garbledigook). I used a bunch of the standard options instead.

# compile a program to test
CFLAGS="-g -O3 -feliminate-unused-debug-types -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -Wformat -Wformat-security -m64 -fasynchronous-unwind-tables -Wp,-D_REENTRANT -ftree-loop-distribute-patterns -Wl,-z,now -Wl,-z,relro -fno-semantic-interposition -ffat-lto-objects -fno-trapping-math -Wl,-sort-common -Wl,--enable-new-dtags -mrelax-cmpxchg-loop -fopenmp -funroll-loops -march=znver4 -flto" ./configure && make -j 24

# profile the program (params stored in env variable, they provide inputs and outputs)
rm perf.data; perf record -e branches,branch-misses,cycles,instructions -b -o perf.data -- src/prog ${params}

create_gcov --binary=src/prog --profile=perf.data \
    --gcov=profile.afdo
[WARNING:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_serializer.cc:604] Ignoring branch stack entry reserved bits: 32
# … repeat the above a million times or so... can we shut this off? Doesn’t make sense to report the same warning every time it loops.
[INFO:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_reader.cc:1058] Number of events stored: 346316
[INFO:/home/gabe/build/autofdo/third_party/perf_data_converter/src/quipper/perf_parser.cc:274] Parser processed: 7 MMAP/MMAP2 events, 2 COMM events, 0 FORK events, 1 EXIT events, 345116 SAMPLE events, 342373 of these were mapped, 0 SAMPLE events with a data address, 0 of these were mapped
WARNING: Logging before InitGoogleLogging() is written to STDERR
F20231011 17:29:01.407420 65767 dwarf2reader.cc:835] Unhandled form type
*** Check failure stack trace: ***
Aborted (core dumped)

If everything looks good, I have a large ~200MB perf.data to upload.

It's a matter of finding a contributor who has the time to do the work for that.

I see -- it would probably save time in the long term with all the debugging and code cleaning and separate tree maintenence, so perhaps the argument could be made re: amortized time savings.

erozenfeld commented 1 year ago

Your steps look fine. You are probably running into a bug in DWARF5 support. Try using -gdwarf-4 instead of -g. Even if that works, I'd like to take a look at the repro.

erozenfeld commented 1 year ago

One more thing: please add -gcov_version=2 to your create_gcov invocation.

algr commented 1 year ago

I'm building from 61b25e4 (HEAD -2 at the time of writing) but am running into compile failures. E.g. quipper/huge_page_deducer.cc is missing an include for . Are there compatibility issues with the third-party libraries? Will reverting to a given autofdo commit (like 61b25e4) also pick up matching commits of the third-party libraries?

I wanted to revert to a stable release or a "last known good" release, but it looks like the latest release (and tag) was 0.19 in 2019.

shenhanc78 commented 1 year ago

Hi, sorry for breaking create_gcov.

To preclude such incidences, we are thinking of forking off create_gcov or making it into another create_llvm_prof branch. Because create_gcov is only in maintenance mode and it does not depends on llvm, whereas the main create_llvm_prof part is actively developed inside google and will be synced here from time to time.

I'll talk to @rlavaee who is syncing some internal code here.

I'll also looking into @algr building issue during the weekend.

GabeAl commented 1 year ago

gdwarf-4 and --gcov_version=2 still crash. F20231012 20:29:38.413218 88587 dwarf2reader.cc:835] Unhandled form type Check failure stack trace: Aborted (core dumped)

prog_and_perfdata.zip

GabeAl commented 1 year ago

BTW why all the llvm business? GCC is generally a more performant compiler for most vectorized hpc code with intrinsics etc (as long as you avoid corner cases or coding specifically to llvm's idiosyncracies). Seems like some redundant effort in the open source compiler community.

rlavaee commented 1 year ago

I am sorry about the breakage as well. We'll soon decide what to do about create_gcov. The easiest solution would be forking from an earlier version of the repo and then making a standalone repo for create_gcov.

As you may know, Google has switched to the LLVM compiler for several years now. We just weren't fully aware that there are gcc users out there.

On Thu, Oct 12, 2023 at 5:36 PM Gabriel Al-Ghalith @.***> wrote:

BTW why all the llvm business? GCC is generally a more performant compiler for hpc code (as long as you avoid corner cases or coding specifically to llvm's idiosyncracies). Seems like wasted effort in the open source space.

— Reply to this email directly, view it on GitHub https://github.com/google/autofdo/issues/177#issuecomment-1760589119, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHP3N34CSL5JCD5U6IBUIDX7CEIRANCNFSM6AAAAAA53DBH3A . You are receiving this because you were mentioned.Message ID: @.***>

rlavaee commented 1 year ago

I reverted the culprit patch. Please resync to head (7a005a55) and rebuild.

On Thu, Oct 12, 2023 at 8:03 PM Rahman Lavaee @.***> wrote:

I am sorry about the breakage as well. We'll soon decide what to do about create_gcov. The easiest solution would be forking from an earlier version of the repo and then making a standalone repo for create_gcov.

As you may know, Google has switched to the LLVM compiler for several years now. We just weren't fully aware that there are gcc users out there.

On Thu, Oct 12, 2023 at 5:36 PM Gabriel Al-Ghalith < @.***> wrote:

BTW why all the llvm business? GCC is generally a more performant compiler for hpc code (as long as you avoid corner cases or coding specifically to llvm's idiosyncracies). Seems like wasted effort in the open source space.

— Reply to this email directly, view it on GitHub https://github.com/google/autofdo/issues/177#issuecomment-1760589119, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHP3N34CSL5JCD5U6IBUIDX7CEIRANCNFSM6AAAAAA53DBH3A . You are receiving this because you were mentioned.Message ID: @.***>

GabeAl commented 1 year ago

Thanks @rlavaee -- would this change be expected to fix compilation, or to fix the bug I posted previously in the thread (repro requested by @erozenfeld )?

Happy to give it a try but would like to align expectations first.

rlavaee commented 1 year ago

Yes. I have tried this with gcc (Debian 13.2.0-4) 13.2.0 and all gcc-based builds and tests succeed. Please give it a try.

On Fri, Oct 13, 2023 at 1:06 PM Gabriel Al-Ghalith @.***> wrote:

Thanks @rlavaee https://github.com/rlavaee -- would this change be expected to fix compilation, or to fix the bug I posted previously in the thread (repro requested by @erozenfeld https://github.com/erozenfeld )?

Happy to give it a try but would like to align expectations first.

— Reply to this email directly, view it on GitHub https://github.com/google/autofdo/issues/177#issuecomment-1762138522, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABHP3N5GGT3ZGFWHOUKYMQ3X7GNM5ANCNFSM6AAAAAA53DBH3A . You are receiving this because you were mentioned.Message ID: @.***>

erozenfeld commented 1 year ago

@rlavaee Thank you for fixing the break!

erozenfeld commented 1 year ago

@GabeAl I reproduced your failure and, as I suspected, you ran into a missing DWARF5 feature: DW_FORM_line_strp. Luckily there is a PR that fixes it: https://github.com/google/autofdo/pull/166 I asked the maintainers to review and merge it. I verified that after applying that PR create_gcov succeeds on your repro.

BTW, double check your steps when you compile prog with -gdwarf-4. If you really get dwarf2reader.cc:835] Unhandled form type with that, please send me your prog compiled with -gdwarf-4