llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Freestanding profiling APIs do not support IR-based instrumentation #58598

Open justincady opened 1 year ago

justincady commented 1 year ago

The profiling runtime APIs for freestanding environments do not consider value profiling data. Specifically, after a bit of investigation, I believe that:

Here is a reproducer, adapted from compiler-rt/test/profile/Inputs/instrprof-value-prof-real.c:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int __llvm_profile_runtime = 0;
uint64_t __llvm_profile_get_size_for_buffer(void);
int __llvm_profile_write_buffer(char *);

#define DEF_FUNC(x)                                                            \
  void x() {}
#define DEF_2_FUNCS(x) DEF_FUNC(x##_1) DEF_FUNC(x##_2)
#define DEF_4_FUNCS(x) DEF_2_FUNCS(x##_1) DEF_2_FUNCS(x##_2)
#define DEF_8_FUNCS(x) DEF_4_FUNCS(x##_1) DEF_4_FUNCS(x##_2)
#define DEF_16_FUNCS(x) DEF_8_FUNCS(x##_1) DEF_8_FUNCS(x##_2)
#define DEF_32_FUNCS(x) DEF_16_FUNCS(x##_1) DEF_16_FUNCS(x##_2)
#define DEF_64_FUNCS(x) DEF_32_FUNCS(x##_1) DEF_32_FUNCS(x##_2)
#define DEF_128_FUNCS(x) DEF_64_FUNCS(x##_1) DEF_64_FUNCS(x##_2)
#define DEF_256_FUNCS(x) DEF_128_FUNCS(x##_1) DEF_128_FUNCS(x##_2)
#define DEF_512_FUNCS(x) DEF_256_FUNCS(x##_1) DEF_256_FUNCS(x##_2)

#define FUNC_ADDR(x) &x,
#define FUNC_2_ADDRS(x) FUNC_ADDR(x##_1) FUNC_ADDR(x##_2)
#define FUNC_4_ADDRS(x) FUNC_2_ADDRS(x##_1) FUNC_2_ADDRS(x##_2)
#define FUNC_8_ADDRS(x) FUNC_4_ADDRS(x##_1) FUNC_4_ADDRS(x##_2)
#define FUNC_16_ADDRS(x) FUNC_8_ADDRS(x##_1) FUNC_8_ADDRS(x##_2)
#define FUNC_32_ADDRS(x) FUNC_16_ADDRS(x##_1) FUNC_16_ADDRS(x##_2)
#define FUNC_64_ADDRS(x) FUNC_32_ADDRS(x##_1) FUNC_32_ADDRS(x##_2)
#define FUNC_128_ADDRS(x) FUNC_64_ADDRS(x##_1) FUNC_64_ADDRS(x##_2)
#define FUNC_256_ADDRS(x) FUNC_128_ADDRS(x##_1) FUNC_128_ADDRS(x##_2)
#define FUNC_512_ADDRS(x) FUNC_256_ADDRS(x##_1) FUNC_256_ADDRS(x##_2)

typedef void (*FPT)(void);

DEF_512_FUNCS(foo)
FPT CalleeAddrs[] = {FUNC_512_ADDRS(foo)};

FPT getFunc(int I) { return CalleeAddrs[I]; }

int dumpBuffer(const char *FileN, const char *Buffer, uint64_t Size) {
  FILE *File = fopen(FileN, "w");
  if (!File)
    return 1;
  if (fwrite(Buffer, 1, Size, File) != Size) {
    fclose(File);
    return 1;
  }
  return fclose(File);
}

#define MaxSize 50000
int main(int argc, const char *argv[]) {
  static __attribute__((aligned(sizeof(uint64_t)))) char Buffer[MaxSize];

  if (argc < 2) // argv[1] is the output .profraw path
    return 1;

  uint64_t Size = __llvm_profile_get_size_for_buffer();
  if (Size > MaxSize)
    return 1;

  int I;
  for (I = 0; I < 512; I++) {
    FPT Fp = getFunc(I);
    int J;
    for (J = 0; J < 1000 - I; J++)
      Fp();

    Fp = getFunc(511 - I);
    for (J = 0; J < 2000 - I; J++)
      Fp();
  }

  if (__llvm_profile_write_buffer(Buffer))
    return 1;

  return dumpBuffer(argv[1], Buffer, Size);
}

Using front-end instrumentation with the buffer-based APIs works correctly:

$ clang -fuse-ld=lld -fprofile-instr-generate instrprof-value-prof-real.c
$ ./a.out out.profraw
$ llvm-profdata show out.profraw
Instrumentation level: Front-end
Total functions: 515
Maximum function count: 2489
Maximum internal block count: 893184
$

But IR instrumentation does not:

$ clang -fuse-ld=lld -fprofile-generate instrprof-value-prof-real.c
$ ./a.out out.profraw
$ llvm-profdata show out.profraw
error: out.profraw: truncated profile data
$

I don't know whether the solution is a second set of APIs that correctly incorporate the additional data, or whether another approach would be preferred.
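The two `Fp()` calls in the reproducer are indirect call sites, which is the kind of data indirect-call value profiling collects. As a rough illustration only (a hypothetical model with made-up names such as `CallSite` and `site_record`, not compiler-rt's actual structures or the raw profile format), each site can be pictured as a table of distinct targets and their counts that grows while the program runs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one indirect-call site's value profile: a growable
   table of (target, count) pairs. This is NOT compiler-rt's data layout. */
typedef struct {
  uintptr_t *Targets;
  uint64_t *Counts;
  size_t Len, Cap;
} CallSite;

/* Records one call through site S to Target. Returns 0 on success, 1 if
   memory runs out. The table grows at run time, one entry per new target. */
static int site_record(CallSite *S, uintptr_t Target) {
  assert(S != NULL);
  /* Scan newest-first: the reproducer calls the same target many times in a row. */
  for (size_t K = S->Len; K-- > 0;)
    if (S->Targets[K] == Target) {
      S->Counts[K]++;
      return 0;
    }
  if (S->Len == S->Cap) {
    size_t NewCap = S->Cap ? 2 * S->Cap : 8;
    uintptr_t *T = realloc(S->Targets, NewCap * sizeof *T);
    if (!T)
      return 1;
    S->Targets = T;
    uint64_t *C = realloc(S->Counts, NewCap * sizeof *C);
    if (!C)
      return 1;
    S->Counts = C;
    S->Cap = NewCap;
  }
  S->Targets[S->Len] = Target;
  S->Counts[S->Len] = 1;
  S->Len++;
  return 0;
}

/* Replays the reproducer's two loops, using the callee index (0..511 in
   CalleeAddrs order) as a stand-in for the target address. */
static int replay_reproducer(CallSite *Site1, CallSite *Site2) {
  for (int I = 0; I < 512; I++) {
    for (int J = 0; J < 1000 - I; J++)
      if (site_record(Site1, (uintptr_t)I))
        return 1;
    for (int J = 0; J < 2000 - I; J++)
      if (site_record(Site2, (uintptr_t)(511 - I)))
        return 1;
  }
  return 0;
}

/* Sum of all call counts recorded at one site. */
static uint64_t site_total(const CallSite *S) {
  uint64_t Sum = 0;
  for (size_t K = 0; K < S->Len; K++)
    Sum += S->Counts[K];
  return Sum;
}

static void site_free(CallSite *S) {
  free(S->Targets);
  free(S->Counts);
  *S = (CallSite){0};
}
```

Replaying the loops gives 512 distinct targets at each site. Each `foo` function k is called (1000 - k) times at the first site and (1489 + k) times at the second, 2489 in total, which is consistent with the "Maximum function count: 2489" reported for the front-end build.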

justincady commented 1 year ago

cc'ing some recent editors of InstrProfilingBuffer.c: @ellishg @petrhosek

Also, I am willing to work on/test/review this, but I will need some guidance. It is unclear to me where the authoritative layout of this data lives in code, and whether or not using the closure-based interface (VPDataReaderType) is required.
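On the closure-based interface question: in C, a "closure-based" interface usually means a struct of function pointers plus an opaque context pointer that a writer calls back into. The sketch below shows that general pattern with hypothetical names (`ValueReader`, `write_values`, `ArrayCtx`); it is not the actual `VPDataReaderType`, whose members this thread doesn't describe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback ("closure-style") reader. NOT the real
   VPDataReaderType; it only shows function pointers + opaque context. */
typedef struct {
  void *Ctx;
  size_t (*NumEntries)(void *Ctx);
  void (*GetEntry)(void *Ctx, size_t I, uint64_t *Target, uint64_t *Count);
} ValueReader;

/* Writes every entry into Buf as (target, count) uint64_t pairs.
   Returns 0 and sets *Written on success; returns 1 without writing
   anything if Buf is too small. */
static int write_values(const ValueReader *R, unsigned char *Buf,
                        size_t BufSize, size_t *Written) {
  assert(R && R->NumEntries && R->GetEntry && Written);
  size_t N = R->NumEntries(R->Ctx);
  size_t Need = N * 2 * sizeof(uint64_t);
  if (Need > BufSize)
    return 1;
  for (size_t I = 0; I < N; I++) {
    uint64_t Pair[2];
    R->GetEntry(R->Ctx, I, &Pair[0], &Pair[1]);
    memcpy(Buf + I * sizeof Pair, Pair, sizeof Pair);
  }
  *Written = Need;
  return 0;
}

/* One possible backing store: two parallel arrays. */
typedef struct {
  const uint64_t *Targets;
  const uint64_t *Counts;
  size_t Len;
} ArrayCtx;

static size_t array_num_entries(void *Ctx) { return ((ArrayCtx *)Ctx)->Len; }

static void array_get_entry(void *Ctx, size_t I, uint64_t *Target,
                            uint64_t *Count) {
  const ArrayCtx *A = Ctx;
  *Target = A->Targets[I];
  *Count = A->Counts[I];
}
```

The writer never sees how the data is stored, so the same serialization code could in principle be fed from runtime-owned structures or from a test fixture. Whether a buffer-based API would need to accept such a reader is exactly the open question in the comment above.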

ellishg commented 1 year ago

I'm guessing you are following these docs? These APIs seem to be only supported for Clang-based instrumentation (using -fprofile-instr-generate). I'm not super familiar with those bits so I could be wrong.

I came across this which seems to suggest value profiling isn't supported in this mode. Could you try -mllvm -disable-vp=true and see if that works?

That being said, I do think it would be nice to support dumping raw profiles to a buffer with LLVM instrumentation. It might take some changes to the API to fully support value profiling, debug info correlation, and any other future profiling modes. I would be willing to collaborate on this work.

justincady commented 1 year ago

I'm guessing you are following these docs?

Yes, exactly.

These APIs seem to be only supported for Clang-based instrumentation (using -fprofile-instr-generate). I'm not super familiar with those bits so I could be wrong.

I'm even less familiar, but based on what I found so far I believe you are right. My filing this is more of a feature request than a defect. :)

I came across this which seems to suggest value profiling isn't supported in this mode. Could you try -mllvm -disable-vp=true and see if that works?

Thanks for finding that thread. And:

$ clang -fuse-ld=lld -mllvm -disable-vp=true -fprofile-generate instrprof-value-prof-real.c
$ ./a.out out.profraw
$ llvm-profdata show out.profraw
Instrumentation level: IR  entry_first = 0
Total functions: 515
Maximum function count: 381184
Maximum internal block count: 893184
$

Wow. Ok, this issue is mistitled. The existing APIs do support IR-based instrumentation, but they do not support value profiling. Does that seem accurate?

This experiment tracks with the closest thing I found to someone implementing the request: this proposed patch for the Linux kernel. The patch never landed; it was rejected upstream. But a significant portion of the patch is devoted to traversing the value profiling data (without actually using compiler-rt, which naturally would create maintenance issues). I didn't realize these two concepts (IR-based instrumentation, value profiling) could be separated.

petrhosek commented 1 year ago

That's correct: value profiling is a separate feature that's enabled by default for IR-based instrumentation but can be disabled. As far as I'm aware, the buffer interface doesn't support value profiling, hence the issue you're seeing. I don't think there's any fundamental reason why it couldn't be supported; someone would just need to put in the effort. I'd love to see this happen and would be happy to help. @vns-mn and @xur-llvm might have additional context.

david-xl commented 1 year ago

Value profiling is not enabled by default for front-end instrumentation, that is why it works without any option.

IIRC, the reason is that the buffer size for value profile data cannot be determined a priori. Happy to review patches for this.
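The sizing problem can be made concrete with a toy size calculation. The encoding below is an assumption for illustration only, not compiler-rt's raw format: counters take one `uint64_t` each, and value data takes, per call site, a `uint32_t` entry count plus a (target, count) pair of `uint64_t` per distinct target. Under that assumption, counter space depends only on the binary, while value-data space depends on what the program did at run time. Note that the reproducer calls `__llvm_profile_get_size_for_buffer()` before its indirect-call loops run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding, for illustration only (not compiler-rt's format). */

/* Counter space: fixed once the binary is built. */
static uint64_t counter_bytes(uint64_t NumCounters) {
  return NumCounters * sizeof(uint64_t);
}

/* Value-data space: per site, a uint32_t entry count followed by one
   (uint64_t target, uint64_t count) pair per distinct target observed.
   DistinctTargets[S] is only known after the program has run. */
static uint64_t value_bytes(const uint32_t *DistinctTargets, size_t NumSites) {
  assert(DistinctTargets != NULL || NumSites == 0);
  uint64_t Bytes = 0;
  for (size_t S = 0; S < NumSites; S++)
    Bytes += sizeof(uint32_t) +
             (uint64_t)DistinctTargets[S] * 2 * sizeof(uint64_t);
  return Bytes;
}
```

With the reproducer's two call sites, this toy encoding needs 8 bytes of value data if sized before the loops (no targets seen yet) but 16392 bytes afterward (512 targets per site), so a size computed up front undercounts. One possible direction for a value-profiling-aware buffer API, not settled in this thread, is to query the size only after the instrumented work has finished.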