intel / llvm

Intel staging area for llvm.org contribution. Home for Intel LLVM-based projects.
Other
1.25k stars 738 forks source link

[SYCL] ccache is invalidated by random file names generated for offloaded code bundles #5260

Open psalz opened 2 years ago

psalz commented 2 years ago

Describe the bug

In its default "direct mode", ccache examines the preprocessor output to determine all dependencies for a given translation unit. In doing so, ccache gets tripped up by the randomly generated filenames for offloaded code bundles. The preprocessor output for a small example program will look something like this (excerpt):

// __CLANG_OFFLOAD_BUNDLE____START__ host-x86_64-unknown-linux-gnu
# 1 "/tmp/test-963ab6.cpp"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 404 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "/tmp/test-header-08e0d3.h" 1
# 2 "<built-in>" 2
# 1 "/tmp/test-963ab6.cpp" 2
# 1 "test.cpp"
int main() { return 0; }
# 1 "/tmp/test-footer-cd3679.h" 1
# 4 "test.cpp" 2

Here, the files /tmp/test-963ab6.cpp, /tmp/test-header-08e0d3.h, /tmp/test-963ab6.cpp and /tmp/test-footer-cd3679.h have randomly generated names that change with each compiler invocation, causing the cache to be invalidated every time.

I assume this is a regression as according to #1797 ccache should work with DPC++.

NOTE: This can be worked around by using ccache's "depend mode" (CCACHE_DEPEND=1), which does not parse preprocessor output to determine dependencies, relying on compiler-generated dependency files instead (-MD/-MMD).

To Reproduce

Create file test.cpp:

int main() { return 0; } 

Run DPC++ with ccache (at least twice):

CCACHE_LOGFILE="$PWD/ccache.log" ccache clang++ -fsycl test.cpp -c -o test.o

Examining the produced ccache.log file will reveal something like this:

...
[2022-01-05T11:51:30.485895 3877683] No b8b4653v0ceorrbv3ra96m7q5m0hd13he in primary storage
[2022-01-05T11:51:30.485942 3877683] Running preprocessor
[2022-01-05T11:51:30.485948 3877683] Executing clang++ -fsycl -E test.cpp
[2022-01-05T11:51:30.551676 3877683] Failed to stat /tmp/test-3f740e.cpp: No such file or directory
[2022-01-05T11:51:30.551697 3877683] Disabling direct mode
...

As you can see, not only are the random names problematic, ccache is additionally unable to access the files which presumably get deleted right away. I'm therefore wondering whether there is any advantage (for tooling) in emitting these file names into the preprocessor output in the first place, or whether they could simply be removed.

Environment:

AlexeySachkov commented 2 years ago

This is caused by our integration footer implementation: we append a piece of code at the end of user-provided input and we are doing this though creating a temporary file. If my assumption is correct, then adding -fno-sycl-use-footer should help to workaround this issue.

I'm therefore wondering whether there is any advantage (for tooling) in emitting these file names into the preprocessor output in the first place, or whether they could simply be removed.

Integration footer is used by implementation of some both core SYCL 2020 features and Intel extensions. At the moment, the list includes specialization constants, is_device_copyable and SYCL_INTEL_device_global.

Tagging @mdtoguchi here as well. I guess it is possible to generate more stable name for those temporary files, but I'm afraid there could be issues if the same files are being compiled several times at the same time - temporary files would be the same and if compiler options are different in those compilations, that will lead to unexpected results.

psalz commented 2 years ago

I'm therefore wondering whether there is any advantage (for tooling) in emitting these file names into the preprocessor output in the first place, or whether they could simply be removed.

Integration footer is used by implementation of some both core SYCL 2020 features and Intel extensions. At the moment, the list includes specialization constants, is_device_copyable and SYCL_INTEL_device_global.

What I meant is whether there is any real use case for emitting the linemarkers for those temporary files, because that's what's tripping up ccache. AFAIK these are usually needed to properly report errors (with correct file name / line numbers), but I wonder whether there is any value in getting errors for temporary files that no longer exist once the error message is shown to the user. But of course there may be other uses for those linemarkers that I don't know of.

AlexeySachkov commented 2 years ago

What I meant is whether there is any real use case for emitting the linemarkers for those temporary files, because that's what's tripping up ccache. AFAIK these are usually needed to properly report errors (with correct file name / line numbers), but I wonder whether there is any value in getting errors for temporary files that no longer exist once the error message is shown to the user. But of course there may be other uses for those linemarkers that I don't know of.

I don't know exactly why are they needed here, but my guess is that besides diagnostic messages they help to generate the correct debug info. Tagging @premanandrao and @mdtoguchi to help here. I'm also not sure how useful it is to have the correct debug info for int-footer/int-header, because hopefully end user won't need to debug those parts of the toolchain.

@mdtoguchi, do you think that this issue can be avoided if we change the implementation of integration footer? From:

clang++ -cc1 /* emit-int-footer=footer.h */
append-file main.cpp footer.h -o tmp.cpp
clang++ -cc1 tmp.cpp

To:

clang++ -cc1 /* emit-int-footer=footer.h */
clang++ -cc1 -include main.cpp footer.h -main-file-name main.cpp
mdtoguchi commented 2 years ago

Modifying the compilation to use the footer as the main file and including the source file as an -include is do-able in the compiler driver. It is unclear to me what kind of impact this will have down the line.

premanandrao commented 2 years ago

What I meant is whether there is any real use case for emitting the linemarkers for those temporary files, because that's what's tripping up ccache. AFAIK these are usually needed to properly report errors (with correct file name / line numbers), but I wonder whether there is any value in getting errors for temporary files that no longer exist once the error message is shown to the user. But of course there may be other uses for those linemarkers that I don't know of.

I don't know exactly why are they needed here, but my guess is that besides diagnostic messages they help to generate the correct debug info. Tagging @premanandrao and @mdtoguchi to help here. I'm also not sure how useful it is to have the correct debug info for int-footer/int-header, because hopefully end user won't need to debug those parts of the toolchain.

@mdtoguchi, do you think that this issue can be avoided if we change the implementation of integration footer? From:

clang++ -cc1 /* emit-int-footer=footer.h */
append-file main.cpp footer.h -o tmp.cpp
clang++ -cc1 tmp.cpp

To:

clang++ -cc1 /* emit-int-footer=footer.h */
clang++ -cc1 -include main.cpp footer.h -main-file-name main.cpp

Tagging @bwyma for his opinion on the debug question.

ax3l commented 1 year ago

Since the issue is inactive since a year, this is just an unqualified comment that we would love to see ccache support back in action :)

Thanks for the deep dive already!

KornevNikita commented 5 months ago

Hi! There have been no updates for at least the last 60 days, though the ticket has assignee(s).

@AlexeySachkov, could I ask you to take one of the following actions? :)

Thanks!

github-actions[bot] commented 3 months ago

Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).

@AlexeySachkov, could you please take one of the following actions:

Thanks!

mdtoguchi commented 3 months ago

There is work in progress to improve this behavior. It is to use a new '-include-footer' option to alleviate the temporary file generation for the footer. https://github.com/intel/llvm/pull/14402

github-actions[bot] commented 1 month ago

Hi! There have been no updates for at least the last 60 days, though the issue has assignee(s).

@AlexeySachkov, could you please take one of the following actions:

Thanks!

psalz commented 1 month ago

There is work in progress to improve this behavior. It is to use a new '-include-footer' option to alleviate the temporary file generation for the footer. #14402

I just tested my reproducer example from above with a recent (ed2128d8678b56524) build of DPC++ and I'm getting the same behavior. Do I have to pass any additional compiler flags?

mdtoguchi commented 1 month ago

@psalz, we have discovered additional changes that are needed to address the addition of the internal header/footer files (referenced in the generated preprocessed file). This is expected to be fixed for the DPC++ 2025.1 release.