hedronvision / bazel-compile-commands-extractor

Goal: Enable awesome tooling for Bazel users of the C language family.
Other
689 stars 114 forks source link

Unwrap emcc wrapper script to its underlying clang invocation #154

Closed kon72 closed 10 months ago

kon72 commented 10 months ago

This change unwraps emcc.sh to its underlying clang invocation. This is done by running emcc.sh with an environment variable EM_COMPILER_WRAPPER set to a fake wrapper script that just prints the command line arguments passed to it and exits.

Diff of compile_commands.json before and after this change

Before this change: ```json { "file": "a.cc", "arguments": [ "external/emsdk/emscripten_toolchain/emcc.bat", "-xc++", "-isysrootexternal/emscripten_bin_win/emscripten/cache/sysroot", "-fdiagnostics-color", "-fno-exceptions", "-fno-strict-aliasing", "-funsigned-char", "-no-canonical-prefixes", "-std=gnu++17", "-nostdinc", "-nostdinc++", "-fomit-frame-pointer", "-O0", "-Wall", "-DBAZEL_CURRENT_REPOSITORY=\"\"", "-iquote", ".", "-iquote", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin", "-iquote", "external/emsdk", "-iquote", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/external/emsdk", "-iwithsysroot/include/c++/v1", "-iwithsysroot/include/compat", "-iwithsysroot/include", "-isystem", "external/emscripten_bin_win/lib/clang/18/include", "-MD", "-MF", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.d", "-c", "a.cc", "-o", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.o", "-Wno-builtin-macro-redefined", "-D__DATE__=\"redacted\"", "-D__TIMESTAMP__=\"redacted\"", "-D__TIME__=\"redacted\"", "-Werror" ], "directory": "C:/users/kinse/proj/wasm_bazel" } ``` After this change: ```json { "file": "a.cc", "arguments": [ "external/emscripten_bin_win/bin/clang++.exe", "-xc++", "-target", "wasm32-unknown-emscripten", "-fignore-exceptions", "-fvisibility=default", "-mllvm", "-combiner-global-alias-analysis=false", "-mllvm", "-enable-emscripten-sjlj", "-mllvm", "-disable-lsr", "-DEMSCRIPTEN", "-isysrootexternal/emscripten_bin_win/emscripten/cache/sysroot", "-fdiagnostics-color", "-fno-exceptions", "-fno-strict-aliasing", "-funsigned-char", "-no-canonical-prefixes", "-std=gnu++17", "-nostdinc", "-nostdinc++", "-fomit-frame-pointer", "-O0", "-Wall", "-DBAZEL_CURRENT_REPOSITORY=\"\"", "-iquote", ".", "-iquote", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin", "-iquote", "external/emsdk", "-iquote", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/external/emsdk", "-iwithsysroot/include/c++/v1", "-iwithsysroot/include/compat", "-iwithsysroot/include", "-isystem", "external/emscripten_bin_win/lib/clang/18/include", "-MD", "-MF", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.d", "-c", "-Wno-builtin-macro-redefined", "-D__DATE__=\"redacted\"", "-D__TIMESTAMP__=\"redacted\"", "-D__TIME__=\"redacted\"", "-Werror", "a.cc", "-o", "bazel-out/wasm-fastbuild-ST-b4dbd4a2bfc4/bin/_objs/a/a.o" ], "directory": "C:/users/kinse/proj/wasm_bazel" } ```

cpsauer commented 10 months ago

Hi, Kon! Thanks for a exceptionally thoughtful contribution here and for being the type of person who leaves things better than you found them :)

This is very clever and I'm definitely down to merge. Some quick thoughts first:

Thanks again for an outstanding contribution :) Can't wait to get you on the contributors list!

Chris

kon72 commented 10 months ago

Hi Chris, thanks for the quick reply!

What do you think about printing out the args with a Bazel-built cc_binary, being passed in as data instead of the shell scripts wrapping python?

I think cc_binary is a good approach here. I originally tried to implement it with a pure shell script, but I couldn't find a reliable way to print out the command line arguments on a Windows batch file when they contain quotes. (As I'm not that familiar with .bat, I would be happy to hear if anyone knows how to do it.) Then I decided to use shell script plus python as emsdk and emscripten does, but as you pointed out, it prohibits users from using the hermetic python. For a pure py_binary approach, I'm not sure how to make print_args.py get called from emcc process. I'd like to ask you if it's possible to do it when EM_COMPILER_WRAPPER can only take one argument? Edit: I was able to make it work by simply setting _print_args_executable to py_binary rule. That said, cc_binary would be better in terms of speed.

EMCC Environment construction: If you haven't already, could I ask you to check that Bazel still isn't providing these environment variables in the aquery output?

Wow, bazel aquery --output=jsonproto "mnemonic('CppCompile',deps(<your target>))" gave me environment variables necessary to run emcc! (omitting --output won't) However, it seems this would not work prior to Bazel 6.1.0 (see https://github.com/bazelbuild/bazel/pull/17274, I tested on 6.0.0 and 6.1.0), so I'd want to implement in a way that: first try to find env vars from aquery output, and if not found, falls back to the current way of constructing those variables using get_workspace_root. Or perhaps can we drop support of that versions? I'd like to kindly ask your thoughts on it.

I think it should include PATH, at least

Unfortunately, aquery output doesn't have PATH, it only has PWD and EM_* things:

"environmentVariables": [{
  "key": "PWD",
  "value": "/proc/self/cwd"
}, {
  "key": "EM_BIN_PATH",
  "value": "external/emscripten_bin_mac_arm64"
}, {
  "key": "EM_CONFIG_PATH",
  "value": "external/emsdk/emscripten_toolchain/emscripten_config"
}, {
  "key": "EMCC_WASM_BACKEND",
  "value": "1"
}],

Could I ask you what user needs this fixes? What errors were you seeing in clangd without? (I can imagine...but it's be good to hear explicitly.)

emcc invokes clang with a bunch of extra flags such as -target wasm32-unknown-emscripten and -D__EMSCRIPTEN_SHARED_MEMORY__=1. Specifically, when clang sees -target wasm32-unknown-emscripten, it internally adds the definition of -D__EMSCRIPTEN__ which is used in lots of source code. Those flags are necessary for clangd or other toolings to analyze the source code with the same commands used for building artifacts, without manual clangd configuration of adding missing compile flags. This fixes also correctly handles the emcc specific flags like -sUSE_PTHREADS, fixing the following error messages while generating compile_commands.json: clang: error: no such file or directory: 'USE_PTHREADS=1'

cpsauer commented 10 months ago

Great work! Thank you again.

Was about to comment back that py_binary auto-wrapped in a script (based on the alert email) but you beat me to it. cc_binary seems great though. Love it.

Minor and nitpicky, but would you be down do use .cpp? (I'm partial only because .hpp is worth it, to distinguish from C, and then .cpp matches.)

Sorry about requiring the revert; and thanks for the note so folks don't edit it out in the future. Really appreciate your testing on Windows, too, and being so thoughtful about the quotes. (Curious: Do you have a VM or a separate machine? I switch to some old desktops when I need to test Windows things :)

Similarly, way to test the bazel versions and output options! We probably should support ideally, but I'm very tempted to drop and ok if you want to. To explain why I'm torn: we did get an issue filed when we required 6.1 (see backlink above) but IMO older bazel versions are buggy enough that we should really encourage updating and not contort ourselves too much. What do you think? If we do back support, might be worth calling bazel version so we can also handle that other, backlinked case.

PATH: Mine does on other actions. Hmm. Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH? I think that's what adds it; does for me. Loads of folks turn that on, because it avoids loads of needless cache misses from different terminals, IDEs, machines, etc. (Should be the default IMO; they've tried to make it so in the past.) I think the move is to use it if available and otherwise fall back to the environment. Thanks again for your care here.

And thanks so much for the context around the errors. More generally, I really appreciate how great you are to work with and all you've done here, Kon!

Chris

cpsauer commented 10 months ago

Reading your changes quickly and smiling :) Thanks again for being so great, Kon. Looks like we're nearly there--but I don't want to presume without hearing back from you on the above. I'll toss up some quick twiddles for your consideration

kon72 commented 10 months ago

Regarding the bazel version, I personally think it's ok to require >= 6.1.0 since our projects doesn't depend on Bazel < 6.1.0. Feel free to let me know if it needs to support an older version later on :)

Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH?

Thanks, I confirmed that adds PATH. I pushed a commit to use that PATH if available.

Curious: Do you have a VM or a separate machine?

I have a separate Windows machine and do some tests on it through SSH from my Mac laptop.

cpsauer commented 10 months ago

Sweet! Seems like you're ready to merge, then! (Even if you haven't said so explicitly). Let's get you on that contributor board! (If I've been too hasty, well, just holler here or put up another PR.)

Thanks again for being great to work with, producing such quality code, and leaving things undeniably much better than you found them. It was a real pleasure working with you.

Cheers! Chris

doupongzeng commented 10 months ago

Regarding the bazel version, I personally think it's ok to require >= 6.1.0 since our projects doesn't depend on Bazel < 6.1.0. Feel free to let me know if it needs to support an older version later on :)

Could you try adding specifying build --incompatible_strict_action_env to your .bazelrc and seeing if that adds PATH?

Thanks, I confirmed that adds PATH. I pushed a commit to use that PATH if available.

Curious: Do you have a VM or a separate machine?

I have a separate Windows machine and do some tests on it through SSH from my Mac laptop.

Hello, Kon, after seeing your update, it's quite impressive. Would you consider making it backwards compatible? In order to ensure smoother usability in the future, I am currently using Bazel 5.1, and have encountered some version-related issues as indicated here. --host_features=compiler_param_file need bazel >= 6.1.0

kon72 commented 10 months ago

Thank you so much for your kind words! I'm grateful for the opportunity to work with someone as experienced and supportive as you.

Best regards, Kon

kon72 commented 10 months ago

Hello @doupongzeng,

I appreciate your suggestion regarding backward compatibility. I'm going to file another PR that makes this change work with Bazel 5.x.