hedronvision / bazel-compile-commands-extractor

Goal: Enable awesome tooling for Bazel users of the C language family.
Other
689 stars 114 forks source link

Adds the ability to use absolute paths for bazel managed artifacts #151

Open csmulhern opened 10 months ago

csmulhern commented 10 months ago

This allows compile_commands.json commands to work for users who use e.g. --experimental_convenience_symlinks.

This also adds the ability to disable automatic workspace edits. While modifying the user’s workspace can help with the ergonomics of first time setup, it should be possible to disable these non-essential options.

cpsauer commented 10 months ago

Hey Cameron! Thanks for using the tool and for your eagerness to contribute. I really appreciate it.

Totally with you in principle, except that we're pretty pinned down by Bazel's internal structure with //external. Without it, things that reference external code (including transitively or essential bazel tools!) will break! To explain, could I redirect you over to https://github.com/hedronvision/bazel-compile-commands-extractor/issues/150#issuecomment-1863703726 from a couple days ago? If you'd like more context, there's also some in the Implementation Readme (and other issues/PRs). (And the other is to hidden and ignored files; and always newly needed for //external (structure has to be different on Windows so people really shouldn't check it in)--so I'm loathe to turn it off unless there's a user need I'm missing?)

Again, thanks so much for your efforts! If you think I'm wrong here, please say, and let's discuss further. Looking forward to continuing to work together here.

Happy coding! Chris

cpsauer commented 10 months ago

I'm just now seeing #152, trying to patch around. (Sorry, read this one first.) There are a lot more flag cases here, that we'd have to handle (across platforms) unfortunately! Would be down to take a patch that works robustly w/ high confidence, but I'm really not so sure it's worth it vs just having the workspace mirror Bazel's structure.

csmulhern commented 10 months ago

Hey Chris, thanks for the warm response!

What you're saying makes sense. I guess I intended for this to land in combination with #152, where the path rewriting makes up for the lack of symlinking //external. I probably shouldn't have split them up, but figured maybe they could make sense independently. It sounds like a better split probably would have been to do the gitignore part in this PR, and couple the link_external change with #152. Happy to make whatever organizational change you think makes sense there.

In terms of the meatier part of your question, the need is really just that it can be inconvenient to put generated files / build artifacts in the root of the project directory. While more mature editors and tools will try to work well in these environments (e.g. ripgrep will automatically exclude gitignore paths by default), there is a long tail of tools that use recursive searching / indexing that will never support even the most popular use cases. It is frustrating to have to deal with slow editors because they try to index a large bazel-out directory, or have grep search across generated files you're not interested in unless you manually --exclude them. I (and others) find it easier to just keep the project directly source-only, rather than try to modify every single tool we might use.

From a cursory search through existing issues, it looks like this has come up before, and that there is hesitancy about an approach that tries to rewrite the build commands to use absolute paths for bazel generated artifacts. I appreciate that the existing solution works well for a large number of users, and the correct solution for path rewriting could be difficult, with a long tail of edge cases. As such, I hoped maybe there would be more of an appetite if we made this behavior opt-in, and if we could improve on it iteratively. Hopefully at some point it could become good enough to supersede the existing approach, but it doesn't have to.

csmulhern commented 10 months ago

I'm just now seeing #152, trying to patch around. (Sorry, read this one first.) There are a lot more flag cases here, that we'd have to handle (across platforms) unfortunately! Would be down to take a patch that works robustly w/ high confidence, but I'm really not so sure it's worth it vs just having the workspace mirror Bazel's structure.

No worries! I just responded before seeing this message too 😅.

I can imagine there are a lot more cases that aren't covered! This isn't necessarily the area where I have the strongest domain knowledge. I'm happy to keep iterating on this to see if we can get it to a place where we do have high confidence in the path rewriting. Let me know if you have any pointers for figuring out what the space of possibilities that need to be considered here is.

cpsauer commented 10 months ago

Thanks for being great! Yeah--right there with you.

The hesitance you're detecting is a well-meaning combo of overload, low personally experienced need relative to complexity, and a lot of time spent with past attempts that were way short of handling the complexity incurred. But like, I get it, and I so appreciate the excitement to contribute. To repurpose the ripgrep example as a motivation for this feature: someone reported that the VSCode ripgrep extension ignores gitignore and chokes on the plethora of broken symlinks Bazel can churn out in some configurations (which I'm also helping work on on the Bazel side). Similar, I think, with some python plugins. On the other hand, the symlinks have been super valuable, as have external and generated file indexing, and the symlinks really do simplify the implementation. Still, though, I think this falls for me in the most awkward of buckets: I would medium-happily take a robust PR but can't spend lots more time on it. I wish it were otherwise; truly, if there weren't opportunity cost, I'd have brought some of the past PRs over the finish line.

For tracking down the flag possibilities: I think it's just a matter of carefully pouring over the flag doc pages for each of the compiler drivers (gcc, clang, msvc, nvcc (landing shortly w/ a flag difference list)), plus doing a bit of experimenting with them at the command line, since there are some undocumented (but accepted) options that are like -path. Some might be mentioned in past PRs efforts. It's gonna take a bit...and complicate adding new toolchains, but if it's worth it to you, I'm game. I just probably shouldn't merge an incremental thing, because it'll cause a lot of user pain and incremental time vs just doing it right up front, and any codebase big enough to need this is likely to hit a bunch of the cases (mine doesn't need, but is big enough to hit idk, 10 more). Same reason I went for full quality on this thing :)

If we do this, what do you think about just auto-enabling iff bazel-out is missing? That is, when the user has expressed a preference to clean up their workspace with --symlink_prefix and --experimental_convenience_symlinks and the workspace no longer mirrors the compilation environment anyway. (Right now we detect an error on that case (search "//bazel-out is missing") Seems like that might be preferable to requiring configuration. And if I get to pick on PR structure, I'd love it if the atomic unit could be all together.

Cheers and thank you. Sorry to not be just making it happen--just lots of other easy, valuable things I'm blocking that I need to not neglect. Chris

csmulhern commented 10 months ago

Thanks, Chris.

No need to apologize for lack of bandwidth; I totally understand. I'm happy to keep pushing on this and to see if we can get to the point where we're comfortable upstreaming.

For tracking down the flag possibilities: I think it's just a matter of carefully pouring over the flag doc pages for each of the compiler drivers (gcc, clang, msvc, nvcc (landing shortly w/ a flag difference list)), plus doing a bit of experimenting with them at the command line, since there are some undocumented (but accepted) options that are like -path. Some might be mentioned in past PRs efforts. It's gonna take a bit...and complicate adding new toolchains, but if it's worth it to you, I'm game. I just probably shouldn't merge an incremental thing, because it'll cause a lot of user pain and incremental time vs just doing it right up front, and any codebase big enough to need this is likely to hit a bunch of the cases (mine doesn't need, but is big enough to hit idk, 10 more). Same reason I went for full quality on this thing :)

So in the current implementation, we basically detect the easy cases where the path arguments are standalone, and the replacements are straightforward. We also deal with include paths (e.g. -Ibazel-out/...). Then there's just a fallback for all cases of the form --arg=bazel-out/.... This last one is the most hand wavy, but I think it will be better to take this approach, than to try to e.g. manually whitelist all argument names. For robustness, we could switch to a regex type approach, where we match the prefix --[A-Za-z-_]+=. Happy to move in that direction, if we're aligned on that sort of approach. I've also added comments to the source code that try to better document the approach being taken.

Regarding other cases that are missing, I took a quick glance at flatbuffers, protobuf, and abseil-cpp. flatbuffers and protobuf don't seem to use target_compatible_with properly, so I could only test a subset of targets, but all cases appeared to be dealt with succesfully. abseil-cpp worked flawlessly. I was only able to test on macOS with clang, but will check on Ubuntu with GCC in the next couple of days. Testing with Windows is going to be difficult for me.

If we do this, what do you think about just auto-enabling iff bazel-out is missing? That is, when the user has expressed a preference to clean up their workspace with --symlink_prefix and --experimental_convenience_symlinks and the workspace no longer mirrors the compilation environment anyway. (Right now we detect an error on that case (search "//bazel-out is missing") Seems like that might be preferable to requiring configuration. And if I get to pick on PR structure, I'd love it if the atomic unit could be all together.

I'm open to this. In that case, I think we could drop rewrite_bazel_paths altogether, and do something like this:

def _apply_path_replacements(compile_args: typing.List[str], bazel_info: BazelInfo):
    replacements = {}

    if not bazel_out_exists:
        replacements['bazel-out/'] = os.fspath(bazel_info.output_path) + "/",

    if not {link_external}:
        replacements['external/'] = os.fspath(bazel_info.external_path) + "/",

    return [_path_replacement_for_arg(arg, replacements) for arg in compile_args]

Let me know your thoughts.

Cheers,

Cameron

cpsauer commented 10 months ago

Likewise, and thanks Cameron. Happy holidays :) Thanks for the understanding and the enthusiasm!

Totally agree that the prefix flags are the tricky ones that need hardcoding. I don't think there's a good substitute, though, for just flipping through the flag docs and making sure you've got the cases. As a (very non-exhaustive) set of examples: (1) There are definitely lots more common important prefix cases (e.g. /I on Windows). (2) I happened to see also in bazel output that we should probably not change -frandom-seed=bazel-out/..., since it's a path being used as a seed. (3) There are also some that are --flag=path=path2, like -fdebug-prefix-map, so probably worth matching all occurrences rather than moving to the regex. Don't love the hardcoding--people can always add flags, but if you feel strongly that it's worth it...

Re auto-on: IMO we should also turn external on or off automatically in the same way. The user has already expressed their intent around symlinks and a preference for hiding.

hidden gitignore: Don't yet understand the case where people would want to configure this (or would even know it'd happened)--and should probably not expand the interface w/o compelling.

Cheers and again thank you! Apologies for shaping rather than just digging in on this one. Chris

cpsauer commented 10 months ago

In fixing another issue, I'm realizing that we'll also need to patch the paths read from .d files when we hit those bazel caches (see _parse_headers_from_makefile_deps).

cpsauer commented 10 months ago

And (on a separate PR) a good example of another prefix--really still think the solution here would be to take a pass through the compiler flag docs

-isysrootexternal/emscripten_bin_win/emscripten/cache/sysroot