llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.03k stars 11.58k forks source link

How to support non-hierarchical .clang-format inheritance #107808

Open jediry opened 1 week ago

jediry commented 1 week ago

As discussed in my RFC earlier this year, I need a way for a .clang-format file to be based on another .clang-format file that is not located in an ancestor directory. This is because our company's repo contains many, many projects as top-level subdirectories under the repo root, and any given team (with their own, tribal style conventions) might own a dozen of these projects, and would want to maintain only one copy of their style rules.

Basically, I want the same semantics as BasedOnStyle: InheritFromParent, but with the ability to refer to an arbitrary file, not just .clang-format in the parent directory. The base file should be processed exactly as if it were a .clang-format in the parent directory, including being allowed to BasedOnStyle: some-other.clang-format.

As mentioned in the RFC discussion, there may be security implications of allowing a .clang-format for specify an arbitrary file (e.g., /etc/passwd?). While it's not clear to me exactly how such an exploit would work, I'd rather not be the guy who made it possible. So, security should be part of the discussion.

I can see three possibilities for supporting this:

  1. Only support paths relative to the current file (e.g., BasedOnStyle: ../../../../format/team1.clang-format). Simple to implement, but brittle if directories are moved around. Also might allow referencing arbitrary files outside the project root.
  2. Allow environment variable expansion (e.g., BasedOnStyle: $(FORMATDIR)/team1.clang-format). Also simple to implement, and resilient against code moves, but may have security risks (e.g.,BasedOnStyle: $(HOME)/.ssh/private_key```)
  3. Only support paths relative to a list of "style search paths", which are specified on the command-line (e.g., BasedOnStyle: team1.clang-format, with --style-search-path my_repo/format on the command-line). This is basically the same model as how C/C++ #include <someheader.h> directives are resolved. Since this externalizes the decision of where clang-format is allowed to look for files, this seems (a) most secure (since the user is in control of where clang-format can look) and (b) most cumbersome to use (since the user must somehow ensure that the style search paths are always provided when invoking clang-format).

I have a PR open that implements option 3, but the amount of plumbing is...significant. Not only does clang-format need to plumb through --style-search-path arguments, but so do clang-tidy, clangd, clang-move...anything that invokes formatting logic.

Opinions on the best way forward here?

owenca commented 1 week ago

I think we already have a clang-format option that does what you want:

  --style=<string>               - Set coding style. <string> can be:
                                   ...
                                   3. 'file:<format_file_path>' to explicitly specify
                                      the configuration file.
jediry commented 1 week ago

I think we already have a clang-format option that does what you want:

  --style=<string>               - Set coding style. <string> can be:
                                   ...
                                   3. 'file:<format_file_path>' to explicitly specify
                                      the configuration file.

I don't understand how that accomplishes my goal. For example, in the following directory hierarchy:

  • /root
  • /proj1/
  • .clang-format (BasedOnStyle: team1.clang-format
  • /proj2/
  • .clang-format (BasedOnStyle: team2.clang-format
  • /proj3/
  • .clang-format (BasedOnStyle: team2.clang-format
  • /format/
  • team1.clang-format
  • team2.clang-format

each of the project-specific .clang-format files may contain project-specific overrides (e.g., additional StatementMacros, etc.), so I need inheritance. It's not enough merely to manually specify team1.clang-format when formatting /root/proj1/..., which is what I think --style file: is limited to.

mydeveloperday commented 1 week ago

There is part of me that makes me think your teams need to get together and "commit to one style already"..

I don't feel we should make clang-format more complicated because different teams can't agree on one true style!

What we add here needs to really give the clang-format community a feature that many people can use and see benefit, not just appease team members who can't agree by creating a complex hierarchy of options (and having to resolve conflicts of which style windows)

Don't get me wrong I understand the struggle of having one senior developer running proj3 who insists on some awful whitesmith style and has been using it for 30 years and won't change, but we are layering levels of complexity on clang-format in my view for the wrong reason. (we are trying to fix the effect of your problem not the cause, which is a lack of consensus!)

I've fought this battle myself, even gave up my own style for the greater good and embraced the chosen style as my own.

I'm not sure this is "The fix you are looking for..."

/root
/proj1/
.clang-format (BasedOnStyle: team1.clang-format
/proj2/
.clang-format (BasedOnStyle: team2.clang-format
/proj3/
.clang-format (BasedOnStyle: team2.clang-format
/format/
team1.clang-format
team2.clang-format
jediry commented 1 week ago

Why have InheritFromParent, then? I'm not trying to be snarky...just pointing out that clang-format's current design already acknowledges the need for some amount of local decision making, but then insists that this must mirror the directory hierarchy, and it's that latter aspect that I'm challenging. If there really should be "one true style", then you could simplify clang-format further by getting rid of inheritance altogether and only supporting a single .clang-format file at the repo root.

I think understand your point...if we just achieved style consensus across our dozens of teams and thousands of developers, we wouldn't need more flexibility from clang-format, and we would have a codebase with a unified style. And so, perhaps clang-format not supporting the kind of flexibility I'm seeking would be a good thing, because the tool limitations would push us to achieve that consensus.

But in actual fact, this is not what's happening. Instead, as we adopt clang-format in our giant codebase, people are trying to engineer their way around clang-format's limitations. I know of at least two other competing solutions to mine, where other teams are building infrastructure around clang-format to synchronize configuration files across multiple projects. I'm trying to fix the issue within clang-format instead, so that such gymnastics become unnecessary.

Now, I'm not arguing that my PR necessarily represents the best tradeoffs...I'm unhappy with just how much plumbing was required to implement this, and I'm starting to feel that it might be best to go back to option 1 or 2 and only support paths relative to the current .clang-format file, or perhaps expanding just one specific environment variable (e.g., CLANG_FORMAT_INCDIR).