seladb / PcapPlusPlus

PcapPlusPlus is a multiplatform C++ library for capturing, parsing and crafting of network packets. It is designed to be efficient, powerful and easy to use. It provides C++ wrappers for the most popular packet processing engines such as libpcap, Npcap, WinPcap, DPDK, AF_XDP and PF_RING.
https://pcapplusplus.github.io/
The Unlicense
2.68k stars 654 forks source link

introducing clang-format #1248

Closed tigercosmos closed 1 month ago

tigercosmos commented 10 months ago

Formatting applied to:

seladb commented 10 months ago

We can add clang-format if we can create a custom format that doesn't change the entire code base 🤯

clang-format supports 6 predefined styles: LLVM, Google, Chromium, Mozilla, WebKit, Microsoft.

I think I went over them before and each one will create tons of changes compared to the style we currently have...

egecetin commented 10 months ago

I think custom clang-format will be great, manually formatting can be problematic and both reviewers and contributors miss some minor things. A simple example,

image

Mixed use of tab and spaces (4 tabs + spaces equivalent for another 4 tabs).

Another example is from my PRs which I noticed yesterday, the alignment of public private keyword. Some PRs indent them with spaces but most of the code base not indent them. Probably both of us miss them during merge @seladb

seladb commented 10 months ago

@egecetin I totally agree! Unfortunately I know there are many formatting issues in the current code base 😕

@egecetin @tigercosmos would you consider working on a custom clang-format style?

tigercosmos commented 10 months ago

I will find some time to work on this. Let's see how we can customize the style.

clementperon commented 10 months ago

@seladb what's the issue to have one big MR? If the script is runned by clang-format by several contributors and we agree it doesn't change the code for me it's fine no?

My experience is that maintaining a custom styles is really painfull and sometimes not properly supported by Clang... And for each rule people could agree/disagree...

tigercosmos commented 10 months ago

@clementperon
The intent behind incorporating clang-format is to standardize the coding style.

At present, the CI system does not verify the code format, and relying on manual inspection is not consistently reliable. Additionally, with the integration of a code formatter, individuals can effortlessly adhere to the official standard by using IDEs.

While acknowledging that a substantial merge request may ensue, I contend that the advantages outweigh the disadvantages.

clementperon commented 10 months ago

Sorry if I was not clear, but I'm 100% in favor to have a clang-format checker in the CI.

I'm not convince to have a custom clang-format coding style.

IMO I would stick with the LLVM default coding style

tigercosmos commented 10 months ago

I'm unsure if we're referring to the same "custom style." In this context, the term "custom" style doesn't involve creating new rules; rather, I mean selecting options from the predefined choices offered by clang-format.

clementperon commented 10 months ago

Yes but some rules are working properly but others have sometimes "cross-effect" between them. It will also be more sensitive beetwen updates and could sometimes breaks.

IMO only predefined styles could be considered as "stable", so if there is one predefined styles that match our needs I will prefer to stuck to it as much as possible.

seladb commented 10 months ago

@clementperon @tigercosmos the main issue I have is that any predefined clang-format style (LLVM, Google, Chromium, Mozilla, WebKit, Microsoft) will cause changes in almost every line in every file, so:

If we have a custom style that is very close to what we currently have, there should be a lot fewer changes and we can enforce this style in the future

tigercosmos commented 10 months ago

The PR will be impossible to review - how can we ensure no logic was changed?

We use scripts to generate code, and we also run the tests, there should be no issues.

This makes git history (or "git blame") less effective

I don't think this is a strong support for stopping us, since people can easily check the history before the format.

If we have a custom style that is very close to what we currently have, there should be a lot fewer changes and we can enforce this style in the future

I agree with @clementperon, following the predefined styles is simpler for unifying the code. Not to mention that the current codebase includes many different coding styles from different people. Yes, a big MR is inevitable, but later on, life becomes easy.

egecetin commented 10 months ago

Since the changes are made by automated scripts, just a quick glance will be enough I think. I'm not sure it will break any code. And unit tests, fuzzers and also compilation itself eliminates a lot of risks. The only problem is "git blame" but it is not a big deal

clementperon commented 10 months ago

I also think that the checksum of the file generated will be identical.

CMake should not even trig a recompilation (maybe not)

seladb commented 10 months ago

I also think that the checksum of the file generated will be identical.

CMake should not even trig a recompilation (maybe not)

If this is true then it's a good way to test the migration script!

tigercosmos commented 6 months ago

@seladb @clementperon @egecetin I really want to have this feature. Here is my plan:

  1. Choose a default clang-format setting to utilize. As per @clementperon's suggestion, opting for the default setting is both safe and reliable.
  2. Firstly, execute clang-format on all code and verify the checksum of the compiled binary (which ideally should remain unchanged). As we refrain from altering the test code at this stage, we can verify the code's correctness. (Perhaps, there's no need for comparing the checksum.)
  3. Secondly, apply clang-format to all test code. Given that the non-test code should already be correct, any errors arising from the test code will become apparent.
  4. We can apply the same method for python codes using flake8
clementperon commented 6 months ago

@tigercosmos, just to be more clear on my opinion. As we don't have a strong opinion on the coding style, let's enforce with a properly define one and properly supported by clang-format.

I'm personally used to the default LLVM one but I think PcapPlusPlus coding style looks like Microsoft coding style. So if we all agree we can choose the Microsoft one.

The checksum should be indeed identical if you remove debug symbols, SHA1 and date version info of the build. I can help on that if you need.

tigercosmos commented 6 months ago

I think PcapPlusPlus coding style looks like Microsoft coding style.

I agree to use "Microsoft", after playing around with LLVM, Google, Webkit, etc, I think it's the most similar to PCPP.

The checksum should be indeed identical if you remove debug symbols, SHA1 and date version info of the build. I can help on that if you need.

@clementperon Sure, if you can help with this, it will be great.

seladb commented 6 months ago

@clementperon @tigercosmos do you have an example of how the Microsoft format looks like?

egecetin commented 6 months ago

@seladb Check this site https://zed0.co.uk/clang-format-configurator/

seladb commented 6 months ago

@seladb Check this site https://zed0.co.uk/clang-format-configurator/

Nice!! It does look quite similar to the current formatting

tigercosmos commented 6 months ago

Seems we all agree with using "Microsoft" for clang-format. That's great.

Dimi1010 commented 2 months ago

Btw, can we get a todo task list on this issue, with which parts are done and which are not?

tigercosmos commented 2 months ago

Btw, can we get a todo task list on this issue, with which parts are done and which are not?

Sounds good. Feel free to modify the issue, or I will do it later.

Dimi1010 commented 2 months ago

Btw, can we get a todo task list on this issue, with which parts are done and which are not?

Sounds good. Feel free to modify the issue, or I will do it later.

Added one. I think only Common is formatted right? I checked only that due to the closed PR reference on this issue.

tigercosmos commented 2 months ago

@Dimi1010 thanks, I added more information.

tigercosmos commented 1 month ago

Everything is completed except for the '3rdParty' directory, which we'll leave unchanged for now. Thanks, @seladb and @Dimi1010!