idaholab / moose

Multiphysics Object Oriented Simulation Environment
https://www.mooseframework.org
GNU Lesser General Public License v2.1
1.73k stars 1.04k forks source link

Code formatter for MOOSE #7289

Closed bwspenc closed 7 years ago

bwspenc commented 8 years ago

Description of the enhancement or error report

We spend a lot of time in the process of committing and reviewing code on code formatting. If we could find something automatic that would work for us, that would save us a lot of time in the code review process.

Rationale for the enhancement or information for reproducing the error

Manual checking and fixing formatting is time consuming and error prone.

Identified impact

We'd need to find a formatter that works for us and integrate it into our testing process. If it could be run as part of the pre-check phase, reviewers would never need to point out formatting issues again.

bwspenc commented 8 years ago

It looks like clang-format might be a good option since it is shipped with clang, so most of us already have it installed. I'm playing with it now, and it looks like it might be a reasonable option. I'll have to see whether it has options for all of our rules. You customize the formatting with .clang-format files that live in the code directory. Does anyone have any experience with other alternatives?

dschwen commented 8 years ago

Yeah, we've had that idea. I have looked into clang-format extensively, but it lacks the customizability to format to our style guide.

andrsd commented 8 years ago

I tried astyle. I was able to trick it into doing what I wanted, but a few important features were missing, namely indenting the initializer list by 2 indents (default is 1 in astyle and cannot be changed). The worst part about astyle is that it does not see the language grammar so it is a bunch of if-then statements for C, C++ and other C-like languages. Because of that, it is unusable for our needs ;-( (I think I ran into some corner cases that astyle did not pick up at all)

friedmud commented 8 years ago

Yep - I'm all for this. I hate that our PRs are mostly about formatting and trivial junk... we shouldn't even be talking about that stuff... we should be looking at the algorithms and design...

But we just haven't found a way to do it yet...

I suppose this will end up as another python side project...

andrsd commented 8 years ago

I suppose this will end up as another python side project...

Or a change of code standards ;-) </end-of-trolling>

dschwen commented 8 years ago

I wouldn't call that trolling. It would certainly be the path of least resistance.

andrsd commented 8 years ago

Yeah, but code standards are almost like religions...

friedmud commented 8 years ago

And the only correct religion is mine 👿

Seriously though: I don't want to change our code standards... especially for a reason like this (what happens when astyle changes their default?).

Can we make a small (to begin with) Python script that uses Clang to at least do some format checking automatically? It doesn't have to be perfect... but we can at least capture 90% of the stuff we see day-to-day.

Just now I was reviewing #7292 and they had multiple if statements that look like:

if (junk) stuff = 4;

It HAS to be easy to recognize that the body of a branch is on the same line as the branch and error out... has to be....

andrsd commented 8 years ago

It HAS to be easy to recognize that the body of a branch is on the same line as the branch and error out... has to be....

It is and astyle can do it. So does clang-format via AllowShortBlocksOnASingleLine: false.

I might find some time today to try clang-format and see how close we can get. I will need to build 3.9, I am still on 3.7...

friedmud commented 8 years ago

Yeah - if we can at least turn on some of the checks... it would help a ton.

Derek

On Thu, Jul 7, 2016 at 10:45 AM David Andrs notifications@github.com wrote:

It HAS to be easy to recognize that the body of a branch is on the same line as the branch and error out... has to be....

It is and astyle can do it. So does clang-format via AllowShortBlocksOnASingleLine: false.

I might find some time today to try clang-format and see how close we can get. I will need to build 3.9, I am still on 3.7...

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/idaholab/moose/issues/7289#issuecomment-231099760, or mute the thread https://github.com/notifications/unsubscribe/AA1JMfH4Xf24MKQiOxuEbNf9dTf4m5Sqks5qTREZgaJpZM4JGe1o .

dschwen commented 8 years ago

With clang-format you cannot do checks and you cannot just apply some rules.

fdkong commented 8 years ago

Just checking is not enough, but hopefully we have something that could help us automatically reformat the code. I really hate code standards because different packages have different ones. For example, the standard in PETSc is so different from that in MOOSE. Now I have to deal with multiple standards.

friedmud commented 8 years ago

Automatic checking is better than manual checking!

It would be nice to reformat automatically (would still have to be done manually before committing - or as a fixup commit... the CI system would never do it automatically)... but first we need to be able to check it!

As for dealing with multiple standards. Make sure you set up Emacs so that it detects which project you're in and automatically applies the correct standard... that helps a TON.

Derek

On Thu, Jul 7, 2016 at 11:20 AM Fande Kong notifications@github.com wrote:

Just checking is not enough, but hopefully we have something that could help us automatically reformat the code. I really hate code standards because different packages have different ones. For example, the standard in PETSc is so different from that in MOOSE. Now I have to deal with multiple standards.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/idaholab/moose/issues/7289#issuecomment-231110923, or mute the thread https://github.com/notifications/unsubscribe/AA1JMbvXbXkys4Lntpvi5b2zMafOBp6Jks5qTRlPgaJpZM4JGe1o .

andrsd commented 8 years ago

You can run the formatter with desirable options on and redirect the output into another file, then do diff on the old and formatted. Then use that as a comment from moosetest saying: Fix this in your code ;-)

dschwen commented 8 years ago

Uhm, yeah, except that you cannot pick and choose the reformatting that clang-format applies, so your diff would pretty much never be empty.

andrsd commented 8 years ago

I think I tried this with astyle, let me double check...

andrsd commented 8 years ago

So from the astyle doco:

...with no options at all. This will use the default bracket style, 4 spaces per indent, and no formatting changes...

So we could activate just one rule. But, unfortunately, astyle will reformat our 4 space indents in class initializer list ;-( So, not a path forward....

friedmud commented 8 years ago

Maybe we could just come up with a few simple regexes for the worst offenders....

On Thu, Jul 7, 2016 at 1:14 PM David Andrs notifications@github.com wrote:

So from the astyle doco http://astyle.sourceforge.net/astyle.html:

...with no options at all. This will use the default bracket style, 4 spaces per indent, and no formatting changes...

So we could activate just one rule. But, unfortunately, astyle will reformat our 4 space indents in class initializer list ;-( So, not a path forward....

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/idaholab/moose/issues/7289#issuecomment-231145371, or mute the thread https://github.com/notifications/unsubscribe/AA1JMbXhtw4vs0ohVCI_Leaah1RC5iMlks5qTTPbgaJpZM4JGe1o .

dschwen commented 8 years ago

Parsing C++ with regexes, what could possibly go wrong? ;-)

friedmud commented 8 years ago

We don't really need to "parse" it. We just need to guard against a few things... we're already doing it with if, for, while, cout, cerr, etc...

On Thu, Jul 7, 2016 at 1:23 PM Daniel Schwen notifications@github.com wrote:

Parsing C++ with regexes, what could possibly go wrong? ;-)

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/idaholab/moose/issues/7289#issuecomment-231148000, or mute the thread https://github.com/notifications/unsubscribe/AA1JMWXV2VvkyajbDiT7kEktvhPhfD3gks5qTTYggaJpZM4JGe1o .

bwspenc commented 8 years ago

As has already been said, you can't pick and choose what clang-format and astyle do -- they reformat all of your code the way they wants to. They can't exactly match our current formatting, but it seems like we could come up with settings that are reasonably close to our style that we could live with. clang-format is used by some pretty high-profile projects and has built-in settings for them, so think that if we could come up with a format based on a set of their current available options, it would likely keep supporting that format in the future. The worst thing that would happen if the tool's behavior changed is that we'd need to reformat a lot of our code with a tool that makes that really easy.

If clang-format and astyle aren't the answer, surely there's got to be a tool out there that meets our needs. We could start making some extensions to the existing tools to do more checking, but that's something that could easily become a huge time sink. I think it's hard to justify that effort just to keep our current formatting standards. I know these things get religious, but to me, having a consistent style is far more important than what that style is (unless it's just plain awful).

garvct commented 8 years ago

As you probably already know google has a detailed c++ coding style guidelines https://google.github.io/styleguide/cppguide.html they also have a python script called cpplint.py to check if c++ code obeys the guidelines (https://github.com/google/styleguide/tree/gh-pages/cpplint). Clearly google c++ and moose c++ style guidelines are different, but it may not be too difficult modifying their cpplint.py script to adapt it to the moose c++ coding guidelines. Currently the cpplint.py script just identifies deviations from the coding guidelines and does not automatically generate the correct code. I have made a few minor modification to this script to get it to semi-work with for moose coding style (For those interested see https://github.com/garvct/moose.git (scripts/moose_cpplint.py). I have not spent a lot of time on this, but it would need more work and testing before considering it for any production work. Just another option to add to the discussion.

dschwen commented 8 years ago

How does it fare with our current code base?

permcody commented 8 years ago

Tagging @rwcarlsen since he was chatting with me about this issue today as well.

dschwen commented 8 years ago

I checked it out on framework kernels. Not bad. Needs to be a bit stricter with operator spacing (to enforce arithmetic operators - but careful about unaries). It has some weird erros that don't seem to make sense (like [build/endif_comment] [5] where there is absolutely nothing after the #endif).

garvct commented 8 years ago

Yes, I agree it is definitely not production ready and not yet fully compliant with the Moose coding standard. The question in my mind is, is this approach worth pursuing? Does it have potential? Is a different approach preferable?

YaqiWang commented 8 years ago

I think this is great. We used @andrsd 's patched astyle to format some of our codes. It works well but we have to tolerate few inconsistency. I also like the idea of integrate this tool as a checker in the Precheck recipe. As mentioned by @friedmud we can then be focused on the real stuff during review.

aeslaughter commented 8 years ago

Python: https://www.pylint.org/ C++: http://clang.llvm.org/docs/ClangFormat.html

friedmud commented 8 years ago

I like Clang Format... it has quite a lot of customizable options: http://clang.llvm.org/docs/ClangFormatStyleOptions.html

Surely we can cook up something that does 90% of what we want... and then we can take a look at the last 10% and make some decisions and hopefully end up with something that passes through Clang Format perfectly.

permcody commented 8 years ago

I'm fairly certain that we've looked at Clang format multiple times and it doesn't do what we need it to do. David's patched astyle is probably closer but still not 100%. I really don't understand why this is so difficult, especially when you have the clang AST sitting there.

aeslaughter commented 8 years ago

I think the clang-format has improved quite a bit over time, it might be worth a try.

On Wed, Oct 12, 2016 at 8:15 AM, Cody Permann notifications@github.com wrote:

I'm fairly certain that we've looked at Clang format multiple times and it doesn't do what we need it to do. David's patched astyle is probably closer but still not 100%. I really don't understand why this is so difficult, especially when you have the clang AST sitting there.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_idaholab_moose_issues_7289-23issuecomment-2D253225600&d=CwMFaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=h7heP8xwI1i_HikChvhFbEBurKirgfOCdwgBxB9lM8c&m=FG6rKS58ubafWh9E1O9nThDi4qGCxyPq05jZikoh1CI&s=aGw0-09eqz4ryjHoq4FC4TDQV0SEloLzs9Fe_WpHTnU&e=, or mute the thread https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_notifications_unsubscribe-2Dauth_AAU1VS72IkB1PHJw3oAcDw2Jdhzk5UYJks5qzOuSgaJpZM4JGe1o&d=CwMFaQ&c=54IZrppPQZKX9mLzcGdPfFD1hxrcB__aEkJFOKJFd00&r=h7heP8xwI1i_HikChvhFbEBurKirgfOCdwgBxB9lM8c&m=FG6rKS58ubafWh9E1O9nThDi4qGCxyPq05jZikoh1CI&s=mCxxWuuvY8RHKr3q8yB0YFG4ztPCQNomK9Jz1alZ_uw&e= .

rwcarlsen commented 7 years ago

I would like to cast my vote in favor of a few small modifications to our style guide to facilitate vanilla clang-format usage. The biggest discrepancy I've noticed is the initializer list formatting - for longer lists, clang always puts a line break before the colon:

class Foo {
  Foo()
    : _param1(0),
      _param2(0),
      ...

It also does reflowing of long lines (that can be turned off) - that if we use does things like:

a = (myreallylongvariablenamethatshouldbeshorter + 
        areallylongfunctionname(anotherlongvariablename, 
                                anothervariable));

If we could make a few adjustments to the style guide for things like this, then we would be golden.

friedmud commented 7 years ago

I think these are fine. I wouldn't want to force reflowing at a low limit though... is the limit adjustable? On Thu, Nov 3, 2016 at 5:44 PM Robert Carlsen notifications@github.com wrote:

I would like to cast my vote in favor of a few small modifications to our style guide to facilitate vanilla clang-format usage. The biggest discrepancy I've noticed is the initializer list formatting - for longer lists, clang always puts a line break before the colon:

class Foo { Foo() : _param1(0), _param2(0), ...

It also does reflowing of long lines (that can be turned off) - that if we use does things like:

a = (myreallylongvariablenamethatshouldbeshorter + areallylongfunctionname(anotherlongvariablename, anothervariable));

If we could make a few adjustments to the style guide for things like this, then we would be golden.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/idaholab/moose/issues/7289#issuecomment-258283354, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1JMTs0WuY5PZHmLc-LvGQrR1j-LlfEks5q6lWjgaJpZM4JGe1o .

rwcarlsen commented 7 years ago

The limit is adjustable - and so are penalties for breaking different kinds of statements/structures.