radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0
19.85k stars 2.95k forks source link

Create script to apply style to diffs #11261

Closed ret2libc closed 5 years ago

ret2libc commented 5 years ago

I copied the clang-format-diff.py script from the clang project, as it provides a way to apply clang-format to diffs only. The license is University of Illinois Open Source. Is that ok?

Moreover, this patch restores the tab size to 8 char.

ret2libc commented 5 years ago

Fixes https://github.com/radare/radare2/issues/11170

Maijin commented 5 years ago

So what about existing one here:

ret2libc commented 5 years ago

@Maijin maybe I should remove that? I just copied it directly in .clang-format, since at least it will be parsed by editors plugins.

Maijin commented 5 years ago

Needs to have only one yes.

ret2libc commented 5 years ago

removed

Maijin commented 5 years ago

Don't forget to update scripts like indent.sh that uses this clang-format

ret2libc commented 5 years ago

Thanks, done.

ret2libc commented 5 years ago

I slightly modified clang-format-diff.py to handle function declaration/definitions which do not require the space before the parenthesis.

ret2libc commented 5 years ago

I removed the indent-diff.sh script and just handled everything in the clang-format-diff.py script

ret2libc commented 5 years ago

Even though @radare said clang-format doesn't work, after testing some files I think it does a pretty good job. There are some things that are not well indented (like break in a switch case), but the good thing about this script is that it works at the "patch level" so we can more easily manually check those cases. Most of the stuff will be done automatically (e.g. indentation, spaces around functions, operators, etc.)

I believe as @radare said there are cases that clang-format does not handle well, but I do think we can either adapt to what the tools expect (instead of having our own particular coding style) or just consider them from case to case, manually.

ret2libc commented 5 years ago

@radare I've put an example with clang-format in doc/indent-example.c and added some comments on some different things that apply there. I'd like your comments on that. Also, by applying clang-format on entire files I obtain reasonable results.

nsajko commented 5 years ago

Have you reported how "break;" after a block in a switch statement is formatted to the Clang project? Seems like a bug.

But on the other hand, why not include the "break;" in the block in the first place? Seems like that would be a prettier style? The change is even easy to do programmatically if you decide to use clang-format for radare2 (at least for the cases where the "break;" precedes a case keyword).

ret2libc commented 5 years ago

@nsajko I din't report the "break;" thing to Clang yet. I will double-check and do it if necessary. I also think that including the break; in the block is better anyway.

nsajko commented 5 years ago

I think you should add "IndentPPDirectives: AfterHash" to .clang_format.

ret2libc commented 5 years ago

@nsajko thanks for the help! have you seen indented #if in the current code? I'd like to introduce as less changes to the style as possible for the moment.

nsajko commented 5 years ago

I did a quick search in libr:

util/thread_sem.c util/sys.c util/regex/regexec.c util/regex/regcomp.c util/regex/regex2.h fs/fs.c include/sdb/types.h include/r_types.h debug/p/debug_native.c

etc.

ret2libc commented 5 years ago

yeah but as you can probably see most of the code doesn't do that. I guess those parts were copied from other code which used the indentation. Anyway, let's see what @radare thinks about it.

nsajko commented 5 years ago

OK

nsajko commented 5 years ago

The radare2 .clang-format should probably also have something like

ForEachMacros: ['r_list_foreach', 'ls_foreach', 'fcn_tree_foreach_intersect', 'r_skiplist_foreach', 'graph_foreach_anode']

With the other foreach macros, too.

Maijin commented 5 years ago

Maybe merge pre-commit-indent in the python script and probably indent.sh in it to have only one consolidated script instead of 3 scattered?

ret2libc commented 5 years ago

@Maijin I think I could merge indent.sh in it, but I'm not sure about pre-commit-indent. I think I will keep it separate, it's just a script that calls the .py and prints an error messag.

@nsajko thanks! That's useful, if you find anything else please share!

ret2libc commented 5 years ago

@Maijin btw, no, I think I will keep indent.sh. @radare will probably want to keep it since it does use uncrustify.

radare commented 5 years ago

thats false, export UNCRUST=0 and sys/indent.sh will use clang-format instead of uncrustify

On 30 Aug 2018, at 08:30, Riccardo Schirone notifications@github.com wrote:

@Maijin https://github.com/Maijin btw, no, I think I will keep indent.sh. @radare https://github.com/radare will probably want to keep it since it does use uncrustify.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/pull/11261#issuecomment-417204779, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lsxjIsaeu50l0exME1MXFAXphlF4ks5uV4Z1gaJpZM4WM0iH.

ret2libc commented 5 years ago

Really? It seems to me it's always set to 1, so it always use uncrustify https://github.com/radare/radare2/blob/master/sys/indent.sh#L33 .

But anyway, as I said sys/indent can be used to run uncrustify, so I'd keep it.

radare commented 5 years ago

i dont know if this license is compatible with LGPL, but as long as it is not distributed in the installation it shouln't be a problem because it's fine for source distribution

radare commented 5 years ago

sys/indent.sh was doing the same already and appliying a bunch of rules.

one of the problems i see in having this pre-commit check is that not everybody will have the same version of clang-format and related dependencies to be able to make a proper commit.

radare commented 5 years ago

just set UNCRUST=0 as env var and sys/indent.sh will use clang-format instead of uncrustify

ret2libc commented 5 years ago

i dont know if this license is compatible with LGPL, but as long as it is not distributed in the installation it shouln't be a problem because it's fine for source distribution.

I see. I'm no licenses expert, but I see no reasons to include it in the installation.

sys/indent.sh was doing the same already and appliying a bunch of rules.

sys/indent.sh was doing full-file style, which is much harder to review and could introduce things we may don't want. This instead is very useful at a patch level, so that only touched lines are fixed for style. This way people don't need to do a PR with 1000000 line style changes only to fix 1 line.

one of the problems i see in having this pre-commit check is that not everybody will have the same version of clang-format and related dependencies to be able to make a proper commit.

They are not forced to use the pre-commit check. I have clang-format 5.0.2 and everything works well.

I will review the other comments soon.

radare commented 5 years ago

On 31 Aug 2018, at 08:34, Riccardo Schirone notifications@github.com wrote:

@ret2libc commented on this pull request.

In doc/indent-example.c:

+#include "r_util.h" +#include + +typedef struct r_core_rtr_host_t2 {

  • int proto;
  • char host[512];
  • int port;
  • char file[1024];
  • RSocket *fd; +} RCoreRtrHost2;
  • +static const char *help_msg_aa[] = {

  • "Usage:", "aa[0*?]", " # see also 'af' and 'afna'",
  • "aa", " ", "alias for 'af@@ sym.;af@entry0;afva'", //;.afna @@ fcn.'",
  • "aa", "", "analyze all flags starting with sym. (af @@ sym.)",
  • NULL}; As said, is this really important? What I want is one way to automatically indent/fix style in my code.

Because you dont know how to type following a coding style ?

I don't care whether there's a space before/after a given word, as long as there's a tool that automagically does that.

Thats fine.

Is there such a tool? If yes, perfect, please show me. If not, I think it's time we change a bit our coding style to adapt to those existing tools.

The idea of sys/indent.sh was this already. You are not inventing anything new. Having an automated indentation tool for all the code is not reliable option yet. And our coding style is standard enough to be supported by clang format. Uncrustify is a bit stuck and i had no time to fix the issues

Also. If we have to change or decide something about the coding style becayse the tools we use are buggy. Seems a bad idea but practical and requires discussion.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 5 years ago

On 31 Aug 2018, at 08:01, Riccardo Schirone notifications@github.com wrote:

i dont know if this license is compatible with LGPL, but as long as it is not distributed in the installation it shouln't be a problem because it's fine for source distribution.

I see. I'm no licenses expert, but I see no reasons to include it in the installation.

Then why did you removed it from doc? Because doc is installed

sys/indent.sh was doing the same already and appliying a bunch of rules.

sys/indent.sh was doing full-file style, which is much harder to review and could introduce things we may don't want. This instead is very useful at a patch level, so that only touched lines are fixed for style. This way people don't need to do a PR with 1000000 line style changes only to fix 1 line.

Thats fine. If just helps us review code written by people with poor style

one of the problems i see in having this pre-commit check is that not everybody will have the same version of clang-format and related dependencies to be able to make a proper commit.

They are not forced to use the pre-commit check. I have clang-format 5.0.2 and everything works well.

Ok

I will review the other comments soon.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ret2libc commented 5 years ago

I see. I'm no licenses expert, but I see no reasons to include it in the installation. Then why did you removed it from doc? Because doc is installed

I moved it to the root directory, so that tools can automatically use it, because in general they just get the file from the root.

Because you dont know how to type following a coding style ?

I know how to do that, you know how to do that, and still.... many of our PRs have a missing space here and a wrong indentation there and something else in another place. So yes, but no. :)

And our coding style is standard enough to be supported by clang format.

Really? Can you show me another project that uses our own coding style? And no, your own projects do not matter unless they are used by other projects not created by you.

Also. If we have to change or decide something about the coding style becayse the tools we use are buggy. Seems a bad idea but practical and requires discussion.

Yes, I think we need to discuss and choose this ASAP. I'm tired of seeing PRs full of "space here", "newline here", "put brace on next line", etc. and I think also people who gets reviews with all these things are tired of this. We should do our best with the tool we have. Can we adapt to a style that can be automatically enforced? Or we want to keep doing the "human formatter" for another while? The only reason to have a coding style is consistency, but if we have no real way to enforce it automatically, then it doesn't really matter. Let's just choose one that works for others! Let's use a slight variation of the kernel style or something similar (again, I don't care as long as there's an almost automatic way to enforce this).

Sometimes, as you said many times, the automatic formatter may fail, yes. We are here for this. We are not forced to follow everything he says (e.g. if a very long line makes sense in one case, we just ignore the formatter). But at least we can have a good base to work on. The nice thing about this PR is that it allows contributors to fix style in patches, so when people touch a file they won't have to fix everything in one shot (which is hard to review in general) but will fix the style only in the few lines they modified.

radare commented 5 years ago

Our style is 90% ripped from glib, but using tabs instead of spaces. but again, tools dont indent well, there are many constructions that clang-format will indent wrong or ignore. im fine with merging this because i have no time or priorities for making a testsuite to proof you how broken all those tools are. but yes, its fine to go forward, in fact the whitelist of the indent.sh file is because i was defining which files were properly indented using our style and using our tooling, did you verified if those files are still properly indented with your script and last clang-format?

On 31 Aug 2018, at 14:11, Riccardo Schirone notifications@github.com wrote:

I see. I'm no licenses expert, but I see no reasons to include it in the installation. Then why did you removed it from doc? Because doc is installed

I moved it to the root directory, so that tools can automatically use it, because in general they just get the file from the root.

Because you dont know how to type following a coding style ?

I know how to do that, you know how to do that, and still.... many of our PRs have a missing space here and a wrong indentation there and something else in another place. So yes, but no. :)

And our coding style is standard enough to be supported by clang format.

Really? Can you show me another project that uses our own coding style? And no, your own projects do not matter unless they are used by other projects not created by you.

Also. If we have to change or decide something about the coding style becayse the tools we use are buggy. Seems a bad idea but practical and requires discussion.

Yes, I think we need to discuss and choose this ASAP. I'm tired of seeing PRs full of "space here", "newline here", "put brace on next line", etc. and I think also people who gets reviews with all these things are tired of this. We should do our best with the tool we have. Can we adapt to a style that can be automatically enforced? Or we want to keep doing the "human formatter" for another while? The only reason to have a coding style is consistency, but if we have no real way to enforce it automatically, then it doesn't really matter. Let's just choose one that works for others! Let's use a slight variation of the kernel style or something similar (again, I don't care as long as there's an almost automatic way to enforce this).

Sometimes, as you said many times, the automatic formatter may fail, yes. We are here for this. We are not forced to follow everything he says (e.g. if a very long line makes sense in one case, we just ignore the formatter). But at least we can have a good base to work on. The nice thing about this PR is that it allows contributors to fix style in patches, so when people touch a file they won't have to fix everything in one shot (which is hard to review in general) but will fix the style only in the few lines they modified.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/pull/11261#issuecomment-417644953, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lm6ik1EWJBKuT89qalQVlgEKqJHOks5uWSfqgaJpZM4WM0iH.

ret2libc commented 5 years ago

There are some differences (maybe you used uncrustify instead of clang?), but many of those are real fixes (e.g. missing spaces, char* var instead of char *var, etc.). One difference is how we indent following lines. For example:

        anal->cmdtail = r_str_appendf (anal->cmdtail,
-               "axc 0x%"PFMT64x " 0x%"PFMT64x "\n",
-               case_addr, switch_addr);
+                                      "axc 0x%" PFMT64x " 0x%" PFMT64x "\n",
+                                      case_addr, switch_addr);
        anal->cmdtail = r_str_appendf (anal->cmdtail,
-               "afbe 0x%"PFMT64x " 0x%"PFMT64x "\n",
-               switch_addr, case_addr);
+                                      "afbe 0x%" PFMT64x " 0x%" PFMT64x "\n",
+                                      switch_addr, case_addr);

I do find clang (the ones with +) better IMHO, because it is aligned with the open parenthesis, but let me know what you think. I can double check if clang-format is able to align with our style, though i'm not sure i know exactly how are style works in these cases. Are we supposed to indent with just one tab?

radare commented 5 years ago

The + lines are ugly as hell

I dont like them at all. You will end up having a column of 5 chars at the right of your editor to read the arguments of fucntions if you follow that style. It basically doesnt scales. So nope

On 31 Aug 2018, at 15:34, Riccardo Schirone notifications@github.com wrote:

There are some differences (maybe you used uncrustify instead of clang?), but many of those are real fixes (e.g. missing spaces, char var instead of char var, etc.). One difference is how we indent following lines. For example:

    anal->cmdtail = r_str_appendf (anal->cmdtail,
  • "axc 0x%"PFMT64x " 0x%"PFMT64x "\n",
  • case_addr, switch_addr);
  • "axc 0x%" PFMT64x " 0x%" PFMT64x "\n",
  • case_addr, switch_addr); anal->cmdtail = r_str_appendf (anal->cmdtail,
  • "afbe 0x%"PFMT64x " 0x%"PFMT64x "\n",
  • switch_addr, case_addr);
  • "afbe 0x%" PFMT64x " 0x%" PFMT64x "\n",
  • switch_addr, case_addr); I do find clang (the ones with +) better IMHO, because it is aligned with the open parenthesis, but let me know what you think. I can double check if clang-format is able to align with our style, though i'm not sure i know exactly how are style works in these cases. Are we supposed to indent with just one tab?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ret2libc commented 5 years ago

The + lines are ugly as hell I dont like them at all. You will end up having a column of 5 chars at the right of your editor to read the arguments of fucntions if you follow that style. It basically doesnt scales. So nope

Ok, I will check if it's possible to do what we do with clang-format. If it's not, I'd guess we are the only one to do that :/

radare commented 5 years ago

really? i dont think that indentation is taht strange. uncrustify supports it iirc, and it’s common in js, and imho much more readable

On 3 Sep 2018, at 10:17, Riccardo Schirone notifications@github.com wrote:

The + lines are ugly as hell I dont like them at all. You will end up having a column of 5 chars at the right of your editor to read the arguments of fucntions if you follow that style. It basically doesnt scales. So nope

Ok, I will check if it's possible to do what we do with clang-format. If it's not, I'd guess we are the only one to do that :/

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/pull/11261#issuecomment-418037808, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-ljS97KyA8Pi7pxVMMO8aoukTADBXks5uXOWhgaJpZM4WM0iH.

nsajko commented 5 years ago

@radare Can you take a look at https://clang.llvm.org/docs/ClangFormatStyleOptions.html to see what style do you like the best?

For example; the relevant option for what you were discussing in the last few comments is AlignAfterOpenBracket, and you have to decide if you like the 'DontAlign' or 'AlwaysBreak' values for that option, because you obviously do not like 'Align'. AlwaysBreak always break after an open bracket, if the parameters don’t fit on a single line.

Also, we may want 'ForContinuationAndIndentation', instead of 'Always' for UseTab.

Also, what do you think about IndentPPDirectives? I would say it makes ifdef messes much more readable.

ret2libc commented 5 years ago

Thanks @nsajko that was helpful! I'm updating the PR.

ret2libc commented 5 years ago

ping @radare since r2con is finished.

Indentation of multiline can be done as you suggested (thanks again @nsajko ). Other thoughts? Concerns?

ret2libc commented 5 years ago

@radare ping

ret2libc commented 5 years ago

Oh, btw I have still some comments in the shell scripts to address.

nsajko commented 5 years ago

About the UNIX Shell language and quoting words in it: it is a good practice to always quote (with double-quotes) words that contain a parameter substitution or any of the characters []*?. This is because unquoted parameter expansion is finicky in this language, it is a composition of two operations: splitting at IFS and file name generation (globbing), so unless you really want either of those it is good to quote just in case.

EDIT: for example, use $var if you want splitting at IFS or file name generation (globbing), otherwise (by default) use "$var".

radare commented 5 years ago

Imho a closing brace makes semse to be in the same line when its 1 line only. But if there are many it looks better like this. I dont think fixing the indentation tool to handle this shouldnt be difficult.

On 17 Sep 2018, at 21:38, Neven Sajko notifications@github.com wrote:

@nsajko commented on this pull request.

In doc/indent-example.c:

+#include "r_util.h" +#include + +typedef struct r_core_rtr_host_t2 {

  • int proto;
  • char host[512];
  • int port;
  • char file[1024];
  • RSocket *fd; +} RCoreRtrHost2;
  • +static const char *help_msg_aa[] = {

  • "Usage:", "aa[0*?]", " # see also 'af' and 'afna'",
  • "aa", " ", "alias for 'af@@ sym.;af@entry0;afva'", //;.afna @@ fcn.'",
  • "aa", "", "analyze all flags starting with sym. (af @@ sym.)",
  • NULL}; The } is probably attached because there is no , after NULL. Maybe we could programmatically insert colons at the end of lists like this to force the } to go to the next line.

On the other hand, I agree with retlibc that the attached-closing-curly-bracket-style is also OK, you would just need to get used to it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 5 years ago

Lgtm

On 17 Sep 2018, at 21:45, Neven Sajko notifications@github.com wrote:

@nsajko commented on this pull request.

In sys/indent.sh:

@@ -52,6 +53,12 @@ if [ "${IFILE}" = "-u" ]; then IFILE="$1" fi

+if [ "${IFILE}" = "-c" ]; then Why the double quoting of -c?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 5 years ago

I like with quotes. [ is an alias for test and not quoting it may look like a flag instead of argument. Its more redable tjis way

On 19 Sep 2018, at 10:51, Riccardo Schirone notifications@github.com wrote:

@ret2libc commented on this pull request.

In sys/indent.sh:

@@ -52,6 +53,12 @@ if [ "${IFILE}" = "-u" ]; then IFILE="$1" fi

+if [ "${IFILE}" = "-c" ]; then Honestly I just copied what was done for the previous parameters. What do you suggest?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 5 years ago

obviously this is something to be done by a human

On 21 Sep 2018, at 11:51, Riccardo Schirone notifications@github.com wrote:

@ret2libc commented on this pull request.

In doc/indent-example.c https://github.com/radare/radare2/pull/11261#discussion_r219443806:

+static const char *help_msg_aa[] = {

  • "Usage:", "aa[0*?]", " # see also 'af' and 'afna'",
  • "aa", " ", "alias for 'af@@ sym.;af@entry0;afva'", //;.afna @@ fcn.'",
  • "aa", "", "analyze all flags starting with sym. (af @@ sym.)",
  • NULL};
  • +static int cmpaddr(const void _a, const void _b) {

  • const RAnalFunction a = _a, b = _b;
  • return a->addr - b->addr; +}
  • +int main(int argc, char **argv) {

  • r_anal_esil_set_pc (core->anal->esil, fcn ? fcn->addr : core->offset);
  • switch (*input) {
  • case 'a': // "afta"
  • { Can it extract functions from switch-case blocks?

Anyway, I don't think we do want such a behaviour. That's something that should be reviewed manually by us.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/pull/11261#discussion_r219443806, or mute the thread https://github.com/notifications/unsubscribe-auth/AA3-lkdDLmFdOQBndI1WgIyIRgS4o9Vjks5udLawgaJpZM4WM0iH.

codecov-io commented 5 years ago

Codecov Report

Merging #11261 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11261   +/-   ##
=======================================
  Coverage   37.47%   37.47%           
=======================================
  Files         899      899           
  Lines      284166   284166           
=======================================
  Hits       106492   106492           
  Misses     177674   177674

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update d04e888...0efaa0e. Read the comment docs.

ret2libc commented 5 years ago

@radare ping I think this is now ready to be (at least) tested more.

ret2libc commented 5 years ago

I think I've done everything requested.