pfalcon / re1.5

re1, the unbloated regexp engine by Russ Cox, elaborated to be useful for real-world applications
BSD 3-Clause "New" or "Revised" License
42 stars 4 forks source link

Makefile: Add stuff to ease debugging #15

Open ampli opened 9 years ago

ampli commented 9 years ago

This is my proposal for a new Makefile.

The added ASAN definitions are for finding memory related bugs (e.g. out of bounds, use after free(), memory leaks, etc.) and various undefined behavior. See the commit message for more info on what has been added.

For -fsanitize=undefined, a recent enough gcc is needed (found on all recent Linux distributions).

pfalcon commented 9 years ago

I'm torn on seeing such stuff. OTOH, this provides useful features, and good way to bootstrap learning of all that ASAN stuff. OTOH, it's complete kludge for somebody who don't need it and just needs clean makelfile. I'm certainly would be OK with this being Makefile.asan.

ampli commented 9 years ago

It could be indeed a good idea to provide it as Makefile.asan. I can send another pull request if desired.

However, I cannot imagine why someone who develops a program (and re1.5 is a program for development, not for the purpose of actually use as a command) would like to avoid compiling with ASan/UBan (for development purpose), now that it is commonly available in gcc and clang.

For example, I tried to run the MicroPython tests, after compiling it (in the unix directory) with ASan/UBan, and it indicated many apparent problems:

A problem like ../py/parse.c:169:46: runtime error: left shift of negative value -1 (arg<<1 when arg is negative, which result is undefined by the C standard.) reported on 24 source locations. (You may want to cast the variable to size_t).

A problem like: ../py/objint.c:102:16: runtime error: left shift of 1 by 31 places cannot be represented in type 'int' reported on 4 different locations.

A problem like: ../py/emitbc.c:289:9: runtime error: null pointer passed as argument 1, which is declared to never be null Is reported in 9 locations (on 3 different functions). In these particular cases It usually cannot cause harm but it is still not a good idea to allow that.

There is also one buffer overflow report (gc.c:284), but it may be bogus.

I can open an issue on micropython with the summary of these errors, and the Makefile modifications in order to produce them.

pfalcon commented 9 years ago

However, I cannot imagine why someone who develops a program (and re1.5 is a program for development, not for the purpose of actually use as a command)

Well, any software is first of all intended for usage, not development ;-). And re1.5 is library, so it's "usage" is integration into other software. The latest "fashion" in that direction is not even using lib*, bu dropping code into your codebase (I don't like that, but then for uPy, we have a good case why it's useful). Entailing from that, there should be very clean and concise instructions how to integrate re1.5 into your codebase, and clean, concise Makefile is such instructions in "executable form". Seeing "llvm-symbolizer" (far less "/usr/bin/llvm-symbolizer", like you propose), about which nobody knows, doesn't help the cause.

would like to avoid compiling with ASan/UBan (for development purpose), now that it is commonly available in gcc and clang.

Cool, we now just need to wait some 20 years till asan/uban will be the same this as "ls" or "ld", and such questions won't even make sense. Until then - they're nothing but novelty-du-jour, about which "nobody" knows and which will die (will be killed by their own creators) before they have chance to become popular.

For example, I tried to run the MicroPython tests, after compiling it (in the unix directory) with ASan/UBan, and it indicated many apparent problems:

"Apparent" is a keyword here. If such a tool (and there're lot of them!) finds a problem, please verify whether it's true problem or false positive, and submit patch if the former. Keep tally of issues vs false positives, if issues are frequent, and false positives are rare, maybe it's indeed worth enabling it by default for everyone, instead of having someone run occasionally and do end-to-end analysis of each case.

I can open an issue on micropython with the summary of these errors, and the Makefile modifications in order to produce them.

Any contribution is appreciated, but per above, the most useful approach is that someone does end-to-end work on this stuff. Otherwise, such reports may just stay long in queue unattended.

It could be indeed a good idea to provide it as Makefile.asan. I can send another pull request if desired.

Per above, that would be preferred solution, at least for the time being.

ampli commented 9 years ago

Seeing "llvm-symbolizer" (far less "/usr/bin/llvm-symbolizer", like you propose), about which nobody knows, doesn't help the cause.

Usually it is already set up in the environment of the developer. However, I could not assume that , so in order I added it to the Makefile. It is far from ideal, of course, and it is indeed not appropriate for a main Makefile.

If such a tool (and there're lot of them!)

Just please note that this is not "a tool" among "a lot of them" other tools. It is an official integral feature of the GCC and CLANG compilers for some time now (2+ years). However, UBsan and Lsan are newer in GCC.

When GCC issues a warning on an undefined behavior, unless it is in dead code (a thing that may be rare enough now, I didn't encounter such a thing recently), it is not a good idea to let it remain in the code, because compiler optimizations assume you don't do undefined things, and optimize code according to this assumption (and e.g. remove code you didn't think it would remove).

Interesting reading: What Every C Programmer Should Know About Undefined Behavior GCC Undefined Behavior Sanitizer

ASan (address sanitizer) is a great feature of GCC that usually don't have false positives on -O2 code (on -O3 I encountered false positives). For example, it doesn't let you referring an array out of bounds, using indexing or even pointers. This way you can easily find bugs without tedious debugging sessions.

In the uPy tests, ASan found a repeated out-of-bound access in one particular place (gc). If uPy unwind the stack itself (not using longjmp), then this may be a false positive. Otherwise, most probably it indicates a problem. I'm not sure I will be able to enter deeply into that.

To sum up:

  1. My propose Makefile can be installed as Makefile.asan (by modifying the existing pull request).
  2. I can send another pull request of Makefile.asan.
  3. We can forget it (I have no problem with that...).

In any case, my propose Makefile has a test target that runs run-tests. Maybe it can be a good idea to add it to the project's Makefile (I can send a pull request).

ampli commented 9 years ago

To sum up:

  1. My propose Makefile can be installed as Makefile.asan (by modifying the existing pull request).
  2. I can send another pull request of Makefile.asan.
  3. We can forget it (I have no problem with that...).

In any case, my propose Makefile has a test target that runs run-tests. Maybe it can be a good idea to add it to the project's Makefile (I can send a pull request).

Is there anything to do more regarding that? If not, this can be closed.

pfalcon commented 9 years ago

Can we please choose option 2 above? Thanks.

In any case, my propose Makefile has a test target that runs run-tests. Maybe it can be a good idea to add it to the project's Makefile (I can send a pull request).

Yes, that's good idea, I'm missing that too.

pfalcon commented 9 years ago

ping

ampli commented 9 years ago

Other possible improvements to the Makefile (implemented in my devel branch). I can send a pull request (each in a different commit) if they look fine. In any case, I propose to include all of them in Makefile.asan (to be sent next).

  1. The current Makefile uses TARG=re, which is not in use (besides the just sent patch for "make test"). Proposal: use $(TARG) instead of hard-coding "re".
  2. The current CFLAGS, uses -ggdb. I propose to change it to -ggdb3. This adds the ability to print and show macros:
(gdb) p NON_ANCHORED_PREFIX
$1 = 5
(gdb) info macro NON_ANCHORED_PREFIX
Defined at /usr/local/src/re1.5/makefile1/re1.5.h:135
  included at /usr/local/src/re1.5/makefile1/main.c:5
#define NON_ANCHORED_PREFIX 5
  1. Explicitly specify the debug CFLAGS (commented out): #CFLAGS=-DDEBUG -ggdb3 -Wall -O0 I propose -O0 for debug sessions because -Os enables -O2 which may optimized-out variables and change the order of exaction, a thing that makes debugging using gdb inconvenient.
pfalcon commented 9 years ago

The current Makefile uses TARG=re, which is not in use (besides the just sent patch for "make test"). Proposal: use $(TARG) instead of hard-coding "re".

ok

The current CFLAGS, uses -ggdb. I propose to change it to -ggdb3. This adds the ability to print and show macros:

Fairly speaking, I don't know why it's not just -g, and using that would be my preference (any other option may not work with a particular toolchain).

Explicitly specify the debug CFLAGS (commented out): #CFLAGS=-DDEBUG -ggdb3 -Wall -O0

I.e. you want add line like the above to Makefile? Sounds good.

Thanks!