pfalcon / re1.5

re1, the unbloated regexp engine by Russ Cox, elaborated to be useful for real-world applications
BSD 3-Clause "New" or "Revised" License
41 stars 4 forks source link

Bug in cleanmarks() #16

Closed ampli closed 9 years ago

ampli commented 9 years ago
$ ./re '[a]*' a
#1 a
recursive match (0,1)
recursiveloop match (0,1)
backtrack match (0,1)
thompson match (0,0)
pike match (0,0)
$

Please note the last two lines: they indicate a bug. Actually, not only the result of the thompson/pike functions is incorrect here, but there is a silent out-of-bound access, as can be seen when compiled with ASan:

./re '[a]*' a
#1 a
recursive match (0,1)
recursiveloop match (0,1)
backtrack match (0,1)
=================================================================
==7579==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60300000f073 at pc 0x0000004089d2 bp 0x7ffe5b055a50 sp 0x7ffe5b055a40
READ of size 1 at 0x60300000f073 thread T0
    #0 0x4089d1 in addthread /usr/local/src/re1.5/master/thompson.c:37
    #1 0x408cc7 in addthread /usr/local/src/re1.5/master/thompson.c:48
    #2 0x409684 in re1_5_thompsonvm /usr/local/src/re1.5/master/thompson.c:118
    #3 0x40529d in main /usr/local/src/re1.5/master/main.c:121
    #4 0x306922078f in __libc_start_main (/lib64/libc.so.6+0x306922078f)
    #5 0x4013a8 in _start (/usr/local/src/re1.5/master/re+0x4013a8)
AddressSanitizer can not describe address in more detail (wild memory access suspected).
SUMMARY: AddressSanitizer: heap-buffer-overflow /usr/local/src/re1.5/master/thompson.c:37 addthread

The problem is in cleanmarks(): case's for classes (Class, ClassNot, NamedClass) are missing. As a result, the mark bit is cleared from command offsets too! This clobbers negative offsets of Jmp commands, and converts them to a relatively large positive number, that later causes an out-of-bound memory access.

I would like to send a pull request that fixes that, but I have 2 questions (your answers can guide me in further fixes that I would like to send):

  1. I would like to fix the indentation problem in this function (instead of 4 spaces, the first indentation is 7 spaces and the second one is 8 spaces), as the first commit.
  2. If end is omitted, there is less one variable on the stack, and also the code becomes smaller on both SPARC (my test case for RISC) and x86_64. So should it be eliminated? (In a separate commit of course).
pfalcon commented 9 years ago
  1. I would like to fix the indentation problem in this function (instead of 4 spaces, the first indentation is 7 spaces and the second one is 8 spaces), as the first commit.

Yes, please (as a separate commit).

2.

Can't say much right away, will need to look into it ;-).

ampli commented 9 years ago

If end is omitted, there is less one variable on the stack, and also the code becomes smaller on both SPARC (my test case for RISC) and x86_64. So should it be eliminated? (In a separate commit of course).

Can't say much right away, will need to look into it ;-).

Currently we have in cleanmarks():

cleanmarks(ByteProg *prog)
{
       char *pc = prog->insts;
       char *end = pc + prog->bytelen;
       while (pc < end) {

The following causes the code space saving mentioned above:

cleanmarks(ByteProg *prog)
{
       char *pc = prog->insts;
       while (pc < pc + prog->bytelen) {

Of course this particular saving is relatively small. In larger functions such accumulating savings can be relatively big. Hence this is also a general question. (BTW, In any case I don't propose or intend to do optimizations that reduce readability.)

pfalcon commented 9 years ago

Currently we have in cleanmarks():

I see, thanks for elaboration. Are you sure that performing that source change made any difference for gcc with at least -O2 setting? That's trivial case of expression propagation - if end is used once, then code produces with this rather mainline optimization should be equivalent of source-level change you show.

pfalcon commented 9 years ago

Let me know if something is not answered yet which blocks (or backlogs) your work on fix(es) mentioned above, sorry for some delay with processing these.

ampli commented 9 years ago

If you mean set O2 in addition to -Os, then please note that gcc's manual says: -Os enables all -O2 optimizations that do not typically increase code size. I validated that indeed adding -O2 to -Os doesn't change the code size. When using only -O2 (with the yet unfixed cleanmarks() code), on x86_64 the difference is negligible:

2960 bytes before this manual optimization, and 2958 bytes with it. It is indeed not expected that there will be a saving. I didn't look in the assembly code for the reason.

However, when compiling with only -O2 for sparc64, this manual optimization increases the code by 8 bytes...

But before that you compared code using -Os. But if you use -O2 without -Os it looks to me totally other considerations are added.

So the question remained: Should I code for speed or for space (both stack and code), supposing the readability is comparable? And an added question: should I assume also using -O2 alone? (As we saw, it is hard to consider code space if both ways can be equally used.)

pfalcon commented 9 years ago

If you mean set O2 in addition to -Os

Well, I meant "good enough level of optimizations", which is usually -O2, though as you point out, most of those optimizations would be enabled for -Os too.

So the question remained: Should I code for speed or for space (both stack and code), supposing the readability is comparable?

Please write new code, optimizing for space, that would be much appreciated. But otherwise, please follow golden rule of making changes: change as little as possible while working on particular task. Then means that unless the task is "optimize all the code for size", then touching existing code for a minor changes like above isn't much useful, and means side-tracking into deep trenches of analysing stability of such changes' results (across archs, compiler versions and settings), etc.

Btw, if you'd really like to test changes' effect on a RISC arch, please use ARM - that's pretty much a leader of embedded archs, so making changes which affect it negatively would be strange ;-). (Even though ARM itself is not pure RISC arch.)

Hope that clarifies the matter, and sorry for this excursion into embedded code optimization, hope that didn't sidetrack you too much, and found it interesting in some aspects. Thanks!

ampli commented 9 years ago

OK. I am now using an ARM cross-compilation to check the code size (this way I checked the cleanmarks() code).