radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0

ROP regex is not as expected for excluding characters #20923

Open andrewzigerelli opened 2 years ago

andrewzigerelli commented 2 years ago

Environment

Thu 20 Oct 2022 10:27:04 AM EDT
radare2 5.7.9 29128 @ linux-x86-64 git.5.7.8-260-gccbd36e88
commit: ccbd36e8853afdee48c7d6d37a66c17b85c97698 build: 2022-10-07__01:04:35
Linux x86_64

Description

The ROP search results are not as expected, at least when the regex uses a negated character class to exclude characters.

Test

Using https://github.com/radareorg/radare2-testbins/blob/master/elf/analysis/ls-linux64

r2 ./ls-linux64
e rop.len = 2
"/R/q [^p]"
...
0x0001a219: add al, byte [rsi + 0xa]; ret;
0x0001a229: push rdi; ret;
0x0001a26a: add byte [rax], al; int3;
...

Same results for: "/R/q [^p]+" or "/R/q [^p]+;"

A second, possibly more interesting case, using the same rop.len and the same binary:

"/R/q r[^d]i"
...
0x0001358b: mov rdi, rsi; jmp rax;
0x0001358c: mov rdi, rsi; jmp rax;
0x0001358d: mov rdi, rsi; jmp rax;
...

Basically same results for "/R/q r[^d]i;"
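For comparison, here is a minimal sketch of what ordinary regex semantics give on a few of the gadget strings above. It uses Python's re module rather than radare2's r_regex, so it only approximates the behaviour; the helper names search_hits and full_hits are made up for this illustration. Under an unanchored search, [^p] matches any line that contains at least one character other than p, and r[^d]i matches "rsi" inside "mov rdi, rsi". Only a full match of [^p]+ rejects lines that contain a p.

```python
import re

# Gadget strings copied from the output above.
gadgets = [
    "add al, byte [rsi + 0xa]; ret;",
    "push rdi; ret;",
    "mov rdi, rsi; jmp rax;",
]

def search_hits(pattern, lines):
    """Unanchored search: keep a line if the pattern matches anywhere in it."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

def full_hits(pattern, lines):
    """Anchored match: keep a line only if the pattern consumes all of it."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.fullmatch(line)]

# Every line contains some character other than 'p', so all three are kept.
print(search_hits(r"[^p]", gadgets))
# Only the line without any 'p' survives a full match.
print(full_hits(r"[^p]+", gadgets))
# 'rsi' satisfies r[^d]i, so lines containing it are kept.
print(search_hits(r"r[^d]i", gadgets))
```

If /R is meant to behave like an unanchored search over the disassembly text, the first and third results would be expected; whether that is the intent is exactly what this issue is asking.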

However, the semicolon does make a difference:

"/R/q r[^d]i"
...
0x0001011b: mov rsi, qword [rsp + 0x38]; call 0xf190;
0x000102d3: xor rdi, r15; jmp qword [rsi + 0xf];
0x000102d4: xor edi, edi; jmp qword [rsi + 0xf];
...
"/R/q r[^d]i;"
0x0001011b: mov rsi, qword [rsp + 0x38]; call 0xf190;
0x00010424: mov rsi, 0xffffffffffffffff; jmp 0x10360;
...

Is there a specification for how the ROP regex is supposed to work? I don't understand the logic in construct_rop_gadget(...) in libr/core/cmd_search.c, but there seems to be some processing that splits tokens at the semicolon before and after the calls to r_regex_match. I'm also not sure what should happen if the number of "regex tokens" separated by semicolons is less than rop.len.
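To make the question concrete, this is a sketch of one possible per-instruction interpretation. It is not the logic of construct_rop_gadget; the functions split_tokens and gadget_matches are hypothetical, and the handling of fewer regex tokens than instructions is an assumption chosen only for illustration. The regex is split at semicolons, the gadget is split at semicolons, and regex token i is tested against instruction i.

```python
import re

def split_tokens(text):
    """Split on ';', trimming whitespace and dropping empty pieces.

    Note: this naive split would also cut a ';' that appears inside a
    character class such as [^p;].
    """
    return [t.strip() for t in text.split(";") if t.strip()]

def gadget_matches(pattern, gadget, anchored=False):
    """Hypothetical per-instruction matcher (not radare2's code).

    Assumption: if there are fewer regex tokens than instructions, the
    remaining instructions are left unconstrained. More regex tokens
    than instructions is treated as no match.
    """
    rx_tokens = split_tokens(pattern)
    insns = split_tokens(gadget)
    if len(rx_tokens) > len(insns):
        return False
    for rx, insn in zip(rx_tokens, insns):
        compiled = re.compile(rx)
        hit = compiled.fullmatch(insn) if anchored else compiled.search(insn)
        if not hit:
            return False
    return True

# Unanchored: 'j' alone satisfies [^p]+, so the gadget is accepted.
print(gadget_matches("[^p]+;[^p]+", "jp 0x1baf2; call qword [rip]"))
# Anchored: 'jp 0x1baf2' contains 'p', so the gadget is rejected.
print(gadget_matches("[^p]+;[^p]+", "jp 0x1baf2; call qword [rip]", anchored=True))
```

The two calls differ only in anchoring, which suggests that a specification would need to state both how tokens are paired with instructions and whether each token must match the whole instruction.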

Even if I make sure num(regex_tokens) == rop.len, I don't get the expected results. This test uses rop.len = 2, with two regular expressions separated by a semicolon:

...
 "/R/q [^p]+;[^p]+"
0x0001baf1: jp 0x1baf2; call qword [rip]
...

Normally I would expect [^p]+ to match the semicolon, but because of the preprocessing, the semicolon shouldn't be passed to regcomp. I have also experimented with putting \3b inside the excluded character set, and sometimes I get better (though still not correct) results, but this seems likely to break things if the intent is to scan for the semicolon before any regex match is attempted.
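As a small illustration of the semicolon point, the sketch below applies the patterns to the whole gadget string with Python's re module (again only a stand-in for regcomp, so treat the result as indicative). Without preprocessing, a run of [^p]+ extends across the semicolon; excluding ';' in the class stops the run at the instruction boundary.

```python
import re

gadget = "mov rdi, rsi; jmp rax;"

# Greedy run of non-'p' characters starting at the first character.
whole = re.search(r"[^p]+", gadget).group(0)

# Same run, but ';' is excluded as well, so it stops at the boundary.
bounded = re.search(r"[^p;]+", gadget).group(0)

print(repr(whole))    # run crosses the ';' and stops before the 'p' in 'jmp'
print(repr(bounded))  # run stops before the ';'
```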

Lazula commented 2 years ago

Seems to be caused by quiet mode.

$ r2 -
[0x00000000]> e rop.len
5
[0x00000000]> "wa mov rdi, rsi; jmp rax;"
INFO: Written 5 byte(s) (mov rdi, rsi; jmp rax;) = wx 4889f7ffe0 @ 0x00000000
[0x00000000]> "/R/q r[^d]i"
0x00000000: mov rdi, rsi; jmp rax;
[0x00000000]> "/R r[^d]i"
[0x00000000]> e rop.len=2
[0x00000000]> "/R/q [^p]+;[^p]+"
0x00000000: mov rdi, rsi; jmp rax;
0x00000001: mov edi, esi; jmp rax;
[0x00000000]> "/R [^p]+;[^p]+"