icedland / iced

Blazing fast and correct x86/x64 disassembler, assembler, decoder, encoder for Rust, .NET, Java, Python, Lua
MIT License

X86_16 Switch Jump Decoding #59

Closed enusbaum closed 4 years ago

enusbaum commented 4 years ago

I have code where the 16-bit offsets for the jump targets are stored in a table within the same code segment.

000020E9h:0002.1AE9h 2EFFA73223      jmp word [cs:bx+0x2332]

but when I feed the code segment in for disassembly, it decodes this table data as if it were instructions:

00002932h:0002.2332h 3C1C            cmp al, 0x1c  
00002934h:0002.2334h 45              inc bp
00002935h:0002.2335h 1DB91C          sbb ax, 0x1cb9
00002938h:0002.2338h CC              int3
00002939h:0002.2339h 1C45            sbb al, 0x45
0000293Bh:0002.233Bh 1D451D          sbb ax, 0x1d45
....
00002993h:0002.2393h 22C8            and cl, al
00002995h:0002.2395h 0200            add al, [bx+si]
00002997h:0002.2397h 005657          add [bp+0x57], dl
0000299Ah:0002.239Ah 1E              push ds

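The issue I'm running into is that there's a routine entry point at 0002:2394 that isn't decoded properly, because the decoder is out of sync with the real instruction boundaries after running through this jump table.

You can see it there: the ENTER 0x2, 0 starts at 2394h, straddling the instructions decoded at 2393h through 2397h: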

C8020000

Thoughts on how I can work around this? I could probably just write a routine that, if Iced doesn't have a decoded instruction at that specific IP16, grabs the instructions before and after that address and tries to reconstruct the missing instructions manually.

This is a bit of an edge case and a quirk of the compiler. I get the same issue at the entry point in SEG1 of an NE DLL.

0xd4d commented 4 years ago

Iced assumes you only feed it code, but the code at 2332h looks like a word table, so it's data.
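To illustrate why those bytes read as a word table, here is a minimal Python sketch that reinterprets the bytes from the listing above as 16-bit little-endian offsets. The helper name read_word_table is hypothetical, and the count of 6 is only an assumption: it covers the bytes visible at 2332h through 233Dh, while the real table length is not shown in the listing.

```python
import struct

def read_word_table(data: bytes, count: int) -> list:
    """Interpret the first `count` 16-bit little-endian words of `data` as offsets."""
    return list(struct.unpack_from(f"<{count}H", data, 0))

# Bytes copied from the listing at 0002.2332h through 0002.233Dh.
# Assumption: 6 entries; the actual table length is not shown in the issue.
table_bytes = bytes.fromhex("3C1C451DB91CCC1C451D451D")

targets = read_word_table(table_bytes, 6)
print([f"{t:04X}h" for t in targets])
```

Every value lands between 1C3Ch and 1D45h, which looks like a set of plausible offsets within the same segment and is consistent with a switch jump table rather than code.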

Adding read-only data inside the code segment is pretty common. The only way to not disassemble it as code, but as data, is to figure out what is code and what isn't. A recursive traversal algorithm is often used by higher level disassemblers such as IDA.

Every Instruction has a FlowControl property you can use to figure out if it's a branch instruction etc. You can then mark known branch targets as code and continue disassembling there later. Whatever is left is data or code that you didn't detect.
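A rough sketch of that worklist idea, in Python and independent of Iced: the decode_at callback is a hypothetical stand-in for whatever wraps the real decoder and turns each instruction's flow control into a list of successor addresses. The toy table below only mirrors the bytes at 2394h from the listing; it is not real decoder output.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical callback: returns (instruction length, successor addresses),
# or None when nothing valid can be decoded at that address.
DecodeFn = Callable[[int], Optional[Tuple[int, List[int]]]]

def recursive_traversal(entry_points: Iterable[int], decode_at: DecodeFn) -> Dict[int, int]:
    """Follow control flow from known entry points; return {address: length}."""
    code: Dict[int, int] = {}
    worklist = list(entry_points)
    while worklist:
        ip = worklist.pop()
        if ip in code:
            continue  # already marked as code
        decoded = decode_at(ip)
        if decoded is None:
            continue  # undecodable: leave it unmarked
        length, successors = decoded
        code[ip] = length
        worklist.extend(successors)  # fall-through and branch targets
    return code

# Toy stand-in for a decoder, keyed by offset within the segment.
toy = {
    0x2394: (4, [0x2398]),  # C8 02 00 00  enter 0x2, 0
    0x2398: (1, [0x2399]),  # 56           push si
    0x2399: (1, [0x239A]),  # 57           push di
    0x239A: (1, []),        # 1E           push ds (toy listing ends here)
}

code = recursive_traversal([0x2394], toy.get)
print(sorted(hex(ip) for ip in code))
```

Starting from the known entry point 2394h, the misaligned instruction at 2393h and the table at 2332h are never visited, so they stay unmarked and can be treated as data or as code that has not been reached yet.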

enusbaum commented 4 years ago

Thanks -- I'm going to close this as it's just the nature of the beast if I feed all the code into the disassembler as a single string.

Also, thanks for the feedback! I went ahead and just wrote a routine that manually does a .Create() on the byte at the given location if an instruction wasn't decoded successfully at that offset. Works well enough, might not be 100% but suits my needs.

Cheers!