woodruffw / steg86

Hiding messages in x86 programs using semantic duals
https://crates.io/crates/steg86

"encountered an invalid instruction" when operating on PE32/PE32+ #9

Open autumnontape opened 4 years ago

autumnontape commented 4 years ago

I've tried running steg86 profile against several EXEs and DLLs, both PE32 and PE32+, and every time, it has produced an error like this:

Fatal: encountered an invalid instruction at text offset 3678 (file offset 4702)

It seems like this should be easy to reproduce, but I can upload an example file if not. I've had no such problems with ELF files.

woodruffw commented 4 years ago

Interesting. I only tested against some small PEs, but I haven't seen that. Would you mind uploading an example?

PEs in the wild are interesting things, so it wouldn't surprise me if many include data in their text sections; that would trip steg86 up. The general fix here is an unsolved one (CFG recovery/code-data disambiguation for arbitrary binaries), but steg86 could do a few things to make the happy path simpler:

autumnontape commented 4 years ago

I can't really upload files right now but will later -- in the meantime, if you have any Unity-based games for Windows downloaded, the game executables should trigger this error.

Another possible option would be to act as if the text section ended at the first illegal instruction, which would still be dangerous because data may accidentally look like valid instructions, but less so than powering through.
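
A minimal sketch of the "stop at the first illegal instruction" idea, for illustration only. The `decode_len` callback is a hypothetical stand-in for a real x86 decoder, and `toy_decode_len` is a deliberately tiny fake that recognizes just three opcodes; neither reflects how steg86 is actually structured.

```rust
/// Walks `text` one instruction at a time and returns the length in bytes of
/// the prefix that decoded cleanly. `decode_len` (hypothetical) returns the
/// byte length of the instruction at the start of the slice, or `None` if the
/// bytes do not decode.
fn usable_prefix_len<F>(text: &[u8], decode_len: F) -> usize
where
    F: Fn(&[u8]) -> Option<usize>,
{
    let mut offset = 0;
    while offset < text.len() {
        match decode_len(&text[offset..]) {
            // Reject zero-length or overlong results so the loop always terminates.
            Some(len) if len > 0 && offset + len <= text.len() => offset += len,
            // First undecodable position: treat the text section as ending here.
            _ => break,
        }
    }
    offset
}

/// Toy decoder for demonstration only: knows `nop` (0x90), `ret` (0xC3) and
/// `mov eax, imm32` (0xB8 followed by 4 bytes). Not a real x86 decoder.
fn toy_decode_len(bytes: &[u8]) -> Option<usize> {
    match *bytes.first()? {
        0x90 | 0xC3 => Some(1),
        0xB8 if bytes.len() >= 5 => Some(5),
        _ => None,
    }
}
```

Calling `usable_prefix_len(&bytes, toy_decode_len)` yields the number of leading bytes that are safe to consider under this policy. Data that happens to decode as valid instructions would still slip through, which is the danger noted above.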

I think this program is cute and might use it for some little easter eggs in the future, but not for any binaries that I'm not compiling and linking myself, so if the problem is data in the text section, it shouldn't affect me! But I tried like five files, and they all had this problem, so I guess MSVC must like doing this or something.

woodruffw commented 4 years ago

I don't have any Unity games, but I do have a Windows VM -- I'll see if I can find a testcase 🙂

> Another possible option would be to act as if the text section ended at the first illegal instruction, which would still be dangerous because data may accidentally look like valid instructions, but less so than powering through.

Yeah, this would be a good third option to have!

> I guess MSVC must like doing this or something.

Yeah, quite possibly. I would have expected it to be a little more discerning since mixing code and data makes the CPU's L1I/L1D and ITLB/DTLB work harder, but it's always a mystery with MSVC.

autumnontape commented 4 years ago

putty.zip

For an example of an erroring input, there's putty.exe, which is under the MIT license, and which triggers this error message when I run steg86 profile on it:

Fatal: encountered an invalid instruction at text offset 460395 (file offset 461419)

autumnontape commented 4 years ago

psftp.zip

This is the PuTTY SFTP client, which also triggers the error and is smaller:

Fatal: encountered an invalid instruction at text offset 319873 (file offset 320897)

autumnontape commented 4 years ago

I'm not great with reverse engineering tools, but I opened up psftp.exe in Cutter, and the address reported in the error is near the start of a jump table (at 0x0044f181):

![disassembly from psftp.exe; there's a jump table in the text section at address 0x0044f17e](https://user-images.githubusercontent.com/40726037/90579391-b78dba80-e17a-11ea-80fd-45d979010c16.png)
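
As a sanity check on how the numbers relate, here is a small sketch of the offset arithmetic. The layout values used with it (image base `0x400000`, `.text` at RVA `0x1000`, raw data at file offset `0x400`) are assumptions: they are common defaults for 32-bit executables and they happen to reproduce both numbers in this thread, but they were not read from psftp.exe's headers. The `TextSection` type and its field names are hypothetical.

```rust
/// Hypothetical description of where a PE `.text` section lives.
struct TextSection {
    /// Preferred load address of the image (assumed, not read from the file).
    image_base: u64,
    /// RVA of the section relative to the image base (assumed).
    virtual_address: u64,
    /// Offset of the section's raw bytes within the file (assumed).
    pointer_to_raw_data: u64,
}

impl TextSection {
    /// Offset into the file for a given offset into the text section.
    fn file_offset(&self, text_offset: u64) -> u64 {
        self.pointer_to_raw_data + text_offset
    }

    /// Virtual address, as shown by a disassembler, for the same text offset.
    fn virtual_addr(&self, text_offset: u64) -> u64 {
        self.image_base + self.virtual_address + text_offset
    }
}
```

With those assumed values, text offset 319873 maps to file offset 320897 and to address `0x0044f181`, matching both the error message and the disassembly.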

woodruffw commented 4 years ago

Yep, that looks right to me. Most compilers that I'm aware of would use a pseudo-instruction to place the jump table in .data or .rodata (or whatever), but maybe MSVC isn't bright enough or something else interfered.

woodruffw commented 4 years ago

In the meantime, I'd be happy to accept a PR that adds support for punching "holes" in the Text structure. It's something that I can do on my own, but if you'd like to get a head start on it, feel free 🙂

autumnontape commented 4 years ago

Sure, it seems interesting to work on. Here are my thoughts on how to implement it; let me know what you think.

At least to begin with, the input format can be plain CSV, which won't require any dependencies. The two input columns are an offset into the text section and a length, both in bytes, and each row describes a span of instructions that may be used for steganography.
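
A dependency-free sketch of reading that two-column format might look like the following. The `Span` type, the `parse_allowlist` name, decimal number syntax, the tolerance for blank lines and surrounding whitespace, and the error strings are all assumptions made for illustration, not an agreed format.

```rust
/// A span of the text section that may be used for steganography.
/// Both fields are in bytes; `offset` is relative to the start of the section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    offset: u64,
    len: u64,
}

/// Parses rows of `offset,length` (decimal). Blank lines are skipped.
/// Returns a message naming the 1-based line number on malformed input.
fn parse_allowlist(csv: &str) -> Result<Vec<Span>, String> {
    let mut spans = Vec::new();
    for (idx, line) in csv.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let (offset, len) = line
            .split_once(',')
            .ok_or_else(|| format!("line {}: expected `offset,length`", idx + 1))?;
        let offset = offset
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("line {}: bad offset: {}", idx + 1, e))?;
        let len = len
            .trim()
            .parse::<u64>()
            .map_err(|e| format!("line {}: bad length: {}", idx + 1, e))?;
        spans.push(Span { offset, len });
    }
    Ok(spans)
}
```

This sketch does not check that spans are sorted, non-overlapping, or within the section; a real implementation would probably want to validate those before trusting the allowlist.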

A map of this same information can then be optionally embedded inband with the message to make it possible to extract the message without having to pass the CSV file around like a decoder ring. There could be a dedicated bit to distinguish between mapped and mapless modes, or they could be distinguished by different magic numbers. The inband map uses varints and counts the lengths of usable spans in terms of semantic pairs and unusable spans in terms of bytes.
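
To make the varint idea concrete, here is a rough sketch using unsigned LEB128, which is one common varint encoding; the thread does not pin down which varint scheme would be used. The map layout (an entry count, then alternating "unusable bytes to skip" and "usable semantic pairs" values) and the function names are likewise assumptions for illustration.

```rust
/// Appends `value` as an unsigned LEB128 varint: 7 payload bits per byte,
/// with the high bit set on every byte except the last.
fn write_varint(mut value: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

/// Reads one varint from the front of `input`. Returns the value and the
/// number of bytes consumed, or `None` if the input is truncated or runs
/// past 10 bytes (the most a u64 can need).
fn read_varint(input: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &byte) in input.iter().enumerate().take(10) {
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Encodes runs of (unusable bytes to skip, usable semantic pairs),
/// prefixed by the number of runs. The layout is hypothetical.
fn encode_map(runs: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::new();
    write_varint(runs.len() as u64, &mut out);
    for &(skip_bytes, usable_pairs) in runs {
        write_varint(skip_bytes, &mut out);
        write_varint(usable_pairs, &mut out);
    }
    out
}

/// Inverse of `encode_map`; returns `None` on truncated input.
fn decode_map(mut input: &[u8]) -> Option<Vec<(u64, u64)>> {
    let (count, used) = read_varint(input)?;
    input = &input[used..];
    let mut runs = Vec::new();
    for _ in 0..count {
        let (skip_bytes, used) = read_varint(input)?;
        input = &input[used..];
        let (usable_pairs, used) = read_varint(input)?;
        input = &input[used..];
        runs.push((skip_bytes, usable_pairs));
    }
    Some(runs)
}
```

Small counts cost a single byte each, so a binary with only a few holes would spend very little of its message capacity on the map.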

woodruffw commented 4 years ago

> At least to begin with, the input format can be plain CSV, which won't require any dependencies. The two input columns are an offset into the text section and a length, both in bytes, and each row describes a span of instructions that may be used for steganography.

👍, that sounds very reasonable to me. Having it be an explicit allowlist rather than "holes" also makes more sense, now that I think about it.

> There could be a dedicated bit to distinguish between mapped and mapless modes, or they could be distinguished by different magic numbers. The inband map uses varints and counts the lengths of usable spans in terms of semantic pairs and unusable spans in terms of bytes.

Different magic numbers sounds good to me: we could do the current magic incremented by 1 (b'x') to indicate special treatment. The encoding you propose also makes sense to me.
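
As a sketch of what that could look like on the decoding side: the constant and type names below are hypothetical, and the value `b'w'` for the existing magic is inferred only from "incremented by 1 (b'x')" above, so it should be checked against the actual header definition.

```rust
/// How the message header says the payload is laid out.
#[derive(Debug, PartialEq, Eq)]
enum Mode {
    /// Current behaviour: every semantic pair in the text section is used.
    Mapless,
    /// An inband map precedes the message and restricts which spans are used.
    Mapped,
}

/// Assumed value of the existing magic byte (inferred, not confirmed).
const MAGIC_MAPLESS: u8 = b'w';
/// Proposed magic for mapped mode: the current magic plus one.
const MAGIC_MAPPED: u8 = MAGIC_MAPLESS + 1;

/// Picks the mode from the header's magic byte; `None` means the bytes do
/// not look like a message header at all.
fn mode_from_magic(magic: u8) -> Option<Mode> {
    match magic {
        MAGIC_MAPLESS => Some(Mode::Mapless),
        MAGIC_MAPPED => Some(Mode::Mapped),
        _ => None,
    }
}
```

One consequence of using a separate magic rather than a flag bit is that an older build reading a mapped message would see an unknown magic and refuse to extract, instead of silently decoding garbage.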