uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0
2.17k stars 253 forks source link

Address.cs : Add(offset) to segmented addresses. Question. #498

Closed chostelet closed 6 years ago

chostelet commented 6 years ago

Bonjour John,

Looking at 'Add(offset)' in 'Address.cs' for 'ProtectedSegmentedAddress' I was wondering what would be the result if the added offset value would be largely greater than a segment size (64k, 0x10000)... For example, if the added offset value is, let say, 0x54321 (>>64k) which is a valid 'long', the result would be wrong. Do you agree ? Looks like 'AddressTests.cs' does not cover this case.

Christian PS: Do you have any plan to migrate to C# 7.0 (multiple syntax simplifications), .NET 4.7 and NUnit 3 ? .NET 4.7 should be straigthforward.

uxmal commented 6 years ago

Salut Christian,

The Intel segmented addresses have always been tricky. Strictly speaking, the offset of such an address is valid even if it exceeds 0xFFFF -- on the 80386 and later CPUs. In fact, operating systems like Windows 95 routinely use segment:offset combinations where the offset is larger than 16 bits. It seems that ProtectedSegmentAddresses should appear in two flavors, one 32-bit version for the 8086 - 80286 machines and a separate 48-bit version for 80306 and later.

Leaving aside the question of whether segmented addresses should be 16:16 or 16:32, another issue is evident in the implementation of ProtectedSegmentedAddress. There, when the address overflows 0xFFFF, it increments the selector by 0x1000 and masks off the bottom bits of the offset. This is a simple copy/paste bug from the RealSegmentedAddress and is actually incorrect. You're not guaranteed that protected mode segmented addresses are laid out in such a way that adding a value to a selector gets you to a well-defined linear address.

Instead, the addition should just silently overflow, resulting in a possibly invalid pointer. If that segmented address is ever followed during decompilation, the resulting linear address is likely to result in an unmapped location (i.e. nonexistent in the Program.SegmentMap, and will throw an exception. Naturally, you could get "lucky" and end up in a valid segment by sheer coincidence, just like on a real processor. The results of such "lucky" additions is undefined behavior, of course.

Fixing ProtectedSegmentedAddress to silently overflow is easy enough. I can get around it sometime later today.

In order to address whether segemented addresses should be 16:16 or 16:32, it's possible to change the X86ProcessorArchitecture.MakeSegmentedAddress to reflect the distinction, but the ramifications are far-reaching.

First, we need to expose a mechanism for selecting, for a particular architecture, a specific model (8086, 80286 etc.) Selecting an architecture would affect the disassembler, as obviously the original 8086 has none of the many ISA extensions that Intel and AMD have piled on the architecture. Naturally, other architectures (like m68k) could benefit from this as well.

This cascades into the user interface. The command line driver needs a mechanism for specifying model as well as CPU architecture. In the GUI driver, we need a way to allow a user, after opening a binary, to change the architecture model. What happens then? Is it only permissible to change architecture / model before the scanning phase? Or if it is permissible, what should happen when you switch architecture after you're almost done with the automatic decompilation of the program? Should the user be told : "you have to restart the decompilation as you've changed a fundamental assumption about the processor/ISA" or should Reko just allow it, with inconsistent results as a consequence?

This area is nontrivial, alas, and I'm in favor of leaving it as is for now. I tend to not implement features until a user arrives with an actual sample binary that exhibits a problem.

I've addressed your question about C# 7.0 in #499.