uxmal / reko

Reko is a binary decompiler.
https://uxmal.github.io/reko
GNU General Public License v2.0
2.17k stars 253 forks source link

Users should be able to add datatypes to reko projects #107

Closed uxmal closed 8 years ago

uxmal commented 8 years ago

Reko supports adding custom names to procedures. It should also allow users to rename data types that Reko discovers, and even introduce new types . The big question is: how does this manifest itself.

One alternative is to just rely on the ANSI C compiler and allow users to add *.h files to the project. Probably the quickest way, rather than having to develop a fullblown custom user interface. Any thoughts?

ptomin commented 8 years ago

Now it's possible to load user-defined type libraries from XML metafiles. The next problem is to resolve user-defined type references in the same way as platform-defined types implemented in #99. I think TypeReferenceResolver class should be introduced in the same way as ImportResolver. I'll try to fix it

uxmal commented 8 years ago

Consider adding the methods you ned to the already existing interface ImportResolver. Perhaps ResolveTypeName, ResolveStructureType, ResolveUnionType methods on the interface?

uxmal commented 8 years ago

I'm breaking this down into smaller items so we can work them off. Feel free to suggest more items.

Advanced:

I've certainly missed some things, but it's a start.

ptomin commented 8 years ago

File > Open metadata... works OK now. So 3-rd item could be marked as done.

ptomin commented 8 years ago

I have not added methods to ImportResolver. I have just moved CreateProcedureSerializer, CreateTypeLibraryDeserializer and CreateSymbolTable methods from Platform to Program so that it was possible to resolve as platform as user-defined types and signatures of external procedures

uxmal commented 8 years ago

OK that makes sense. I assume you're coding up unit tests to cover the different use cases you are implementing?

ptomin commented 8 years ago

I have fixed existing unit tests. Now all of them are passed. Also I'll add new unit tests

ptomin commented 8 years ago

I have added several unit tests for new implemantation. John, please review the changes. If you accept them then they should be merged to master

uxmal commented 8 years ago

Go ahead and make a PR; I will review it there.

ptomin commented 8 years ago

John, I have questions related to above changes.

  1. For what reasons do you add user query of platform when metadata loading (see ProjectLoader::DeterminePlatform)? Could it be removed? After recently changes it becomes useless.
  2. Removing ProjectLoader::DeterminePlatform breaks Prld_LoadMetadata_NoPlatform_ShouldQuery test. Should this test be removed?
uxmal commented 8 years ago

This stems from a time when there was a large change in Reko projects. Previously, they could only support a single Program, and therefore a single Platform. The notion of "current Platform" was restricted to the current Program. Project was just a container for the single Program.

Once the option to add multiple Programs was added, the result of the coupling between Program and Platform is that a Reko project allows each Program to potentially have a separate Platform. That is, you could potentially create a Reko project with one binary that is an x86 ELF binary, and another that is a dump of a 6510-based C64 program.

+-------------------------+
| Project                 |
| +----------------------+|
| | Program1: foo.exe    ||
| | Platform x86 Posix   ||
| | Architecture x86     ||
| +----------------------+|
| | Program2: disk.dmp   ||
| | Platform: c64        ||
| | Architecture 6502/10 ||
| +----------------------+|
+-------------------------+

In retrospect, that generality seems overly, well, general. You're unlikely to need such a setup; typically a user will be working with one and only one Platform in mind:

+-------------------------+
| Project                 |
| +----------------------+|
| | Program1: foo.exe    ||
| | Platform MS-DOS      ||
| | Arch: x86            ||
| +----------------------+|
| | Program2: disk.ovr   || <= a raw overlay file with no relocations
| | Platform: MS-DOS     ||
| | Arch: x86            ||
| +----------------------+|
+-------------------------+

It probably makes more sense for Projects to have a single Platform and Architecture which all of the loaded Programs share:

+-------------------------+
| Project                 |
| Platform MS-DOS   <-----+----.
| Architecture x86  <-----+-.  |
| +----------------------+| |  |
| | Program1: foo.exe    || |  |
| | Platform: reference to -|--+
| | Arch: reference to -----+  |
| +----------------------+| |  |
| | Program2: disk.ovr   || |  |
| | Platform: reference to -|--'
| | Arch: reference to -----'
| +----------------------+|
+-------------------------+

This implies that once the first binary is loaded into a project, it should set a (currently not existing) Platform property on the Project class. The next binary or type library / metadata collection would be forced to conform to that platform. If Reko determines that the second binary is not compatible, it should refuse the binary / metadata, saying something to the effect of "The executable file/metadata file "foo.exe" is based on the "$(platform.Name)" operating environment. However, the project is currently based on the "$(project.Platform.Name)" operating environment. Operating environments can't be mixed.[OK]" If you are loading multiple raw binary chunks (like, say, MS-DOS .COM files or raw disk images usong File > Open as...) then the user should only need to enter the Platform and Architecture for the first binary, and have that choice be pre-filled on subsequent Open as... operations.

DeterminePlatform() would in that scenario be rendered useless and should be removed. However, now the Platform property that is on a Program is going to have to be a copy of a (new) Platform property on Project. This Project needs to be saved as a string-valued property PlatformName in the class Project_v3.

As you can imagine, this is a large change to make to Reko. It would preclude users from mixing different platforms and architectures in the same project, but as I mentioned above I don't think this going to be a common use case.

For the short term, however, I think it's OK to replace the body of DeterminePlatform with a simple

var platformInUse = project.Programs.Select(p => p.Platform).Distinct().Single();
return platformInUse;

and the need for the unit test Prld_LoadMetadata_NoPlatform_ShouldQuery goes away.

uxmal commented 8 years ago

Pavel: I've just checked in (c7afa11f0c511586ec6803598fe3bd0f60c2e9f0) a new version of the Project persistence format, Project_v4 where the names of the architecture and the platform, which are considered to be shared across the whole project, are stored as strings. I've changed DeterminePlatform so that if the "shared" platform is present, Reko will use that when deserializing metadata.

I've left the UI prompting code in the master branch for now, but you can remove it from your dev't branch, along with the unit tests since they should no longer be needed.

ptomin commented 8 years ago

At last I have achieved intermediate goal: creation of python module extension sample (453dcb0)

uxmal commented 8 years ago

Nice! Do you want to add that sample to the reko/subjects tree? If so, could I trouble you to move the files to a directory called ~/subjects/PE-x86/pySample, then send me a PR?

Now that you've reached this place, you can see that Reko needs more work :-) Looking at the generated code, what would you like to see impoved next?

ptomin commented 8 years ago
  1. The next task is to fix exceptions throwing when decompiling it. Now the following exception is thrown at the reconstruct data types stage: "Error when reconstructing types. Comparer threw an exception. Way too deep" The reason is known by #113 issue: recursive structure("struct testStruct{struct testStruct *next}")
uxmal commented 8 years ago

Seems like you should be able to handle that, right Pavel? . Basically, the comparer needs to keep a "visited" set of all complex data structures it has already visited, and if it already has visited a structure, assume equality. Open a separate issue to track it. I believe we can close this issue now?

ptomin commented 8 years ago

Existing user-defined types functionality is enough for me

uxmal commented 8 years ago

Closing this now, then.