nemerle / dcc

This is a heavily updated version of the old DOS executable decompiler DCC
GNU General Public License v2.0

Summary of your changes? #2

Open pfalcon opened 9 years ago

pfalcon commented 9 years ago

Can you please provide a summary of the changes you've made to dcc? In an accessible place, like the README or another doc. I'd use the commit log, but its quality leaves much to be desired, e.g. LLVM headers started to be included in 900438c4, and all the commit message says is "from work".

Or perhaps the matter can be approached from another side: if I want to try decompiling new code, how do I produce a binary file for input? I figure dcc doesn't support DOS .com files, which are easy to produce. I ended up installing openwatcom for .exe support, but dcc complains:

dcc: Don't understand 80386 instruction 63 at location 00014C

Even though openwatcom was told to produce 8086 code. Then I thought that maybe the LLVM dependency makes it possible to open ELF files, but that doesn't seem to be the case either...

pfalcon commented 9 years ago

I figure dcc doesn't support DOS .com files

Ok, I apparently got confused by #1.

nemerle commented 9 years ago

DCC is a decompiler for 16-bit DOS executables (both exe and com).
As for a summary of changes, it would be pretty hard to do: most of them were related to code architecture and to getting rid of the original's problems with pre-allocated memory.

Also, thank you for noticing the CFG problem in https://github.com/decomp/decompilation/issues/172 . I'll try to dust off my old Turbo C installation and see if I can reproduce it.

nemerle commented 9 years ago

Using the following code:

int a, b, c;
int main() {
    while (c) {
        if (a) {
            b = 3;
        }
    }
    return b;
}

Compiling it to an exe with Turbo C 2 (small memory model) and using the dcc/qt branch gives:

/*
 * Input file   : E.EXE
 * File type    : EXE
 */

#include "dcc.h"

void main ()
/* Takes no parameters.
 * Unknown calling convention.
 */
{

    while ((var0082A != 0)) {
l1: 
        if (var00826 != 0) {
            var00828 = 3;
        }
        else {
            goto L1;
        }
    }   /* end of while */
}

While the code is not as simple as it should be, the CFG seems better. So it seems that the issue depends on the compiler. Which compiler, version and command line did you use?

pfalcon commented 9 years ago

Thanks for the response. For some background, see https://github.com/decomp/decompilation/issues/172 , where we discuss the limits of a simple (to understand) CFG structuring algorithm based on graph reductions via context-free rewrite rules. Simple algorithms (like the one implemented by @mewmew in https://github.com/decomp/restructure) can't structure graphs on which a simple "jump threading" optimization was performed. In my algorithm, I perform graph normalization and can handle at least simple jump-threaded cases, but cases of more aggressive compiler optimization are still perplexing (so far I have no idea how to approach them).
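To make the "jump threading" point concrete, here is a minimal sketch in C with explicit gotos. It is only an illustration, not output from any compiler: the label names and the `steps` countdown (standing in for `c` so the loop terminates when run) are assumptions. In the first function the false branch of the `if` goes through a join block that only jumps back to the loop header; in the second, that jump has been threaded straight to the header.

```c
/* CFG for "while (c) { if (a) b = 3; }" before jump threading.
 * `steps` is a hypothetical countdown replacing `c` so the loop ends. */
static int unthreaded(int a, int steps) {
    int b = 0;
header:
    if (steps == 0) goto done;
    steps--;
    if (a == 0) goto latch;   /* false branch goes to the join block */
    b = 3;
latch:
    goto header;              /* join block: only an unconditional jump */
done:
    return b;
}

/* Same logic after jump threading: the jump to `latch` is redirected
 * to the target of `latch`, so the join block disappears. */
static int threaded(int a, int steps) {
    int b = 0;
header:
    if (steps == 0) goto done;
    steps--;
    if (a == 0) goto header;  /* threaded: bypasses the join block */
    b = 3;
    goto header;
done:
    return b;
}
```

Both functions compute the same result; the only difference is that in the threaded form the two branches of the `if` rejoin at the loop header itself rather than in a block of their own.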

To make sure that @mewmew and I aren't just wasting our time, I wanted to see how other decompilers handle such cases, starting with the 20-year-old classic, dcc.

I used the "bcc" compiler from the corresponding Ubuntu package. When I use "while (1)" in the source (i.e. an infinite loop), I get decompiled output as in https://github.com/decomp/decompilation/issues/172#issuecomment-106106929 . When I use "while (c)", I get output similar to your comment above (note that the "else" clause is superfluous and can be removed together with the label: that goto is the equivalent of "continue", which is itself superfluous at the end of a while body).
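The simplification described in the parenthesis can be sketched as follows. This is an illustration under stated assumptions: `a`, `b` and `c` stand in for `var00826`, `var00828` and `var0082A`, the goto is read as "continue" as suggested above, and the `c--` line is hypothetical, added only so that the loop terminates when run.

```c
/* Hypothetical stand-ins for the decompiler's var00826, var00828, var0082A. */
static int a, b, c;

/* The decompiled loop, with the goto read as "continue". */
static int with_continue(void) {
    while (c != 0) {
        c--;            /* hypothetical: not part of the original program */
        if (a != 0) {
            b = 3;
        } else {
            continue;   /* last statement of the body, so it has no effect */
        }
    }
    return b;
}

/* Same loop with the redundant else clause and label removed. */
static int simplified(void) {
    while (c != 0) {
        c--;            /* hypothetical: not part of the original program */
        if (a != 0) {
            b = 3;
        }
    }
    return b;
}
```

Apart from the added countdown, the second form has the same shape as the original source.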

With all the above, I should keep in mind the possibility that decompilation results like the above are due to your (arguably, more or less substantial) changes. Note that I don't imply or even suspect that; I just keep the possibility in mind, because claiming that "Cristina Cifuentes' CFG structuring algorithm can't deal with simple if-within-while cases" would be too bold unless I had used her original code.

So, how to resolve this situation: maintain a branch with minimal changes to the original dcc code, essentially only the changes required to build it with a modern compiler like gcc.

nemerle commented 9 years ago

So, how to resolve this situation: maintain a branch with minimal changes to the original dcc code, essentially only the changes required to build it with a modern compiler like gcc.

That shouldn't be a problem. AFAIR the original source is compilable almost out of the box; I can create a separate branch for it today or tomorrow.

pfalcon commented 9 years ago

That shouldn't be a problem. AFAIR the original source is compilable almost out of the box; I can create a separate branch for it today or tomorrow.

If you can, I'd appreciate it. I searched for the original code, but found your project first. Given that the original application was for MS-DOS, I assumed it would take some effort to forward-port, and since you already maintain and further develop dcc, it would be nice to have it all in one place.

nemerle commented 9 years ago

The original sources are available at 2a59d07ef2d43b6af4d679f4c549c3173bf1203d

pfalcon commented 9 years ago

Thanks, I confirm that with the original code, the sample CFGs are structured in the same, i.e. sub-ideal, way. Boomerang is next to try ;-).

pfalcon commented 9 years ago

So, while the issue above has been cleared up, I still think something can be done regarding the title of this bug. E.g.:

As for a summary of changes, it would be pretty hard to do: most of them were related to code architecture and to getting rid of the original's problems with pre-allocated memory.

Yes, sounds good, and this is mentioned in the README. But what about the LLVM dependency, for example? From a quick look, I figured that some classes in dcc inherit from LLVM classes. What's the plan here, and are there any features enabled by the LLVM dependency? Perhaps you could add a file to the repo, like CHANGES.md or TODO.md, where you describe changes even if they're work in progress and unfinished.

Another thing that would help is instructions for building (with cmake).