magic-lang / rock

ooc compiler written in ooc
http://ooc-lang.org/
MIT License

Obfuscator problem #73

Closed: thomasfanell closed this issue 7 years ago

thomasfanell commented 8 years ago

Introduction

Relevant code can be found here

I am working on a built-in obfuscator and I'm having some issues with overridden functions in certain scenarios. The obfuscator is launched right after phase 2 (resolving and classification of modules) and it searches the syntax tree for targets by employing a visitor. When a match is found, it stores away the node for later processing (when the search is exhausted).

In the obfuscation phase, function declarations are replaced by their obfuscated versions by means of owner removeFunction(originalFunctionDecl) and owner addFunction(obfuscatedFunctionDecl), where obfuscatedFunctionDecl := originalFunctionDecl clone(newName).

The target collector will of course check for function calls that reference obfuscation targets and mark them to be updated. I currently do this by updating each call's ref and name properties:

functionCall setRef(obfuscatedFunctionDecl)
functionCall setName(obfuscatedFunctionDecl name)

Finally, at the end of the obfuscation phase, I launch a new resolving process: Tinkerer new(buildParams) process(allModules).
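The rename step described above can be modelled outside the compiler. The following is a minimal Python sketch, not rock code: the classes and the `obfuscate` helper are hypothetical stand-ins for rock's FunctionDecl, FunctionCall and ClassDecl nodes, and the comments map each line back to the ooc calls quoted in this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, like AST nodes
class FunctionDecl:
    name: str

    def clone(self, new_name: str) -> "FunctionDecl":
        # Stand-in for `originalFunctionDecl clone(newName)`.
        return FunctionDecl(new_name)


@dataclass(eq=False)
class FunctionCall:
    name: str
    ref: Optional[FunctionDecl] = None


@dataclass(eq=False)
class ClassDecl:
    name: str
    functions: List[FunctionDecl] = field(default_factory=list)


def obfuscate(owner, original, new_name, calls):
    """Swap `original` for a renamed clone and re-link calls that referenced it."""
    obfuscated = original.clone(new_name)
    owner.functions.remove(original)     # owner removeFunction(originalFunctionDecl)
    owner.functions.append(obfuscated)   # owner addFunction(obfuscatedFunctionDecl)
    for call in calls:
        if call.ref is original:
            call.ref = obfuscated        # functionCall setRef(obfuscatedFunctionDecl)
            call.name = obfuscated.name  # functionCall setName(obfuscatedFunctionDecl name)
    return obfuscated


decl = FunctionDecl("ABSTRACT_FUNCTION")
owner = ClassDecl("INHERITENCE_CHILD1", [decl])
call = FunctionCall("ABSTRACT_FUNCTION", ref=decl)
new_decl = obfuscate(owner, decl, "OBFUSCATED_FUNCTION", [call])
print(call.name, call.ref is new_decl)
```

In this simplified model every call that was collected ends up pointing at the renamed declaration; it deliberately ignores function bodies and the re-resolve pass.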

Example code

The obfuscator is set to change ABSTRACT_FUNCTION to OBFUSCATED_FUNCTION

INHERITENCE_BASE: abstract class {
    ABSTRACT_FUNCTION: abstract func -> String
}

INHERITENCE_CHILD1: class extends INHERITENCE_BASE {
    init: func
    ABSTRACT_FUNCTION: override func -> String {
        "CHILD1"
    }
}

INHERITENCE_CHILD2: class extends INHERITENCE_BASE {
    backend := INHERITENCE_CHILD1 new()
    init: func
    ABSTRACT_FUNCTION: override func -> String {
        //
        // This is the problem
        //
        this backend ABSTRACT_FUNCTION()
    }
}

INHERITENCE_CHILD3: class extends INHERITENCE_BASE {
    backend := INHERITENCE_CHILD1 new()
    init: func
    ABSTRACT_FUNCTION: override func -> String {
        "CHILD3"
    }
}

raise(INHERITENCE_CHILD1 new() ABSTRACT_FUNCTION() != "CHILD1", "INHERITENCE_CHILD1.ABSTRACT_FUNCTION1")
raise(INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION() != "CHILD1", "INHERITENCE_CHILD2.ABSTRACT_FUNCTION2")
raise(INHERITENCE_CHILD3 new() ABSTRACT_FUNCTION() != "CHILD3", "INHERITENCE_CHILD3.ABSTRACT_FUNCTION3")

This particular scenario gives the error:

error No such function ABSTRACT_FUNCTION() for `INHERITENCE_CHILD2`
        this ABSTRACT_FUNCTION()

Pre-obfuscation print-out

Note the addresses of each ABSTRACT_FUNCTION implementation.

ClassDecl {
    ClassDecl INHERITENCE_CHILD1Class
    name: INHERITENCE_CHILD1Class
        function count: 5
            --snip---
            0x29afdd0 --- INHERITENCE_CHILD1 ABSTRACT_FUNCTION -> String
}
ClassDecl {
    ClassDecl INHERITENCE_CHILD2Class
    name: INHERITENCE_CHILD2Class
        function count: 5
            --snip---
            0x2844ee0 --- INHERITENCE_CHILD2 ABSTRACT_FUNCTION -> String
}
FunctionDecl {
    Address: 0x29afdd0
    INHERITENCE_CHILD1 ABSTRACT_FUNCTION -> String
    owner: ClassDecl INHERITENCE_CHILD1
}
FunctionDecl {
    Address: 0x2844ee0
    INHERITENCE_CHILD2 ABSTRACT_FUNCTION -> String
    owner: ClassDecl INHERITENCE_CHILD2
}
FunctionCall {
    this backend ABSTRACT_FUNCTION()
    resolved: yes
    ref score: 513
    expr: this backend
    ref: 0x29afdd0 --- INHERITENCE_CHILD1 ABSTRACT_FUNCTION -> String
}
FunctionCall {
    raise(__OP_NE_String_String__Bool(INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION(), "CHILD1"), "INHERITENCE_CHILD2.ABSTRACT_FUNCTION2", null)
    resolved: yes
    ref score: 2304
    expr: NULL
    ref: 0x2758550 --- raise~assert(condition: Bool, message: String, origin: Class = null)
}
FunctionCall {
    __OP_NE_String_String__Bool(INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION(), "CHILD1")
    resolved: yes
    ref score: 1
    expr: NULL
    ref: 0x2990660 --- __OP_NE_String_String__Bool(left: String, right: String) -> Bool
}
FunctionCall {
    INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION()
    resolved: yes
    ref score: 513
    expr: INHERITENCE_CHILD2 new()
    ref: 0x2844ee0 --- INHERITENCE_CHILD2 ABSTRACT_FUNCTION -> String
}

Each function call clearly holds the correct reference to the corresponding function declaration.

Post-obfuscation print-out

ClassDecl {
    ClassDecl INHERITENCE_CHILD1Class
    name: INHERITENCE_CHILD1Class
        function count: 5
            --snip---
            0x3b1ebb0 --- INHERITENCE_CHILD1 OBFUSCATED_FUNCTION -> String
}
ClassDecl {
    Address: 0x2ca0240
    ClassDecl INHERITENCE_CHILD2Class
    name: INHERITENCE_CHILD2Class
        function count: 5
            --snip---
            0x3b1eaa0 --- INHERITENCE_CHILD2 OBFUSCATED_FUNCTION -> String
}
FunctionDecl {
    Address: 0x3b1ebb0
    INHERITENCE_CHILD1 OBFUSCATED_FUNCTION -> String
    owner: ClassDecl INHERITENCE_CHILD1
}
FunctionDecl {
    Address: 0x3b1eaa0
    INHERITENCE_CHILD2 OBFUSCATED_FUNCTION -> String
    owner: ClassDecl INHERITENCE_CHILD2
}
FunctionCall {
    this backend ABSTRACT_FUNCTION()
    resolved: no
    ref score: -2147483648
    expr: this backend
    ref: NULL
}
FunctionCall {
    raise(__OP_NE_String_String__Bool(INHERITENCE_CHILD2 new() OBFUSCATED_FUNCTION(), "CHILD1"), "INHERITENCE_CHILD2.ABSTRACT_FUNCTION2", null)
    resolved: yes
    ref score: 2304
    expr: NULL
    ref: 0x2758550 --- raise~assert(condition: Bool, message: String, origin: Class = null)
}
FunctionCall {
    __OP_NE_String_String__Bool(INHERITENCE_CHILD2 new() OBFUSCATED_FUNCTION(), "CHILD1")
    resolved: yes
    ref score: 1
    expr: NULL
    ref: 0x2990660 --- __OP_NE_String_String__Bool(left: String, right: String) -> Bool
}
FunctionCall {
    INHERITENCE_CHILD2 new() OBFUSCATED_FUNCTION()
    resolved: yes
    ref score: 1
    expr: INHERITENCE_CHILD2 new()
    ref: 0x3b1eaa0 --- INHERITENCE_CHILD2 OBFUSCATED_FUNCTION -> String
}

Here, the function call this backend ABSTRACT_FUNCTION() is somehow losing its reference (it should point to INHERITENCE_CHILD1 OBFUSCATED_FUNCTION), and the resolver is not able to find it. However, in the last function call, the reference to OBFUSCATED_FUNCTION is correctly maintained.

I have verified that the reference is valid just before updating the function call reference:

Call: INHERITENCE_CHILD1 new() ABSTRACT_FUNCTION() --- Ref: INHERITENCE_CHILD1 OBFUSCATED_FUNCTION -> String
Call: this backend ABSTRACT_FUNCTION() --- Ref: INHERITENCE_CHILD1 OBFUSCATED_FUNCTION -> String
Call: INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION() --- Ref: INHERITENCE_CHILD2 OBFUSCATED_FUNCTION -> String
Call: INHERITENCE_CHILD3 new() ABSTRACT_FUNCTION() --- Ref: INHERITENCE_CHILD3 OBFUSCATED_FUNCTION -> String

I have tried resetting various score and state values in the relevant nodes, but this does not seem to help. Is there something else I need to do to correctly re-link function calls to the obfuscated references? Maybe I am going about this the wrong way?

I am hoping that someone who has more intimate knowledge of the AST and the resolving process will throw me a bone. @shamanas @vendethiel @zhaihj perhaps?

vendethiel commented 8 years ago

I think you're vastly overestimating what I know about the internals :-). Because I know absolutely nothing about them. I just read the PRs I see go through...

If I can throw out a very wild guess, however: might it have something to do with vtables? Is the call at some point replaced with a proxy that dispatches to the correct method based on the class currently in use? That could be the reason.

That's a very wild guess, because I very literally don't have a single clue as to how OOC works internally.

thomasfanell commented 8 years ago

@vendethiel Well, when I compare addresses with and without the obfuscator, the address of the call changes when using the obfuscator, so you may have pointed me in the right direction here. Thanks, I'll investigate further. :+1:

Although I can't for the life of me see where this is happening... :confused:

alexnask commented 8 years ago

@thomasfanell
Huh, interesting problem. Is there any chance some nodes are never visited, so the call is never replaced? That's my guess. I'm taking a quick look at your implementation, I will probably get back to you tomorrow morning.

EDIT: Awesome work on the AstPrinter by the way.

alexnask commented 8 years ago

This is weird since the AstPrinter finds the function just fine while the TargetCollector appears to be skipping it.

thomasfanell commented 8 years ago

@shamanas

First off, thank you! :+1:

The TargetCollector actually does capture it, as I can print it out within the updateReferences method in Obfuscator, and the address is not changed here.

FunctionCall {
    Address: 0x3016e10
    this backend ABSTRACT_FUNCTION()
    resolved: yes
    ref score: 513
    expr: this backend
    ref: 0x2abfaa0 --- INHERITENCE_CHILD1 ABSTRACT_FUNCTION -> String
}

from Obfuscator updateReferences:

Call: 0x22a8480 --- INHERITENCE_CHILD1 new() ABSTRACT_FUNCTION()
Call: 0x22a8120 --- INHERITENCE_CHILD2 new() ABSTRACT_FUNCTION()
Call: 0x3016e10 --- this backend ABSTRACT_FUNCTION()
Call: 0x3016c60 --- INHERITENCE_CHILD3 new() ABSTRACT_FUNCTION()

This is what boggles me...

It looks like you have figured out how to use the obfuscator (sophisticated as it is), but just in case you're just reading off the issue here and want to try it yourself:

obfuscate.map

INHERITENCE_BASE.ABSTRACT_FUNCTION:OBFUSCATED_FUNCTION

rock -x -r --printAst --obfuscate=obfuscate.map FILE.ooc

alexnask commented 8 years ago

@thomasfanell

I just looked at the code for a bit, missed that part :) That makes things even stranger...

I will be actually running/debugging it in a couple of hours.

thomasfanell commented 8 years ago

Yeah it's been ripped up and rebuilt quite a bit these last few days, so it's quite messy... sorry about that. Thank you for helping!

alexnask commented 8 years ago

@thomasfanell

No I found the code to be really readable, I figured out what it did in about 2 mins :P Believe me, there is much worse code in rock and I'm probably responsible for most of it.

Anyways, I will get back to you later when I have access to my dev machine.

EDIT: Looking into it now

alexnask commented 8 years ago

Just a little style note. You can do:

match (targetNode sourceNode) {
    case functionDecl: FunctionDecl =>
        newFunctionDecl := obfuscatedData as FunctionDecl
        // ...
}

// Instead of 
match (targetNode sourceNode class) {
    case FunctionDecl =>
        functionDecl := targetNode sourceNode as FunctionDecl
        newFunctionDecl := obfuscatedData as FunctionDecl
        // ...
}

horasal commented 8 years ago

Generally speaking, losing reference is usually caused by forgetting a clone during resolving. I will take a look into it when I get time tomorrow.

thomasfanell commented 7 years ago

@shamanas @zhaihj I've resumed work on this, and today I found what I think is the culprit, namely FunctionDecl clone ~withName(String). The problem seems to be when cloning the various properties of the FunctionDecl (see link).

Is it really necessary to clone these, or should it be safe to simply copy the references over to the cloned FunctionDecl, that is, to omit the clone() call?

If I omit these calls, the test project compiles and runs fine, but I'm of course worried that this might cause unexpected and serious issues down the line.

For example, this "solves" the problem:

body list each(|e|
    copy body add(e /*clone()*/)
)

alexnask commented 7 years ago

@thomasfanell

I believe clone() is meant to be a deep copy and should clone child nodes, such as the body in this case. From the code you've shown, it's not really FunctionDecl clone that causes the bug but rather some element in the FunctionDecl body scope.
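To illustrate the difference between copying references and a deep clone, here is a minimal Python sketch. It is only a model: `Statement` and `FunctionDecl` are hypothetical simplified classes, not rock's node types, and `copy.deepcopy` stands in for a clone() that also clones child nodes.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    text: str


@dataclass
class FunctionDecl:
    name: str
    body: List[Statement] = field(default_factory=list)


original = FunctionDecl(
    "ABSTRACT_FUNCTION",
    [Statement("this backend ABSTRACT_FUNCTION()")],
)

# Omitting clone(): a new body list that holds the very same Statement objects.
shared = FunctionDecl("OBFUSCATED_FUNCTION", list(original.body))

# Deep copy: every child node is duplicated.
deep = copy.deepcopy(original)
deep.name = "OBFUSCATED_FUNCTION"

# Rewriting a statement through the shared copy also rewrites the original.
shared.body[0].text = "this backend OBFUSCATED_FUNCTION()"
print(original.body[0].text)  # changed, because the node is shared
print(deep.body[0].text)      # unchanged, because the node was duplicated
```

Sharing nodes keeps whatever state they already carry, which may be why dropping clone() appears to help, but any later change made through one declaration is visible through the other.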

I'm sorry I don't have much time to look into these issues at the moment. I would say go ahead and merge that change if you really need the obfuscator up and running. I will come back to this when I have some free time (this weekend, or the beginning of October if I can't figure it out in a couple of days).

thomasfanell commented 7 years ago

@shamanas

As it turns out, omitting the clone calls only partially solved the problem, which of course means it didn't really solve the problem at all. As you say, there is something in the child nodes that is not taken into consideration by the obfuscator.

When (if) you do come back, obfuscator-v4 is the branch I currently use for target practice.

Thank you

thomasfanell commented 7 years ago

Yeah, the child nodes in the cloned function declaration are not updated, so I need to send it through the target collector after cloning it.

thomasfanell commented 7 years ago

From the code you've shown, it's not really FunctionDecl clone that causes the bug but rather some element in the FunctionDecl body scope.

More specifically: if a FunctionCall lives in the body of a cloned FunctionDecl, its reference to the actual function is not cloned with it (functionCall getRef() is null), and so the call is not considered by the target collector, since it checks for this reference in order to select it for obfuscation or not.
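The failure mode can be reproduced in a small model. This is a Python sketch under stated assumptions, not rock code: the classes and `collect_targets` are hypothetical, `FunctionCall.clone` drops the resolved ref as described above, and the order of the steps (collect, clone, then update the collected calls) is an assumption about the obfuscator.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity
class FunctionCall:
    name: str
    ref: Optional["FunctionDecl"] = None

    def clone(self) -> "FunctionCall":
        # Assumption taken from the thread: the resolved ref is not carried over.
        return FunctionCall(self.name)


@dataclass(eq=False)
class FunctionDecl:
    name: str
    body: List[FunctionCall] = field(default_factory=list)

    def clone(self, new_name: str) -> "FunctionDecl":
        return FunctionDecl(new_name, [call.clone() for call in self.body])


def collect_targets(decls, targets):
    """Select the calls whose ref is one of the obfuscation targets."""
    return [call for decl in decls for call in decl.body if call.ref in targets]


child1 = FunctionDecl("ABSTRACT_FUNCTION")
backend_call = FunctionCall("ABSTRACT_FUNCTION", ref=child1)
child2 = FunctionDecl("ABSTRACT_FUNCTION", [backend_call])

collected = collect_targets([child2], [child1])   # finds backend_call
child2_clone = child2.clone("OBFUSCATED_FUNCTION")
for call in collected:                            # only the original body is updated
    call.name = "OBFUSCATED_FUNCTION"

stale = child2_clone.body[0]
print(stale.name, stale.ref)                      # old name, no reference
print(collect_targets([child2_clone], [child1]))  # the collector skips the clone's call
```

In this model the declaration that actually ends up in the class still contains a call with the old name and no ref, which matches the unresolved `this backend ABSTRACT_FUNCTION()` in the post-obfuscation print-out.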

This mock-up fix specifically solves the problem, given the example code above:

// Rebuild the body of the clone from the original declaration.
obfuscatedFunction getBody() list clear()
for ((index, statement) in functionDecl getBody()) {
    // Initialize explicitly so the null check below is well-defined.
    newStatement: Statement = null
    if (statement instanceOf?(Return)) {
        returnStatement := statement as Return
        if (returnStatement expr instanceOf?(FunctionCall)) {
            // Create a fresh call that uses the obfuscated name, so the resolver can link it again.
            functionCall := returnStatement expr as FunctionCall
            newStatement = Return new(FunctionCall new(functionCall expr, obfuscatedFunction getName(), functionCall token), returnStatement token)
        }
    }
    // Every other statement is cloned unchanged.
    if (newStatement == null) {
        newStatement = statement clone()
    }
    obfuscatedFunction getBody() add(newStatement)
}

I'm closing this, now that I know the specific reason behind it all. Thank you for your help!