vtil-project / VTIL-Core

Virtual-machine Translation Intermediate Language
BSD 3-Clause "New" or "Revised" License
1.35k stars 163 forks source link

Added new "thunk" removal pass (Single jump bblock) #65

Closed z3t4s closed 2 years ago

z3t4s commented 2 years ago

Added a new pass that detects thunk like (I'm open to better naming suggestions) single jump basic_blocks and removes them. It does that by changing the destination of the previous branch and fixing the prev/next links.

I had to make changes to auxiliaries.cpp as the master branch version made optimization way worse in my case. The change in variable.cpp is the same as pull request 62 from wallds

Example: image

wallds commented 2 years ago

Such false conditional branches are common in vmp.

auto block1 = vtil::basic_block::begin((uintptr_t)0x1000);
auto rtn = block1->owner;
{
    // 0x1000: js eflags@11:1 0x2000, 0x3000
    block1->js(vtil::REG_FLAGS.select(1, 11), (uintptr_t)0x2000, (uintptr_t)0x3000);
}
auto block2 = block1->fork((uintptr_t)0x2000);
{
    // 0x2000: jmp 0x4000
    block2->jmp((uintptr_t)0x4000);
    block2->fork((uintptr_t)0x4000);
}
auto block3 = block1->fork((uintptr_t)0x3000);
{
    // 0x3000: jmp 0x4000
    block3->jmp((uintptr_t)0x4000); 
    block3->fork((uintptr_t)0x4000);
}
auto block4 = rtn->get_block((uintptr_t)0x4000);
{
    // 0x4000: vexit 0
    block4->vexit((uintptr_t)0);
}
vtil::logger::log("Before:\n");
vtil::debug::dump(rtn);

vtil::optimizer::bblock_thunk_removal_pass{}(rtn);
vtil::logger::log("After:\n");
vtil::debug::dump(rtn);

Crashed here https://github.com/vtil-project/VTIL-Core/blob/617df086bf9896723b265812de51ac689ef1b8d8/VTIL-Compiler/optimizer/bblock_thunk_removal_pass.cpp#L103

Then I made the following changes

auto rtn = blk->owner;
if (visited.size() == rtn->num_blocks())
{
    for (auto it : obsolete_blocks)
        rtn->delete_block(const_cast<vtil::basic_block*>(it));
}

The crash is solved. But it also needs to be converted to jmp. (dst1==dst2)

 | | Entry point VIP:       0x1000
 | | Stack pointer:         0x0
 | | Already visited?:      N
 | | ------------------------
 | | 0000: [ PSEUDO ]     +0x0     js.q     $flags@11:1  0x4000       0x4000
 | | | | | | Entry point VIP:       0x4000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      N
 | | | | | | ------------------------
 | | | | | | 0000: [ PSEUDO ]     +0x0     vexit.q  0x0
 | | | | | | Entry point VIP:       0x4000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      Y
 | | | | | | ------------------------
z3t4s commented 2 years ago

Good catch with the invalid ptr.

Maybe something like this does the job?

for (size_t i = 0; i < prev->back().operands.size(); i++)
{
    if (!prev->back().operands[i].is_immediate())
        continue;

    auto& ins = make_mutable(prev->back());                     
    if (ins.operands[i].imm().ival == blk->entry_vip)
    {
        ins.operands[i].imm().ival = next->entry_vip;
    }   
}

//TODO: should we do this with another operands loop? We currently only have "js" that qualifies so a simple if does the job for now
//          
auto branching_instruction = prev->back();
if (branching_instruction.base == &ins::js)
{
    fassert(branching_instruction.operands.size() == 3);

    //This pass should not interfere with blocks that aren't already touched by branch correction / our corrections above
    //
    if (branching_instruction.operands[1].is_immediate() && branching_instruction.operands[2].is_immediate())
    {
        if (branching_instruction.operands[1].imm().ival == branching_instruction.operands[2].imm().ival)
        {
            auto new_vip = branching_instruction.operands[1].imm();

            auto ins = std::prev(prev->end());                      

            (+ins)->base = &ins::jmp;
            (+ins)->operands.resize(1);
            (+ins)->operands[0] = { new_vip.ival, arch::bit_count };

        }
    }
}
Before:
 | | Entry point VIP:       0x1000
 | | Stack pointer:         0x0
 | | Already visited?:      N
 | | ------------------------
 | | 0000: [ PSEUDO ]     +0x0     jsq      $flags@11:1  0x2000       0x3000
 | | | | | | Entry point VIP:       0x2000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      N
 | | | | | | ------------------------
 | | | | | | 0000: [ PSEUDO ]     +0x0     jmpq     0x4000
 | | | | | | | | | | Entry point VIP:       0x4000
 | | | | | | | | | | Stack pointer:         0x0
 | | | | | | | | | | Already visited?:      N
 | | | | | | | | | | ------------------------
 | | | | | | | | | | 0000: [ PSEUDO ]     +0x0     vexitq   0x0
 | | | | | | Entry point VIP:       0x3000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      N
 | | | | | | ------------------------
 | | | | | | 0000: [ PSEUDO ]     +0x0     jmpq     0x4000
 | | | | | | | | | | Entry point VIP:       0x4000
 | | | | | | | | | | Stack pointer:         0x0
 | | | | | | | | | | Already visited?:      Y
 | | | | | | | | | | ------------------------
After:
 | | Entry point VIP:       0x1000
 | | Stack pointer:         0x0
 | | Already visited?:      N
 | | ------------------------
 | | 0000: [ PSEUDO ]     +0x0     jmpq     0x4000
 | | | | | | Entry point VIP:       0x4000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      N
 | | | | | | ------------------------
 | | | | | | 0000: [ PSEUDO ]     +0x0     vexitq   0x0
 | | | | | | Entry point VIP:       0x4000
 | | | | | | Stack pointer:         0x0
 | | | | | | Already visited?:      Y
 | | | | | | ------------------------
wallds commented 2 years ago

image image2 image3

z3t4s commented 2 years ago

Yup, didn't think of that.

(+ins)->base = &ins::jmp;
(+ins)->operands.resize(1);
(+ins)->operands[0] = { new_vip.ival, arch::bit_count };

prev->next.resize(1);
Tai7sy commented 2 years ago

It seems some output of test cases are different with the expected result. especially in stack optimization

mrexodia commented 2 years ago

It seems some output of test cases are different with the expected result. especially in stack optimization

If the changes are an improvement I guess the tests should be updated to reflect this.

Tai7sy commented 2 years ago

This test case failed, the push on block1 should not be optimized

https://github.com/vtil-project/VTIL-Core/blob/cda7694f912920919b11c8e15cd62c76c7dc288b/VTIL-Tests/dummy.cpp#L419-L470

https://github.com/vtil-project/VTIL-Core/runs/4816356027?check_suite_focus=true#step:7:275

Tai7sy commented 2 years ago

And the simple loop-push demo also produce unexpected result, seems some stack accesses were incorrectly discarded.

https://github.com/vtil-project/VTIL-Core/blob/cda7694f912920919b11c8e15cd62c76c7dc288b/VTIL-Tests/dummy.cpp#L484-L495

https://github.com/vtil-project/VTIL-Core/runs/4816356027?check_suite_focus=true#step:7:350

wallds commented 2 years ago

image https://github.com/vtil-project/VTIL-Core/blob/32ee99ee8d96646942cd2a6776bf969030306886/VTIL-Compiler/common/apply_all.hpp#L59-L73

The bblock_thunk_removal_pass should be put in the collective_propagation_pass or somewhere else.

wallds commented 2 years ago
    vtil::logger::log("\n\n>> %s \n", __FUNCTION__);
    auto block = vtil::basic_block::begin(0x1000);

    block->push((uintptr_t)0x12345678);
    block->shift_sp(-0x100);
    block->js(registers::cx, 0x2000ull, 0x3000ull);
    auto block2 = block->fork(0x2000ull);
    {
        block2->shift_sp(0x100);
        block2->jmp(0x4000ull);
        block2->fork(0x4000ull);
    }
    auto block3 = block->fork(0x3000ull);
    {
        block3->shift_sp(0x100);
        block3->jmp(0x4000ull);
        block3->fork(0x4000ull);
    }
    auto block4 = block->owner->get_block(0x4000ull);
    {
        block4->pop(registers::ax);
        block4->vexit(0ull);
    }

    vtil::optimizer::stack_pinning_pass{}(block->owner);
    vtil::debug::dump(block->owner);
    vtil::tracer tracer;

    auto exp = tracer.rtrace_p({ std::prev(block4->end()), vtil::register_desc(registers::ax) }).simplify(true);
    vtil::logger::log("%s\n", exp);
    vtil::optimizer::bblock_thunk_removal_pass{}(block->owner);
    vtil::debug::dump(block->owner);

    vtil::tracer tracer2;
    auto exp2 = tracer2.rtrace_p({ std::prev(block4->end()), vtil::register_desc(registers::ax) }).simplify(true);
    vtil::logger::log("%s\n", exp2);

image

z3t4s commented 2 years ago

I added a "fix" for your issue @wallds. The blocks you are creating in your testcase couldn't exist in real world applications to my knowledge. This pass only targets single instruction, branching blocks. None of the current branching instructions modify the stack in any way. If more than one instruction (for example a push/pop) remains inside that block, it is not touched by this pass.

The ->shift_sp instruction doesn't increment the block size, that's why the check failed before. I added a modified version of your testcase though, verifying the tracer result is a good idea!

I also added an exhaust_pass in the collective_propagation_pass containing all 3 block altering passes

exhaust_pass<
branch_correction_pass,
bblock_extension_pass,
bblock_thunk_removal_pass
>
mrexodia commented 2 years ago

Thank you!