keystone-engine / keystone

Keystone assembler framework: Core (Arm, Arm64, Hexagon, Mips, PowerPC, Sparc, SystemZ & X86) + bindings
http://www.keystone-engine.org
GNU General Public License v2.0

Fix symbol resolver issues #562

Open endofunky opened 1 year ago

endofunky commented 1 year ago

I've spent some frustrating hours fighting the sym_resolver, mainly because setting it forces a radix change and because symbol offsets are calculated incorrectly.

This is mostly based on suggestions from other issues and a cherry-picked commit from another PR. I've added a few more examples to bindings/python/sample.py to illustrate it seemingly behaving correctly now. I've also added regression tests for anything mentioned below. All the examples in that file now also run for me again - on master they appear broken. Details below.

1. Invalid symbol offsets

Issues: #244 #271 #351

Given a symbol table:

SYM   ADDR
_l1   0x1000

On current master with base 0x1000:

jmp _l1
nop

Assembles to eb ff 90. That offset is incorrect: it looks like it's calculated from the start of the instruction rather than the end of it, so the length of the instruction itself is not accounted for.

The suggestion from #351 fixes that and symbol offsets now appear to be calculated correctly. With the suggested change the instructions correctly assemble to eb fe 90.
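The expected bytes can be checked by hand. The sketch below is not keystone code; `short_jmp` is a hypothetical helper that assumes the two-byte short form of jmp (opcode eb followed by a rel8 displacement), whose displacement is relative to the address of the next instruction.

```python
def short_jmp(insn_addr: int, target: int) -> bytes:
    """Encode an x86 short jmp (opcode 0xEB, rel8) located at insn_addr."""
    insn_len = 2  # one opcode byte plus one displacement byte
    # The displacement is measured from the end of the jmp instruction.
    rel = target - (insn_addr + insn_len)
    if not -128 <= rel <= 127:
        raise ValueError("target out of range for a short jmp")
    return bytes([0xEB, rel & 0xFF])


NOP = b"\x90"

# Base address 0x1000, symbol _l1 resolved to 0x1000, as in the example above.
encoded = short_jmp(0x1000, 0x1000) + NOP
print(encoded.hex(" "))  # eb fe 90
```

The displacement is 0x1000 - 0x1002 = -2, which is fe as a signed byte, matching the output with the suggested change applied.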

I've fixed the Python sym_resolver examples and the x64_sym_resolver.py regression test, which now passes.

2. Radix set from default (10) to 16 when setting sym_resolver

Issues: #481 #436 #538

Cherry-picked 5c7ed87cc86254c99db57ddf14584b0561b5bf6c from #528

Another issue is present in the setting of ks_option:

KEYSTONE_EXPORT
ks_err ks_option(ks_engine *ks, ks_opt_type type, size_t value)
{
    ks->MAI->setRadix(16);
    switch(type) {
        case KS_OPT_SYNTAX:
            if (ks->arch != KS_ARCH_X86)
                return KS_ERR_OPT_INVALID;
            switch(value) {
                default:
                    return KS_ERR_OPT_INVALID;
                case KS_OPT_SYNTAX_RADIX16: // default syntax is Intel
                case KS_OPT_SYNTAX_NASM | KS_OPT_SYNTAX_RADIX16:
                case KS_OPT_SYNTAX_INTEL | KS_OPT_SYNTAX_RADIX16:
                    ks->MAI->setRadix(16);
                case KS_OPT_SYNTAX_NASM:
                case KS_OPT_SYNTAX_INTEL:
                    ks->syntax = (ks_opt_value)value;
                    ks->MAI->setAssemblerDialect(1);
                    break;
                case KS_OPT_SYNTAX_GAS | KS_OPT_SYNTAX_RADIX16:
                case KS_OPT_SYNTAX_ATT | KS_OPT_SYNTAX_RADIX16:
                    ks->MAI->setRadix(16);
                case KS_OPT_SYNTAX_GAS:
                case KS_OPT_SYNTAX_ATT:
                    ks->syntax = (ks_opt_value)value;
                    ks->MAI->setAssemblerDialect(0);
                    break;
            }

            return KS_ERR_OK;
        case KS_OPT_SYM_RESOLVER:
            ks->sym_resolver = (ks_sym_resolver)value;
            return KS_ERR_OK;
    }

    return KS_ERR_OPT_INVALID;
}

This currently forces a radix of 16 every time any ks_option call is made. So when a symbol resolver is set without explicitly setting a radix of 16, one still ends up with a radix of 16.
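To make the effect concrete, here is a toy model of that control flow in Python. Nothing in it is keystone API: `ToyEngine`, `option_master`, `option_scoped` and `parse_imm` are hypothetical names, and `option_scoped` is only one possible shape for a fix, not necessarily what the cherry-picked commit does.

```python
class ToyEngine:
    """Stand-in for the engine state relevant here; not the real ks_engine."""

    def __init__(self):
        self.radix = 10  # default radix, as described in this PR
        self.sym_resolver = None


def option_master(engine, opt, value):
    """Mirrors the quoted ks_option: the radix is forced to 16 on every call."""
    engine.radix = 16
    if opt == "sym_resolver":
        engine.sym_resolver = value


def option_scoped(engine, opt, value):
    """Hypothetical alternative: only an explicit radix option changes the radix."""
    if opt == "radix16":
        engine.radix = 16
    elif opt == "sym_resolver":
        engine.sym_resolver = value


def parse_imm(engine, text):
    """Parse an unprefixed numeric literal using the engine's current radix."""
    return int(text, engine.radix)


def resolver(name):
    return 0x1000 if name == "_l1" else None


a = ToyEngine()
option_master(a, "sym_resolver", resolver)

b = ToyEngine()
option_scoped(b, "sym_resolver", resolver)

print(parse_imm(a, "10"), parse_imm(b, "10"))  # 16 10
```

In the first case, installing a resolver silently changes how the literal 10 is read; in the second, it stays decimal unless radix 16 is requested explicitly.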

The change was made in #382 to address https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=10437, but unless ks_option is called at least once, the issue still remains.

This is now also covered by a new regression test.

#382 also had a short discussion about whether the default should be 16 or 10, and it was set to 16. This also appears to be incorrect, because the actual default in keystone, unless explicitly changed, is a radix of 10, which is also NASM's default.

Notes

I hope we can get this (or a similar fix) merged and a new version released. I'm happy to write additional regression tests or make any changes needed. My current project has some code for incremental multi-pass assembling on top of keystone where I handle the symbol table myself, and this is currently blocking my work :(

endofunky commented 1 year ago

I can't see what's failing on the semaphoreci CI step. The appveyor task is failing on master with the same error, so I assume both are the same failures that already occur on master.