zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
rol ror encoding immediate 8 bit operand problem #526

Closed (qcold closed this issue 2 months ago)

qcold commented 2 months ago

I'm trying to encode sbb al, 0xdc, and to do so I need to pass request.operands[1].imm.u = 0xffffffffffffffdc. When I pass the same value as the second operand (request.operands[1].imm.u = 0xffffffffffffffdc) to encode rol/ror word ptr ss:[esp+0x8], 0xdc, the encoder returns an error saying the instruction cannot be encoded.

mappzor commented 2 months ago

The immediate operand for rcl/rcr/rol/ror is considered unsigned. From Intel's SDM:

the count operand is an unsigned integer that can be an immediate or a value in the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W = 1).

So your 0xDC is actually 0x1C.
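The masking described in the SDM quote can be sketched as follows. This is only an illustration of the arithmetic; `effective_count` is a hypothetical helper name and is not part of Zydis.

```cpp
#include <cstdint>

// Mask a rotate count as the quoted SDM passage describes:
// 5 bits normally, 6 bits for a 64-bit operand (64-bit mode with REX.W = 1).
// `effective_count` is a hypothetical helper, not a Zydis function.
constexpr std::uint8_t effective_count(std::uint8_t imm, bool operand_is_64bit) {
    return static_cast<std::uint8_t>(imm & (operand_is_64bit ? 0x3F : 0x1F));
}

// Worked examples, checked at compile time.
static_assert(effective_count(0xDC, false) == 0x1C, "0xDC masked to 5 bits is 0x1C");
static_assert(effective_count(0xFF, true) == 0x3F, "0xFF masked to 6 bits is 0x3F");
```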

qcold commented 2 months ago

Should I make exceptions for these instructions in my code?

        case ZYDIS_OPERAND_TYPE_IMMEDIATE: {
            req.operands[i].imm.u = op.imm();

            uint64_t mask = I.operands()[0].size() == 8 ? 0x3f : 0x1f;

            switch (req.mnemonic) {
            case ZYDIS_MNEMONIC_ROL:
            case ZYDIS_MNEMONIC_ROR:
            case ZYDIS_MNEMONIC_RCL:
            case ZYDIS_MNEMONIC_RCR:
                req.operands[i].imm.u &= mask;
            }

            break;
        }

mappzor commented 2 months ago

It's up to you whether you want to sanitize. As far as the encoder is concerned, an 8-bit unsigned value is the only requirement (so 0xDC is fine but 0xFFFFFFFFFFFFFFDC is not).
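To make the distinction concrete, here is a minimal sketch of the two range checks involved. It only illustrates why the same 64-bit value can be valid for a signed 8-bit immediate and invalid for an unsigned one; it is not taken from the Zydis sources, and both helper names are hypothetical.

```cpp
#include <cstdint>

// True if the 64-bit request value fits an unsigned 8-bit immediate (0..255).
constexpr bool fits_unsigned8(std::uint64_t v) {
    return v <= 0xFF;
}

// True if the value, read as two's complement, fits a signed 8-bit
// immediate (-128..127).
constexpr bool fits_signed8(std::uint64_t v) {
    return v <= 0x7F || v >= 0xFFFFFFFFFFFFFF80ULL;
}

// 0xDC is a valid unsigned count; its sign-extended form is not.
static_assert(fits_unsigned8(0xDC), "0xDC is in 0..255");
static_assert(!fits_unsigned8(0xFFFFFFFFFFFFFFDCULL), "sign-extended form is out of range");
// The sign-extended form is what a signed 8-bit immediate expects.
static_assert(fits_signed8(0xFFFFFFFFFFFFFFDCULL), "equals -36 as a signed byte");
static_assert(!fits_signed8(0xDC), "220 exceeds 127");
```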

Please note that there are other instructions with unsigned immediates (and even similar masking semantics). Bit shifts and bit manipulation instructions are the most notable examples.

qcold commented 2 months ago

I will choose your approach, but should I still write this as exceptions, or is there a function to handle such situations (I couldn't find one)?

        case ZYDIS_OPERAND_TYPE_IMMEDIATE: {
            req.operands[i].imm.u = op.imm();

            switch (req.mnemonic) {
            case ZYDIS_MNEMONIC_SALC:
            case ZYDIS_MNEMONIC_SAR:
            case ZYDIS_MNEMONIC_SHL:
            case ZYDIS_MNEMONIC_SHR:
            case ZYDIS_MNEMONIC_ROL:
            case ZYDIS_MNEMONIC_ROR:
            case ZYDIS_MNEMONIC_RCL:
            case ZYDIS_MNEMONIC_RCR:
                req.operands[i].imm.u = static_cast<uint8_t>(req.operands[i].imm.u);
            }

            break;
        }

Does it look ugly or not?

mappzor commented 2 months ago

It's certainly error-prone, as you would need to account for every instruction this way. You have already made an error by including salc, which doesn't accept any operands. You can see where this leads...

I don't know what you are doing or why it's possible for you to end up with values of the wrong signedness. You should probably focus on that underlying issue and resolve it instead of trying to save yourself with hacks. A proper solution will be specific to your own project, so it's out of scope for this discussion. As far as the encoder is concerned, there are no issues here to be resolved.

qcold commented 2 months ago

I have my own instruction emulator, and I'm performing optimization passes. Here are two cases I have.

sub eax, ecx (0x00000000deadc0de) -> sub eax, 0xdeadc0de: Here, I need to sign-extend it (to 0xffffffffdeadc0de) so that I get the encoded instruction.

rol eax, cl (0x00000000deadc0de) -> rol eax, 0xde: In this case, I take the low byte (0xde) and do the same as before (sign-extend it to 0xffffffffffffffde), but this results in an error from the encoder because I need it to be unsigned, i.e., 0x00000000000000de.
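The two cases differ only in how the known register value is widened to 64 bits. A minimal sketch, assuming the value is held in a `uint64_t`; the helper names are hypothetical and belong to neither Zydis nor the emulator described here:

```cpp
#include <cstdint>

// Case 1 (sub eax, imm32): sign-extend the low 32 bits to 64 bits.
constexpr std::uint64_t sign_extend32(std::uint64_t v) {
    return (v & 0x80000000ULL) ? (v | 0xFFFFFFFF00000000ULL)
                               : (v & 0xFFFFFFFFULL);
}

// Case 2 (rol eax, imm8): keep only the low byte, zero-extended.
constexpr std::uint64_t zero_extend8(std::uint64_t v) {
    return v & 0xFFULL;
}

static_assert(sign_extend32(0x00000000DEADC0DEULL) == 0xFFFFFFFFDEADC0DEULL,
              "signed 32-bit immediate for sub");
static_assert(zero_extend8(0x00000000DEADC0DEULL) == 0xDEULL,
              "unsigned 8-bit count for rol");
```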

It seems like I can't avoid a workaround in this situation. I'm sorry to burden you with this, but it's important for me to hear your opinion.

mappzor commented 2 months ago

Generally, anything that involves emulation or optimization requires understanding the semantics of instructions, so case-by-case handling is required most of the time anyway.

In some cases you might be able to get some knowledge by disassembling the original instruction, e.g.:

== [ OPERANDS ] ============================================================================================
##       TYPE  VISIBILITY  ACTION      ENCODING   SIZE  NELEM  ELEMSZ  ELEMTYPE                        VALUE
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------
 0   REGISTER    EXPLICIT      RW      MODRM_RM     32      1      32       INT                          eax
 1   REGISTER    IMPLICIT       R          NONE      8      1       8      UINT                           cl
 2   REGISTER      HIDDEN      CW          NONE     64     64       1       INT                       rflags
--  ---------  ----------  ------  ------------   ----  -----  ------  --------  ---------------------------

As you can see in this ZydisInfo output, cl is denoted as UINT. However, once you start replacing instructions with completely different ones or run into special cases, this becomes infeasible. Handling special cases one by one is probably the way to go.
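Where the element type of the original operand is available, it could drive the choice between sign and zero extension. The sketch below uses a hypothetical `ElemType` enum as a stand-in for the ELEMTYPE column; it is not a Zydis type, and `extend_low_bits` is an illustrative helper only.

```cpp
#include <cstdint>

// Hypothetical stand-in for the ELEMTYPE column above; not a Zydis type.
enum class ElemType { Int, UInt };

// Widen the low `bits` bits of `raw` (1 <= bits <= 63) to 64 bits:
// sign-extend for Int, zero-extend for UInt.
constexpr std::uint64_t extend_low_bits(std::uint64_t raw, unsigned bits, ElemType type) {
    return (type == ElemType::Int && ((raw >> (bits - 1)) & 1))
        ? (raw | (~0ULL << bits))
        : (raw & ((1ULL << bits) - 1));
}

// cl is UINT in the output above, so the count stays 0xDE.
static_assert(extend_low_bits(0xDEADC0DEULL, 8, ElemType::UInt) == 0xDEULL,
              "unsigned 8-bit element");
// A signed 32-bit element is sign-extended instead.
static_assert(extend_low_bits(0xDEADC0DEULL, 32, ElemType::Int) == 0xFFFFFFFFDEADC0DEULL,
              "signed 32-bit element");
```

As noted above, this only helps while the replacement instruction keeps the operand semantics of the original one.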