zyantific / zydis

Fast and lightweight x86/x86-64 disassembler and code generation library
https://zydis.re
MIT License
3.44k stars 438 forks source link

Add more status codes to the encoder #265

Open ZehMatt opened 2 years ago

ZehMatt commented 2 years ago

After some testing and messing around I noticed that the encoder doesn't provide a lot info when it comes to failures. Example 1:

ZydisEncoderRequest req{};
req.mnemonic = ZYDIS_MNEMONIC_LEA;
req.operand_count = 2;
req.operands[0].type = ZYDIS_OPERAND_TYPE_REGISTER;
req.operands[0].reg.value = ZYDIS_REGISTER_RAX;
req.operands[1].type = ZYDIS_OPERAND_TYPE_MEMORY;
req.operands[1].mem.base = ZYDIS_REGISTER_RIP;
req.operands[1].mem.displacement = 0x1337;

Because the size is not specified on the memory operand it will result ZYDIS_STATUS_IMPOSSIBLE_INSTRUCTION, a better result would be something like "Invalid operand size" in this case.

Example 2:

ZydisEncoderRequest req{};
req.mnemonic = ZYDIS_MNEMONIC_JMP;
req.branch_type = ZydisEncodableBranchType::ZYDIS_ENCODABLE_BRANCH_TYPE_NONE;
req.operand_count = 1;
req.operands[0].type = ZYDIS_OPERAND_TYPE_IMMEDIATE;
req.operands[0].imm.u = 0x12;

Not assigning a branch type also leads to ZYDIS_STATUS_IMPOSSIBLE_INSTRUCTION

Example 3:

ZydisEncoderRequest req{};
req.mnemonic = ZYDIS_MNEMONIC_JMP;
req.branch_type = ZydisEncodableBranchType::ZYDIS_ENCODABLE_BRANCH_TYPE_SHORT;
req.operand_count = 1;
req.operands[0].type = ZYDIS_OPERAND_TYPE_IMMEDIATE;
req.operands[0].imm.u = 0xFFFFFFF;

Using immediate value outside the possible branch type range.

And so on.

mappzor commented 2 years ago

This might seem to be an obvious thing to improve, so let me explain why it wasn't done in the first place. The problem I faced is that while it's straightforward to reject a single variant for a specific, well-defined reason, things become tricky when you are rejecting multiple variants (and that's the case with almost every instruction).

How to find error that's closest to the root cause? Initially I've considered having a set of error codes forming a pre-defined precedence hierarchy e.g. invalid operand type could be overwritten by invalid register for sib addressing because the latter is more specific. However I suspected that getting such hierarchy to work well for whole ISA could be very tricky. You have to carefully fine-tune two variables here: precedence and granularity of error codes. Also I had no idea how to test that for correctness.

That's why currently encoder has a very simplistic system with 2 error codes only (I omit buffer-related stuff):

Probably now when whole encoder is done, it's worth to revisit my original idea. Even if not perfect, it might be possible to find a reasonable balance between correctness and granularity of error codes. Also my solution wouldn't have a significant impact on performance. If anyone has other ideas please let me know.