intelxed / xed

The X86 Encoder Decoder (XED), is a software library for encoding and decoding X86 (IA32 and Intel64) instructions
https://intelxed.github.io/
Apache License 2.0
1.41k stars 148 forks source link

xed_get_largest_enclosing_register32() Returns Misleading Values #138

Open ElykDeer opened 6 years ago

ElykDeer commented 6 years ago

The Problem

As it stands, if I run:

xed_get_largest_enclosing_register32(XED_REG_RAX)

I would not expect it to return XED_REG_RAX, as it currently does.

This stands for all register values that cannot be contained in a 32-bit register, including segment registers.

Proposed Solution

For registers that are larger than 32 bits, I might expect xed_get_largest_enclosing_register32 to return the 32-bit version of the register, though to do that would violate the "enclosing" clause of that call, since the 32-bit register does not enclose the 64-bit one.

A better solution for all registers that do not have a 32 bit version, which would include segment registers, might be to return XED_REG_INVALID.

Counter Problem

The proposed solution would violate the documentation of both xed_get_largest_enclosing_register32 and xed_get_largest_enclosing_register, where they state "64b mode" and "32b mode" [emphasis mine], which I would read to indicate as the "age" of a register as opposed to the size of the register.

Counter Solution

Where as

xed_get_largest_enclosing_register32(XED_REG_R15B)

currently returns XED_REG_R15D, I propose that it should instead return XED_REG_INVALID, as in 32-bit mode this register does not exist. Then for segment registers, the function call can still return the same value since that is what it would be in 32-bit mode.

The naming convention for this function then leads to indicate that it returns "The largest 32-bit mode register that encapsulates or otherwise represents the desired register."

And just for completeness...

This solution would obviously also need to apply to the rest of registers with this problem that I didn't specifically mention by name.

These changes might also require a new set of functions that perform the action of "getting the version of register_x with size_x" - ie, xed_get_register_of_width32(XED_REG_R15B) would return XED_REG_15D - to support the behavior that xed_get_largest_enclosing_register32 currently does, as well as add support for decreasing the size of a register - xed_get_register_of_width32(XED_REG_RAX) would return XED_REG_EAX.

While I'm At It

It would be nice if there were also a xed_get_largest_enclosing_register16, following the same structure I suggest above.

Also:

xed_get_largest_enclosing_register 64b mode
xed_get_largest_enclosing_register32 32b mode
xed_get_register_width_bits 32b mode
xed_get_register_width_bits64 64b mode

???

markcharney commented 6 years ago

i hear what you are suggesting. I'll think about it. I do not mind changing the 32 function to barf on 64b registers as input. And I don't mind making the 32 functions not accept regs that are not valid in 32b mode.

markcharney commented 6 years ago

(and sorry about accidental closing of the issue. I was looking to label the issue "acknowledged" using the mobile/phone interface to github and failed miserably.)

markcharney commented 6 years ago

I fixed the first part locally. Not sure what I want to do about the 2nd part (largest enclosing reg for R15B in 32b mode). One option would be to just have you validate the GPR names for the mode and leave the API as-is.

Also curious: why do you want the largest-enclosing 16b register? The latter were really for register allocators.

ElykDeer commented 6 years ago

For the second part what gets me is the use of the word "mode".. Logically I think there should be a way to query for each of a register's mutation (all the way from 512+bits to 8 bits), but there should also be a way to query for registers that are valid per-processor design (16, 32, 64, eventually 128 I imagine).

While you've fixed the actual "bug", it'd still be nice to have the ability I just mentioned. As it stands, I have no way of programmatically determining if a register exists in 16 or 32 bit mode, though I think I can mostly figure out a registers different sizes with the GPR__ categories (haven't tried).

For the mode issue, there's still the matter of misleading documentation, though you might be able to fix it with a simple yet clever re-arrangement of the register enumeration (though perhaps not as maintainable for future additions to the architecture).

To directly answer your question, I just want to be able to programmatically determine what registers would-have-existed for a given x-bits processor. Whether it's done in one of the ways I've mentioned, an additional category for them, or something entirely different I think it would be a lot cleaner to "ask XED" to give me the information rather than hard-coding the handful of registers in my program (albeit, not hard, just a lot nicer).