unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0

Some of the x86 registers Unicorn enumerates don't really exist #1440

Closed elicn closed 2 years ago

elicn commented 3 years ago

Available control registers should be CR0-CR4, CR8. The following registers do not exist:

See:

Available debug registers should be DR0-DR7 (DR4 and DR5 are not available, though they are defined by the architecture). The following registers do not exist:

See:

The following registers do not seem to exist

wtdcode commented 3 years ago

Thanks. I will have a look ASAP.



wtdcode commented 2 years ago

Link to #1449

wtdcode commented 2 years ago

I removed some useless registers as you stated, but for FP[x] they are x87 registers.

elicn commented 2 years ago

x87 doesn't have registers named FP. It does have ST0..ST7 and R0..R7; perhaps you are referring to the latter? If so, I believe naming those registers FP0..FP7 would be weird, as there is no visible correlation; perhaps FP_R0..FP_R7 would be clearer.

Consider using a UC_X87_REG prefix for x87 registers, where these ones may be named UC_X87_REG_R0..UC_X87_REG_R7.
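For readers unfamiliar with the two naming schemes, here is a minimal sketch (not taken from the thread or from Unicorn's source) of how the stack-relative names ST(i) relate to the physical x87 data registers R0..R7 through the TOP field of the FPU status word. The function names `st_to_physical` and `top_after_push` are hypothetical and exist only for illustration.

```python
def st_to_physical(top: int, i: int) -> int:
    """Return the index of the physical register R(n) that ST(i) names.

    `top` is the 3-bit TOP field of the x87 status word; ST(i) refers to
    physical register R((TOP + i) mod 8).
    """
    if not 0 <= top <= 7 or not 0 <= i <= 7:
        raise ValueError("top and i must both be in the range 0..7")
    return (top + i) & 7


def top_after_push(top: int) -> int:
    """Return TOP after a push (e.g. FLD): TOP is decremented modulo 8."""
    if not 0 <= top <= 7:
        raise ValueError("top must be in the range 0..7")
    return (top - 1) & 7


if __name__ == "__main__":
    top = top_after_push(0)  # TOP is 0 after FINIT; the first push wraps it to 7
    print(f"ST(0) is R{st_to_physical(top, 0)}, ST(1) is R{st_to_physical(top, 1)}")
```

The point of the sketch is that ST0..ST7 and R0..R7 name the same storage from two different viewpoints, so an emulator only needs to expose one consistent set.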

References:

wtdcode commented 2 years ago

x87 doesn't have registers named FP. It does have ST0..ST7 and R0..R7; perhaps you are referring to the latter? If so, I believe naming those registers FP0..FP7 would be weird, as there is no visible correlation; perhaps FP_R0..FP_R7 would be clearer.

Consider using a UC_X87_REG prefix for x87 registers, where these ones may be named UC_X87_REG_R0..UC_X87_REG_R7.

References:

Thanks for the reference. Re-open to fix later.

wtdcode commented 2 years ago

x87 doesn't have registers named FP. It does have ST0..ST7 and R0..R7; perhaps you are referring to the latter? If so, I believe naming those registers FP0..FP7 would be weird, as there is no visible correlation; perhaps FP_R0..FP_R7 would be clearer.

Consider using a UC_X87_REG prefix for x87 registers, where these ones may be named UC_X87_REG_R0..UC_X87_REG_R7.

References:

I double-checked that UC_X86_REG_FP[x] refers to the x87 ST registers. Would you like them renamed as proposed? @aquynh

elicn commented 2 years ago

I double-checked that UC_X86_REG_FP[x] refers to the x87 ST registers. Would you like them renamed as proposed? @aquynh

There are already registers named UC_X86_REG_ST0..UC_X86_REG_ST7. Does it mean both sets of registers refer to the same arch registers?

wtdcode commented 2 years ago

I’m afraid that’s the case. I will have a look later.



aquynh commented 2 years ago

I think those FP registers can be removed.

jtorreno commented 2 years ago

For the record, referencing cr1 will also throw #UD. See Intel SDM, Vol. 2B, 4-40: MOV—Move to/from Control Registers. Sneaky one, eh?

elicn commented 2 years ago

This is indeed a tricky one. Even though CR1 cannot be accessed by software, it does exist [see Intel SDM vol. 3 chapter 2.2] and may be accessed by microcode. Unicorn may choose to hold its state (even though I am not sure it is documented anywhere visible outside Intel) and make it available for users to directly read or write in their scripts.

Bottom line: since CR1 is technically implemented but inaccessible through emulated assembly instructions, I guess it is up to Unicorn to decide whether to keep it or not. Both approaches make sense.

jtorreno commented 2 years ago

For that matter, cr0-cr15 are all addressable. From what I can tell, cr1 only exists insofar as it's labeled reserved.

elicn commented 2 years ago

It's true that CR0-CR15 are all addressable [i.e. may be encoded as an instruction operand], but that is not what I was referring to. From what I recall CR1 does exist, but it is not accessible by software (that is, by macro-instructions), as opposed to CR7 and CR9-CR15, which are inaccessible because they do not exist.

From a software perspective I agree that there is not much of a difference there, but there is a thin line between "inaccessible" and "does not exist". Since Unicorn is an emulator [and not a simulator] I don't see many reasons for Unicorn to keep it, but my issue was originally about non-existent registers. Since CR1 technically exists, I didn't list it.

All that being said, your comment does show that there should be a difference between Capstone and Unicorn on this one. While Unicorn doesn't have a reason to maintain those registers (or even define them, for that matter), Capstone still needs to be able to decode instructions that have them as their operands.

jtorreno commented 2 years ago

From what I recall CR1 does exist

Well, it's reserved, sure. Its existence is a matter of speculation. It's not merely inaccessible; there is a profound lack of evidence supporting the hypothesis that it takes up any more space on silicon than the rest that are specified to throw #UD.

Capstone still needs to be able to decode instructions that have them as their operands.

I can agree. But entertain this: what exactly will Capstone disassemble it into? Architecturally speaking, referring to cr1 is ill-formed. It wouldn't exactly be x86. 😛

elicn commented 2 years ago

I can agree. But entertain this: what exactly will Capstone disassemble it into? Architecturally speaking, referring to cr1 is ill-formed. It wouldn't exactly be x86. 😛

If these registers are addressable, they are encodable, and Capstone should be able to disassemble instructions with such funny operands. For example, mov rax, cr9 assembles to 44 0F 20 C8 even though we both agree that CR9 is hollow. Since these instructions don't make much sense, they will generate a #UD once executed. In fact, the SDM is full of weird instruction / operand / mode combos that result in #UD for being architectural nonsense.
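To make the encoding point concrete, here is a minimal sketch of how `mov r64, crN` (opcode 0F 20 /r) is assembled in 64-bit mode. It is an illustration only, not code from Capstone or Unicorn; the helper name `encode_mov_r64_from_cr` is hypothetical. The control register number goes into the ModRM reg field (extended by REX.R) and the general-purpose register into the rm field (extended by REX.B).

```python
def encode_mov_r64_from_cr(gpr: int, cr: int) -> bytes:
    """Encode `mov r64, crN` (0F 20 /r) for 64-bit mode.

    gpr: general-purpose register number 0..15 (0 = rax, 8 = r8, ...)
    cr:  control register number 0..15, whether or not it exists
    """
    if not 0 <= gpr <= 15 or not 0 <= cr <= 15:
        raise ValueError("register numbers must be in the range 0..15")

    # REX = 0100WRXB; R extends ModRM.reg (the CR), B extends ModRM.rm (the GPR).
    rex = 0x40 | ((cr >> 3) << 2) | (gpr >> 3)
    # mod = 11 (register-direct), reg = low 3 bits of CR, rm = low 3 bits of GPR.
    modrm = 0xC0 | ((cr & 7) << 3) | (gpr & 7)

    # The REX prefix is only emitted when one of the extension bits is set.
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x0F, 0x20, modrm])


if __name__ == "__main__":
    print(encode_mov_r64_from_cr(0, 9).hex(" "))  # mov rax, cr9
    print(encode_mov_r64_from_cr(0, 0).hex(" "))  # mov rax, cr0
```

The encoder happily produces bytes for every CR number from 0 to 15, which is exactly why a disassembler has to recognize them even when executing the instruction would fault.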

ethindp commented 2 years ago

CR0-15 and DR0-15 are encodable operands, regardless of whether some of them (e.g. CR1, CR5-7, CR9-15, ...) actually exist or not. Instead of "removing" these from the lib, I would encourage an agnostic approach: do not make assumptions about the existence or lack thereof of these registers. We know for a fact that cr0, cr2, cr3, cr4, and cr8 exist; and we know that dr0-3 and dr7 exist (and dr6 depending on conditions). I'd strongly discourage removing these (currently) non-existent registers. They may exist at some point in the future. Just throw a #UD on them when we encounter them.
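A rough sketch of that agnostic approach could look like the following. This is not how Unicorn implements it; `KNOWN_CONTROL_REGS`, `UndefinedOpcode`, and `read_control_reg` are hypothetical names, and the set of existing registers is simply the one listed in this comment (CR8 being a 64-bit-mode register).

```python
# Control registers this thread agrees exist: CR0, CR2, CR3, CR4 and CR8.
KNOWN_CONTROL_REGS = frozenset({0, 2, 3, 4, 8})


class UndefinedOpcode(Exception):
    """Stand-in for the #UD exception an emulated CPU would raise."""


def read_control_reg(state: dict, n: int) -> int:
    """Read CRn from `state`, raising #UD for encodable but unimplemented registers.

    All of CR0..CR15 are accepted as operands; only the lookup table decides
    which ones exist, so a register can be added later by extending the set.
    """
    if not 0 <= n <= 15:
        raise ValueError("control register number must be in the range 0..15")
    if n not in KNOWN_CONTROL_REGS:
        raise UndefinedOpcode(f"#UD: cr{n} is not implemented")
    return state.get(n, 0)


if __name__ == "__main__":
    cpu = {0: 0x80000011}
    print(hex(read_control_reg(cpu, 0)))
    try:
        read_control_reg(cpu, 9)
    except UndefinedOpcode as exc:
        print(exc)
```

Keeping the register numbers valid as operands and deciding existence in a single table means no identifier has to be removed or renumbered if a reserved register is defined in the future.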

wtdcode commented 2 years ago

Closed since the PR has been merged.

@ethindp Sorry, I missed your comment. Currently, we skip these registers to stay compatible with earlier versions. If some of them become available later, we can easily add them back.