Commonize structs and enums

DrChat commented 2 years ago

A number of structures and enums are re-defined per processor mode in yaxpeax-x86, such as Opcode, ConditionCode, Instruction, Prefixes, and so on...

Could you commonize these enums and structures, not only to minimize duplication but to also allow code to be written against these common types in all three processor modes?

iximeow commented 2 years ago

this is something i've kind of struggled with; some of these are the same between modes, but some are not. x86-32 and x86-16 have pusha/popa instructions, but x86-64 does not. likewise, only 64-bit code has 64-bit general-purpose registers, so it'd always be an error for 32-bit or 16-bit instructions to decode referencing them. then, because some would be the same but some would not, i figured it's easier to draw a strong line around bitnesses than to have some items shared and some not. this is also why MemoryAccessSize is re-exported for each mode.

that all said, i don't really want to write x86 semantics more or less three times for yaxpeax-core, either. i do know it's uncommon to be switching between 16/32/64 bit modes in the same code, so i'd like to avoid "what is the current bitness" checks as much as possible. (this is another, quieter, reason i don't want to mix architecture details)

my solution has been relying on traits pretty heavily so the actual architecture-specific parts are something like

impl Semantic for Machine<yaxpeax_x86::amd64::Arch> {
    fn evaluate(&mut self, inst: &yaxpeax_x86::amd64::Instruction) -> Result<...> {
        match inst.opcode {
            Opcode::MOV => {
                self.store(
                    operand2location(inst.operands.get(0)),
                    x86_64_extend(
                        inst.operands.get(0).width(),
                        self.load(operand2location(inst.operands.get(1)))
                    )
                );
            },
            // ... and so on ...
        }
    }
}

and then do interesting work against Machine and its understanding of the program. i realize this is a grossly simplified example, sorry :grimacing:

i assume this is much more effort than you (or other users of the library) want to make! and it's probably not suitable for your use anyway. what are you trying to do across all processor modes?

after thinking about this a bit, i'm wondering if it's reasonable to lean into an additional "quasi-x86" as an architecture at the crate root, explicitly not any specific x86 mode but an attempt to be a union of 64-, 32-, and 16-bit modes? translation from a specific bitness to that "unified" x86 might not be free, but it'd probably be much nicer to use if you're not being very picky about specific bitnesses... that would still let people (me) who know exactly what bitness they're working with to avoid looking that up regularly too.
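A rough sketch of what that "unified" layer could look like, assuming made-up stand-in types: `real_mode::Opcode`, `long_mode::Opcode`, `AnyOpcode`, and `writes_stack` below are all hypothetical and are not the actual yaxpeax-x86 definitions. The mode-specific enums stay separate, and a `From` conversion is the (not necessarily free) translation step into the union type.

```rust
// Hypothetical stand-ins for two per-mode opcode enums; these are NOT the
// real yaxpeax-x86 definitions, just enough to show the shape of the idea.
pub mod real_mode {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Opcode {
        Mov,
        Pusha, // exists in 16- and 32-bit modes only
    }
}

pub mod long_mode {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Opcode {
        Mov,
        Swapgs, // exists in 64-bit mode only
    }
}

/// Hypothetical "quasi-x86" opcode: the union of every mode's opcodes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnyOpcode {
    Mov,
    Pusha,
    Swapgs,
}

// Translation from a specific bitness into the unified type.
impl From<real_mode::Opcode> for AnyOpcode {
    fn from(op: real_mode::Opcode) -> Self {
        match op {
            real_mode::Opcode::Mov => AnyOpcode::Mov,
            real_mode::Opcode::Pusha => AnyOpcode::Pusha,
        }
    }
}

impl From<long_mode::Opcode> for AnyOpcode {
    fn from(op: long_mode::Opcode) -> Self {
        match op {
            long_mode::Opcode::Mov => AnyOpcode::Mov,
            long_mode::Opcode::Swapgs => AnyOpcode::Swapgs,
        }
    }
}

/// Written once against the unified type, callable with any mode's opcode.
pub fn writes_stack(op: impl Into<AnyOpcode>) -> bool {
    matches!(op.into(), AnyOpcode::Pusha)
}
```

Callers that know their bitness keep matching on the mode-specific enum and never see variants that cannot decode in that mode; callers that don't care convert once and match on `AnyOpcode`.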

i509VCB commented 2 years ago

I've been working on an x86 interpreter for some time (I am using iced at the moment but it is a little annoying to work with imo so I may switch again) and I do see the benefit of explicitly isolating the protected and real mode for my use case.

Yes it is annoying to need two or three huge match statements and deconstruct some elements into lower types for each instruction implementation but I only need a subset of the ISA here.

DrChat commented 2 years ago

Hmm, those are some really good points!

I'm just trying to write a basic x86 emulator capable of running in all three modes of operation. Most x86 instructions behave nearly the same across all three modes (differing only in mode-dependent specifics) - so I thought of just implementing those differences through runtime checks on the mode (allowing expansive code reuse). This is what I would've done for a C-based implementation (my previous primary language) - however, I'm unsure of how to approach the problem using Rust traits instead.

It'd be nice to write instruction execution code that is generic across all three modes, maybe with what would've been the runtime checks being implemented as compile-time checks on the generic Arch parameter instead. This would also make the most sense performance-wise, as mode switches are pretty uncommon once the machine is up and running.

I'm unfamiliar though with coding in a trait-heavy environment, so I'm not too sure how to approach the problem without commonizing what can be commonized.

DrChat commented 2 years ago

After some more thought and trying out a few things - I think it'd still be useful to combine the data-centric types into one where they are mostly identical. These would still be types like Opcode/Instruction/etc. And you're right that some modes don't implement some opcodes, though I feel that mostly identical is good enough to combine the types.

Where things aren't identical, it'd be useful to define shared x86-specific traits at the root level, just as you had suggested with a "quasi-x86", so that common information can be extracted by a caller interested in generic x86 implementation details.
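As a minimal sketch of such a shared trait, assuming entirely hypothetical names (`X86Instruction`, `byte_len`, `has_lock_prefix`, `Insn16`, `Insn64`, and `next_ip` are illustrative stand-ins, not anything yaxpeax-x86 provides): each mode keeps its own instruction type, and the trait only exposes what is common to all of them.

```rust
/// Hypothetical root-level trait exposing mode-independent facts about a
/// decoded instruction. Not part of yaxpeax-x86.
pub trait X86Instruction {
    /// Length of the encoded instruction in bytes.
    fn byte_len(&self) -> usize;
    /// Whether a LOCK prefix was present.
    fn has_lock_prefix(&self) -> bool;
}

// Stand-ins for two mode-specific instruction types.
pub struct Insn16 {
    pub length: u8,
    pub lock: bool,
}

pub struct Insn64 {
    pub length: u8,
    pub lock: bool,
}

impl X86Instruction for Insn16 {
    fn byte_len(&self) -> usize {
        self.length as usize
    }
    fn has_lock_prefix(&self) -> bool {
        self.lock
    }
}

impl X86Instruction for Insn64 {
    fn byte_len(&self) -> usize {
        self.length as usize
    }
    fn has_lock_prefix(&self) -> bool {
        self.lock
    }
}

/// Generic over any mode's instruction type: address of the next instruction.
pub fn next_ip<I: X86Instruction>(ip: u64, insn: &I) -> u64 {
    ip.wrapping_add(insn.byte_len() as u64)
}
```

With this shape, something like `next_ip(0x1000, &Insn16 { length: 3, lock: false })` and the same call with an `Insn64` share one implementation, without the underlying types being merged.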

My idea of switching at compile-time based on the current processor mode can be accomplished with the following:

/// Trait that exposes common characteristics from all three `yaxpeax-x86` processor modes.
pub(crate) trait X86CpuArch: Arch {
    /// Get the current mode, expressed as a `ProcessorMode`.
    fn mode() -> ProcessorMode;
}

impl X86CpuArch for yaxpeax_x86::x86_16 {
    fn mode() -> ProcessorMode {
        ProcessorMode::Real
    }
}

impl X86CpuArch for yaxpeax_x86::x86_32 {
    fn mode() -> ProcessorMode {
        ProcessorMode::Protected
    }
}

impl X86CpuArch for yaxpeax_x86::x86_64 {
    fn mode() -> ProcessorMode {
        ProcessorMode::Long
    }
}

And then later on, when executing instructions:

    fn exec_pusha<A: X86CpuArch, I: X86Instruction>(&mut self, insn: &I) {
        if A::mode() == ProcessorMode::Real {
            // TODO: Push all 16-bit GPRs...
        } else if A::mode() == ProcessorMode::Protected {
            // TODO: Push all 32-bit GPRs...
        }
    }

That way, we have one implementation of the pusha/pushad instructions and the compiler can specialize the implementation of each instruction during monomorphization and eliminate runtime branches.
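For reference, here is a self-contained variant of the same idea that compiles without yaxpeax-x86. It is only a sketch under a few assumptions: the marker types `Real16`, `Protected32`, and `Long64` stand in for the crate's architecture types, the trait drops the `Arch` supertrait, and the mode is an associated const rather than a function. The byte counts assume the default operand size for each mode and ignore the operand-size override prefix.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProcessorMode {
    Real,
    Protected,
    Long,
}

/// Same role as the trait above, but with the mode as an associated const.
pub trait X86CpuArch {
    const MODE: ProcessorMode;
}

// Hypothetical marker types standing in for the per-mode architecture types.
pub struct Real16;
pub struct Protected32;
pub struct Long64;

impl X86CpuArch for Real16 {
    const MODE: ProcessorMode = ProcessorMode::Real;
}

impl X86CpuArch for Protected32 {
    const MODE: ProcessorMode = ProcessorMode::Protected;
}

impl X86CpuArch for Long64 {
    const MODE: ProcessorMode = ProcessorMode::Long;
}

/// Bytes written to the stack by pusha/pushad at the mode's default operand
/// size, or `None` where the instruction does not exist.
pub fn pusha_stack_bytes<A: X86CpuArch>() -> Option<u32> {
    // `A::MODE` is a constant in each monomorphized copy of this function,
    // so an optimizing build can typically drop the untaken arms.
    match A::MODE {
        ProcessorMode::Real => Some(8 * 2),      // eight 16-bit GPRs
        ProcessorMode::Protected => Some(8 * 4), // eight 32-bit GPRs
        ProcessorMode::Long => None,             // not encodable in 64-bit mode
    }
}
```

Folding the `match` away is an optimization rather than a language guarantee, but with a constant scrutinee it is one compilers generally perform in release builds.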

iximeow / yaxpeax-x86

Commonize structs and enums #19