Open luser opened 3 years ago
so! i wanted to give this issue some decent thought, because all the obvious designs seem in tension with various use-cases.
first, i entirely agree that this crate shouldn't leave so much bookkeeping up to users of the library. for reporting an instruction's Address
, or getting its bytes out, some users (me!) really don't need those for the most part. i'm usually tracking instruction positions with some other mechanism, if at all. similar for instruction bytes. so i wouldn't want to obligate Reader
implementations into tracking that data - even if rustc could eliminate the dead code, it will negatively impact inliner heuristics as i learned adding decoder annotations to the x86 decoder.
so, what i'm thinking is, both "i would like this range of bytes" and "i would like this Address
" can be implemented independently of a specific Reader
impl. the general shape could be like...
impl<A: Arch> Decoder<A> {
fn report_context(&mut self) -> DecoderWithContext<Self> {
DecoderWithContext { decoder: self, base: None }
}
struct DecoderWithContext<Underlying: Arch> {
decoder: &Decoder<Underlying>,
buffer: Vec<Underlying::Word>,
base: Option<Underlying::Address>,
base: Underlying::Address,
}
impl<Underlying: Arch> DecoderWithContext<Underlying> {
fn with_base_address(&mut self, base: Option<Underlying::Address>) {
self.base = base;
}
fn words(&self) -> &[A::Word] {
self.buffer.as_slice()
}
}
impl<A: Arch> Decoder for DecoderWithContext<A> {
fn decode_into<T: Reader<A::Address, A::Word>>(&self, inst: &mut A::Instruction, words: &mut T) -> Result<(), A::DecodeError> {
self.addr = self.base + idk_some_running_offset;
self.decoder.decode_into(inst, &mut BufferingReader::new(words, &mut self.buffer))
}
}
with the general idea that if you want extra bookkeeping, you can opt in, but it can be implemented generically in yaxpeax-arch
for all Reader
and Decoder
. the interesting part here is that the decoder will need to grow a Vec<Word>
to buffer instructions. i'm not sure if this should decode an InstructionGuard
that keeps a &'instr [u8]
to access the buffer, or if that's leaning too much on lifetimes.
anyway, the same thing extends to get instruction addresses, as an A::Address
either on DecoderWithContext
or the returned InstructionGuard
. it does seem likely that if you want the underlying words, keeping the address isn't really much extra overhead, and if you want addresses but not the instruction's words and Vec<Underlying::Word>
for some reason is too much overhead... maybe you just gotta do the bookkeeping yourself.
on the topic of "knows when it runs out of input", you can use DecodeError::data_exhausted()
for that - it's a bug if a decoder returns an error that does not return true for this function if the decode failed because there was insufficient data. i need to make this more bold and highlighted and sparkly in implementation guidance.
and finally, for an Iterator
, there's something for that over in yaxpeax_core
. Reader
didn't exist at the time but now i think that can probably get moved over to yaxpeax_arch
, yes.
what do you think of the DecoderWithContext
direction for this ancillary data?
I swapped out
capstone
foryaxpeax
in mydisasm
crate and got it working, but theReader
API is a little clunky right now. Here's the code in question: https://github.com/luser/rust-disasm/blob/c7ffc6fddcbfabe01c96682cd8933331af83e4f5/src/lib.rs#L352Given that all addresses have to be
Arch::Address
, doing the address accounting myself is kind of annoying. I would love it ifU8Reader
took abase_address
parameter and could tell me what the current address was for the instruction that was just decoded.Additionally, given that I'm feeding it an
&[u8]
,U8Reader
really ought to be able to know when it runs out of input instead of requiring me to check the address range and break.For my use case (printing disassembly), I want both the stringified instruction + the bytes it was derived from. If
decode
could give me the bytes so I don't have to index the slice myself that would be lovely.Finally, it feels like an
Iterator
implementation could be wrapped around this to make it simpler to use for the general case, even if you preserve the existing lower-level API for more complex use cases.Thanks for writing these great crates!