iximeow / yaxpeax-x86

x86 decoders for the yaxpeax project
BSD Zero Clause License

Add ability to query whether a memory operand is read or write #25

Open marti4d opened 2 years ago

marti4d commented 2 years ago

We have an issue in Firefox on Mac OS X where we get EXC_BAD_ACCESS for bad memory access instead of EXCEPTION_ACCESS_VIOLATION_READ/EXCEPTION_ACCESS_VIOLATION_WRITE.

We are currently using yaxpeax to disassemble the crashing instruction, and we are able to determine the crashing memory access even when it is misreported; however, we aren't able to determine the direction of the access.

It would be helpful for FF devs to be able to report the access direction on Mac. Is there any way to do this that I'm perhaps missing? Is it a feature that could possibly be added?

Thanks ixi!

iximeow commented 1 year ago

hey that's great! that you can use the crate to disassemble the crashing instruction, anyway :)

the short answer is, unfortunately, you're not missing anything: yaxpeax-x86 has no feature to get access directions for operands. i've written out some of this over in yaxpeax-core but realized i wanted to take a somewhat different approach and have gotten buried a few levels deep in yak shave ever since.

the longer answer is that at some point i think operand directions should be something that is in yaxpeax-x86 (and -arm, -mips, ... etc), but i'd want to settle on a useful interface to describe this kind of information first. for cases like x86, it's probably reasonable to encode operand access directions as a LUT keyed on the opcode, but at the point that this crate describes operand directions i think it should also make implicit operands - especially eflags, or better, individual flag bits - queryable.
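To make the "LUT keyed on the opcode" idea concrete, here is a minimal sketch. `Opcode`, `Access`, and `operand_access` are hypothetical names invented for this illustration and are not part of yaxpeax-x86; the table covers only four opcodes and ignores implicit operands such as eflags.

```rust
// Hypothetical types for illustration only; none of these are yaxpeax-x86 items.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Mov,
    Add,
    Cmp,
    Lea,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    Read,
    Write,
    ReadWrite,
    /// The operand is named but never dereferenced or modified.
    NoAccess,
}

/// Access direction of each explicit operand, in Intel-syntax order
/// (destination first). Implicit operands are not modeled here.
pub fn operand_access(op: Opcode) -> &'static [Access] {
    match op {
        Opcode::Mov => &[Access::Write, Access::Read],
        Opcode::Add => &[Access::ReadWrite, Access::Read],
        Opcode::Cmp => &[Access::Read, Access::Read],
        // lea computes an address from its second operand but does not load from it
        Opcode::Lea => &[Access::Write, Access::NoAccess],
    }
}
```

With a table like this, `operand_access(Opcode::Add)[0]` says that the destination of `add` is both read and written, whichever operand form (register or memory) was decoded there.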

the "implicit operand" situation is also important for instructions like push [rax], where the instruction simultaneously reads from and writes to memory. for code analysis and attributing crashes you need to be able to know about both!!

over in yaxpeax-core, the approach that i think will work well enough to land in these arch-specific crates is kind of like this control flow analysis definition. there's a hilariously incomplete semantic for x86_64 alongside, and an impl of DFG (terribly wrong name) is able to act as a visitor to collect information you care about. so you might imagine an impl DFG<OperandAccessInfo, x86_64, ()> for OperandAccessAnalysis<x86_64> where read_loc and write_loc, rather than being backed by a concrete set of u64 registers, or symbolic values, are implemented to just collect reads and writes to memory and store those with the operand that had such an access.
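A rough sketch of that visitor shape, under heavy simplification. Everything here (`Reg`, `Loc`, `Inst`, `AccessVisitor`, `visit`, `MemoryAccesses`) is a hypothetical stand-in written for this example; it is not the yaxpeax-core `DFG` trait or its x86_64 semantics, and only borrows the `read_loc` / `write_loc` method names from the description above.

```rust
// Hypothetical, simplified model; not the yaxpeax-core `DFG` trait.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Reg {
    Rax,
    Rsp,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Loc {
    Register(Reg),
    /// Memory addressed through a base register.
    Memory(Reg),
}

/// Toy instruction set, just enough to show an implicit memory access.
#[derive(Clone, Copy, Debug)]
pub enum Inst {
    /// `push [base]`
    PushMem(Reg),
    /// `mov [base], src`
    StoreReg { base: Reg, src: Reg },
}

pub trait AccessVisitor {
    fn read_loc(&mut self, loc: Loc);
    fn write_loc(&mut self, loc: Loc);
}

/// Describes each instruction once, as the locations it reads and writes.
/// Any visitor can then interpret those events however it likes.
pub fn visit<V: AccessVisitor>(inst: Inst, v: &mut V) {
    match inst {
        Inst::PushMem(base) => {
            v.read_loc(Loc::Register(base)); // address computation
            v.read_loc(Loc::Memory(base)); // explicit operand: load the value
            v.read_loc(Loc::Register(Reg::Rsp));
            v.write_loc(Loc::Register(Reg::Rsp)); // implicit: stack pointer adjusted
            v.write_loc(Loc::Memory(Reg::Rsp)); // implicit: store to the stack
        }
        Inst::StoreReg { base, src } => {
            v.read_loc(Loc::Register(base));
            v.read_loc(Loc::Register(src));
            v.write_loc(Loc::Memory(base));
        }
    }
}

/// A visitor that ignores register traffic and records only memory accesses,
/// identified by the base register used to address them.
#[derive(Default, Debug)]
pub struct MemoryAccesses {
    pub reads: Vec<Reg>,
    pub writes: Vec<Reg>,
}

impl AccessVisitor for MemoryAccesses {
    fn read_loc(&mut self, loc: Loc) {
        if let Loc::Memory(base) = loc {
            self.reads.push(base);
        }
    }

    fn write_loc(&mut self, loc: Loc) {
        if let Loc::Memory(base) = loc {
            self.writes.push(base);
        }
    }
}
```

Running `visit(Inst::PushMem(Reg::Rax), &mut acc)` on a default `MemoryAccesses` records one memory read through `rax` and one memory write through `rsp`, which is the `push [rax]` case where both directions matter. A different visitor over the same `visit` function could track concrete or symbolic values instead.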

that's the way i think gives a scalable and verifiable way to get this information for the dozen or so different architectures i'd really like this information to be available for.

but i need to admit that the last time i was looking at this in yaxpeax-core was a year ago, and in the meantime something simple to ask "is this yaxpeax_x86::long_mode::Operand read or written" is probably sufficient for everyone who isn't me, and gets you something usable before another year passes. there's probably something to be done one-off for yaxpeax-x86 that's compatible with these future goals.

at minimum, this kind of information would have to be computed after decoding, when needed. i can't think of cases off the top of my head where Instruction wouldn't have sufficient information to correctly determine operand read/write behavior, so i thiiiink it could work out ok!
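As an illustration of computing this after decoding, the sketch below answers "which direction is the memory access" from fields a decoded instruction already carries. The `Instruction`, `Operand`, `Opcode`, and `Direction` types and the `memory_direction` method are hypothetical stand-ins, not the yaxpeax-x86 types, and the sketch assumes two explicit operands and only three opcodes.

```rust
// Hypothetical stand-ins for a decoded instruction; not the yaxpeax-x86 types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Opcode {
    Mov,
    Add,
    Cmp,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Operand {
    /// A register, by number.
    Register(u8),
    /// Memory addressed through a register, by number.
    MemoryDeref(u8),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Direction {
    Read,
    Write,
    ReadWrite,
}

pub struct Instruction {
    pub opcode: Opcode,
    /// Intel-syntax order: destination first, source second.
    pub operands: [Operand; 2],
}

impl Instruction {
    /// Computed on demand from already-decoded fields; nothing extra is
    /// stored at decode time. Returns `None` if no operand touches memory.
    pub fn memory_direction(&self) -> Option<Direction> {
        let idx = self
            .operands
            .iter()
            .position(|o| matches!(o, Operand::MemoryDeref(_)))?;
        Some(match (self.opcode, idx) {
            // cmp only reads, whichever operand is in memory
            (Opcode::Cmp, _) => Direction::Read,
            (Opcode::Mov, 0) => Direction::Write,
            (Opcode::Add, 0) => Direction::ReadWrite,
            // a memory operand in the source position is only read
            (_, _) => Direction::Read,
        })
    }
}
```

For a crash reporter, a decoded `mov [reg], reg` would report `Some(Direction::Write)` and `mov reg, [reg]` would report `Some(Direction::Read)`, without the decoder having to do any extra work for callers that never ask.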