mnemonikr / symbolic-pcode

Apache License 2.0
0 stars 0 forks source link

Undefined behavior in Sleigh implementation #79

Closed mnemonikr closed 11 months ago

mnemonikr commented 11 months ago

The C++ callback implementation creates a lot of complexities for building a safe FFI layer. Ultimately we want to be able to extract pcode from memory like the following:

let pcode = sleigh.pcode(&memory, address)

We don't want to require the sleigh object to hold a reference to memory as this would complicate the process of updating memory. Yet obviously it will need access to memory to decode the next instruction. However, we can't pass a buffer built from the memory object to sleigh because we don't know ahead of time how much memory will be required to decode the instruction (thanks variable length ISAs). So unfortunately, it needs full access to the available memory.

This is where things get a bit complicated. So behind the scenes, the C++ Sleigh object has a callback object called LoadImage. When the pcode disassembly is requested, it calls this callback to load data from memory.

  1. We hold a (possibly null) pointer to a memory object in our Rust Sleigh object. This is the WeakLoader: https://github.com/mnemonikr/pcode/blob/9694e9ba6563ce5cc328aa2d97d406a28db0516a/sla/src/sleigh.rs#L226
  2. The WeakLoader is stored in a Box whose reference is leaked for constructing the RustLoadImage object.
  3. The RustLoadImage object is itself stored in a Box whose reference is leaked for constructing the C++ Sleigh object.
  4. Both leaked pointers are reconstructed into Box using Box::from_raw and stored in the Rust Sleigh object so that when it is dropped, these will also be dropped.

https://github.com/mnemonikr/pcode/blob/9694e9ba6563ce5cc328aa2d97d406a28db0516a/sla/src/sleigh.rs#L274-L284

However, as a result the following is simultaneously true:

  1. There exists a &WeakLoader reference that is held by RustLoadImage
  2. There exists a Box<WeakLoader> held by the Rust Sleigh object.

This is a problem because the WeakLoader is modified while the (immutable) reference is held by RustLoadImage.

mnemonikr commented 11 months ago

https://cxx.rs/binding/box.html has an example where a Rust Box is passed to C++ and then back to Rust via a callback. Unfortunately the API for Ghidra is not so simple, as the memory object we would like to pass from Rust through C++ and back to Rust is not readily exposed anywhere in C++.

An alternative model would be to pass the "callback" directly to the C++ disassembly function via our C++ proxy and then update the stored callback pointer in C++ before performing the disassembly. The danger here is now the C++ callback pointer could fail to be updated in the event of an exception. So we would want to wrap it in some RAII object that managed the pointer...