toolCHAINZ / jingle

SMT Modeling for Ghidra's PCODE
MIT License

SleighContextBuilder set_image with &[u8] slice without copy #7

Open chf0x opened 3 weeks ago

chf0x commented 3 weeks ago

Hi!

I was wondering if there's a specific reason why Image works with Vec instead of using &[u8] directly. In my use case, I provide multiple slices to the lifter to obtain the corresponding pcode, but since Image copies the slice contents into a Vec, I'm experiencing a significant performance hit.

Additionally, would it be possible for functions like pub fn set_image(mut self, img: Image) -> Self to take &mut self instead?

I suppose my solution would be to fork and implement set_image directly on &[u8], but I wanted to ask first if there's a particular reason for the current implementation.

Thank you!

toolCHAINZ commented 3 weeks ago

Hey! I assume you're referring to the From<Vec<u8>> impl for Image? I'd have to re-inspect the code to verify, but I suspect the reason was just to avoid having another lifetime argument flying around: I envisioned Image as a structure that gets made in Rust and then moved into C++-land, given to sleigh, and forgotten about. So that's the reason for it owning the data internally.

Other than the ergonomics of dealing with more lifetimes, there's no reason why Image NEEDS to own the data, I think. I'd have to try it out to be sure. Looking at my C++ code, I think there are additional allocations happening in C++ too for the move constructor... I'll think about this, but it will be a couple weeks before I have time to consider implementing larger changes.

On the set_image mut thing, absolutely. This is the first time I've tried the "builder" pattern and naively chose the wrong form of it when I wrote that. It should probably get changed to &mut as you suggest.
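
For reference, here's a minimal sketch of the two builder forms (the type and method names below are placeholders, not the actual jingle API): the consuming form forces the caller to move or clone the builder for every image, while the borrowing form lets one builder be reused.

    // Illustrative sketch only; these are stand-in types, not jingle's.
    struct Image(Vec<u8>);

    #[derive(Default)]
    struct Builder {
        image: Option<Image>,
    }

    impl Builder {
        // Consuming form: every call moves the builder, so reusing it
        // across many images means cloning or rebuilding it each time.
        fn set_image_consuming(mut self, img: Image) -> Self {
            self.image = Some(img);
            self
        }

        // Borrowing form: the same builder can be reused for many
        // images without being moved or cloned.
        fn set_image(&mut self, img: Image) -> &mut Self {
            self.image = Some(img);
            self
        }
    }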

Regarding the new impl, you could implement From<&[u8]> as you suggest, but that would still involve a clone/allocation to produce the owned data that Image requires.
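
As a rough illustration (assuming Image simply owns a Vec<u8> internally, which glosses over its real layout), such an impl still has to copy the bytes somewhere to produce the owned data:

    // Sketch only: assumes an Image that owns its bytes, roughly like the
    // current design. A From<&[u8]> impl still has to copy them.
    pub struct Image {
        data: Vec<u8>,
    }

    impl From<&[u8]> for Image {
        fn from(bytes: &[u8]) -> Self {
            // to_vec() allocates and copies, so this just moves the clone
            // out of the caller's code rather than eliminating it.
            Image { data: bytes.to_vec() }
        }
    }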

The best performance fix would be to have Image and the C++ DummyLoadImage hold a reference instead of the data itself and slap a lifetime onto SleighContext. A middle ground might be to pass a reference to C++ but still have it use a copy constructor to make a C++-owned version. That would at least eliminate one layer of extra allocations.

chf0x commented 3 weeks ago

Yes, I meant this https://github.com/toolCHAINZ/jingle/blob/main/jingle_sleigh/src/ffi/image.rs#L12 and would love to have this instead:

    #[derive(Debug, Clone)]
    pub struct ImageSection {
        pub(crate) data: &[u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

and pass a reference to C++ code

P.S.: I think there is a similar issue to set_image with build: https://github.com/toolCHAINZ/jingle/blob/main/jingle_sleigh/src/context/builder/mod.rs#L31

toolCHAINZ commented 3 weeks ago

Haha, yeah, and there's the fun part, because this is not legal Rust:

    #[derive(Debug, Clone)]
    pub struct ImageSection {
        pub(crate) data: &[u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

It would have to be:

    #[derive(Debug, Clone)]
    pub struct ImageSection<'a> {
        pub(crate) data: &'a [u8],
        pub(crate) base_address: usize,
        pub(crate) perms: Perms,
    }

Which would then mean that SleighContext becomes SleighContext<'a> (and might even require messing around with Pin), which could get fun. I'm not opposed to the idea of sending references to sleigh instead of values, and it would definitely be more efficient, but I want to think through the best way to do it first.
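
Roughly what that propagation would look like (a sketch of the shape only; the real structs carry more fields and FFI state):

    // Sketch of how the borrow would ripple outward through the types.
    pub struct ImageSection<'a> {
        pub data: &'a [u8],
        pub base_address: usize,
    }

    pub struct Image<'a> {
        pub sections: Vec<ImageSection<'a>>,
    }

    pub struct SleighContext<'a> {
        // The context can now never outlive the borrowed program bytes.
        image: Image<'a>,
    }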

It's possible there's another source of the performance hit you're getting too: for every new Image you're making, you're having to make a new SleighContext too. This involves sleigh parsing the architecture definition fresh every time because I'm not caching any of that.

I've also internally gated the sleigh initialization function behind a mutex:

https://github.com/toolCHAINZ/jingle/blob/a17d75ce9d3f1c2a37d26335e329d69c7d135261/jingle_sleigh/src/context/mod.rs#L80-L90

This is necessary because the routine that parses the sleigh definition is VERY not thread safe. So if you're trying to make lots of sleigh contexts within threads in the same process, you're going to get bottlenecked severely during context creation.
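
The shape of that gating is roughly this (an illustrative sketch, not the code linked above):

    use std::sync::Mutex;

    // Serialize the non-thread-safe sleigh initialization behind a
    // global lock; only one context can be created at a time.
    static INIT_LOCK: Mutex<()> = Mutex::new(());

    fn make_context_guarded(sla_path: &str) {
        let _guard = INIT_LOCK.lock().unwrap();
        // ... call into the C++ side to parse sla_path and build the
        // sleigh engine while holding the lock ...
        let _ = sla_path;
    }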

Maybe in some future iteration of this we allow re-initializing an existing context with a new image, and pass images by reference, but that will probably require me understanding how sleigh uses LoadImages a bit better. I took the current approach because it was very obviously sound even if inefficient.

toolCHAINZ commented 3 weeks ago

Oh yeah, and definitely agreed that the build issue should be fixed too. If you feel like trying your hand at some of these changes, I'm happy to review PRs.

chf0x commented 3 weeks ago

Sure, there should definitely be a lifetime specified. The code I provided was more of a 'pseudo-Rust' example, just to convey the concept. I'll be happy to submit a PR if I get something working. The only question is when: hopefully within a week or two, depending on time availability.

chf0x commented 3 weeks ago

...you're having to make a new SleighContext too. This involves sleigh parsing the architecture definition...

I kind of solved it by cloning. Not an optimal solution, but better than parsing every time:

         ctx_builder: SleighContextBuilder::load_folder(settings.ghidra_sla_folder).unwrap(),

And after I just do:

        let ctx = self
            .ctx_builder.clone()
            .set_image(Image::from(bytes))
            .build(&self.arch)
            .unwrap();

toolCHAINZ commented 3 weeks ago

So, yes, there's parsing occurring there, but I'm actually referring to parsing on the C++ side. We do all that Rust-based parsing to enumerate what architectures are available, how to configure them, and to derive the path of the .sla file. The .sla file is the actual machine-generated specification that sleigh reads. All our parsing is just so we know what path to send to makeContext.

makeContext, which is over in C++, makes a sleigh engine and tells it to parse the provided .sla file to initialize itself. It's this parsing, internal to sleigh, that takes a while. Decompressed, those files have multiple megabytes of text to parse, so it understandably takes a bit.

So in the future, we can maybe better serve your use-case by enabling mutation of a sleigh context and its Image to avoid that very expensive initialization step. I suspect that it's likely to be responsible for your performance issues.
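
Purely to convey the idea, here's a hypothetical sketch of what that re-initialization could let callers do (none of these methods exist today; the stub types stand in for the real ones):

    // Hypothetical sketch: keep one expensively-initialized context and
    // only swap its image between lifts.
    struct Image(Vec<u8>);

    struct SleighContext {
        image: Option<Image>,
        // ... plus the already-parsed sleigh engine state ...
    }

    impl SleighContext {
        // Imagined "re-image" method: reuse the parsed .sla definition,
        // replace only the program bytes it reads from.
        fn set_image(&mut self, img: Image) {
            self.image = Some(img);
        }
    }

    fn lift_many(ctx: &mut SleighContext, programs: &[&[u8]]) {
        for bytes in programs {
            ctx.set_image(Image(bytes.to_vec()));
            // ... lift pcode from ctx here ...
        }
    }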