1193 only changes vmayarascan? It doesn't alter normal yarascan because you might just as well dump the various files and run yara over them? It should still be scanning through the whole of memory (in what I hope are 16 MB chunks, rather than 4k chunks), chunked up the way the scanner is supposed to for contiguous regions. If not, then I'll have to look into the scanner code itself. It's not really supposed to be breaking on pages, it's supposed to clump them together into 16 MB blocks if it can...
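
For illustration, the clumping behaviour described above could look roughly like the sketch below. This is not the actual scanner code: the function names, the `(offset, length)` range format and the 16 MB constant are all assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

# Assumed cap on a single scan chunk (the "16 MB" mentioned above).
CHUNK_SIZE = 16 * 1024 * 1024


def _split(start: int, length: int, max_size: int) -> Iterator[Tuple[int, int]]:
    """Cut one contiguous run into pieces of at most max_size bytes."""
    while length > 0:
        piece = min(length, max_size)
        yield (start, piece)
        start += piece
        length -= piece


def coalesce(
    ranges: Iterable[Tuple[int, int]], max_size: int = CHUNK_SIZE
) -> Iterator[Tuple[int, int]]:
    """Merge sorted (offset, length) ranges that touch each other.

    Hypothetical helper: adjacent mapped pages become one run, a gap
    starts a new run, and every run is capped at max_size.
    """
    cur_start = None
    cur_len = 0
    for offset, length in ranges:
        if cur_start is not None and offset == cur_start + cur_len:
            # This range begins exactly where the current run ends.
            cur_len += length
        else:
            if cur_start is not None:
                yield from _split(cur_start, cur_len, max_size)
            cur_start, cur_len = offset, length
    if cur_start is not None:
        yield from _split(cur_start, cur_len, max_size)


# Two adjacent 4k pages, then a gap, then one more page.
pages = [(0x1000, 0x1000), (0x2000, 0x1000), (0x5000, 0x1000)]
chunks = list(coalesce(pages))
```

With these inputs the first two pages end up in a single chunk and the page after the gap stays on its own, which is the "break on gaps, not on pages" behaviour being discussed.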

ikelos commented 4 months ago

Also, we test then we merge. Much better than merging and breaking... ;P It's kinda the whole reason for making these PRs in the first place... ;D

ikelos commented 4 months ago

Oh vmayarascan is the linux one? Yeah, sorry, I forgot we had a linux specific one now. Yep, we can update that too...

ikelos commented 4 months ago

Oh, hmmm, it looks like even though it should be combining a bunch of the data to make a chunk, it's not necessarily doing that... :S Ugh, not an area of the code I really want to wade into, but it's definitely not working the way it was supposed to (and therefore getting pretty ugly and inefficient in the process). :S

eve-mem commented 4 months ago

Also, we test then we merge. Much better than merging and breaking... ;P It's kinda the whole reason for making these PRs in the first place... ;D

... Yes totally! I need to work on my wording there :D

Maybe I've misunderstood the yarascanning bits, I only did some very basic tests which is quite amateur of me... Let me come back with a solid example and test case for us.

ikelos commented 4 months ago

Maybe I've misunderstood the yarascanning bits, I only did some very basic tests which is quite amateur of me... Let me come back with a solid example and test case for us.

No, I just need to look into the scanning mechanism and make sure it chunks together suitably sized pieces.

ikelos commented 4 months ago

Ok, having taken a few hours to dig into this, I believe the scanner is doing the correct thing, which is taking a segment of memory, pulling out all the mapped chunks and scanning each one chunk by chunk where the chunks are not contiguous. This doesn't fit with how people expect yara to work, because they assume that they can just read the entire vma and anywhere that isn't mapped will just come up as 0. So I'm not sure whether/how to fix yarascan? Are people's expectations incorrect, or should yara be scanning pages padded with 0s when there are no mapped pages in the middle? The idea behind the scanner chunking is to make sure scanning processes aren't reading through reams and reams of pointless 0s. Since yara is C code under the hood, and we're pretty efficient at padding things, perhaps we should just read the whole layer and take the memory hit.
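
To make the difference concrete, here's a toy sketch of the two approaches. It is only an illustration: Python's `re` stands in for yara, the page size is shrunk to 4 bytes, and the `mapped` dict and `padded_view` helper are made up for the example rather than taken from the framework.

```python
import re

PAGE = 4  # toy page size so the example stays readable

# Hypothetical vma covering offsets 0..12 with the middle page unmapped.
mapped = {0: b"xxAB", 8: b"CDyy"}
vma_start, vma_end = 0, 12


def padded_view(pages, start, end, page=PAGE):
    """Build one buffer for the whole range, zero-filling unmapped pages."""
    out = bytearray()
    for off in range(start, end, page):
        out += pages.get(off, b"\x00" * page)
    return bytes(out)


# Stand-in for a rule with a 4-byte jump between two strings.
pattern = re.compile(rb"AB.{4}CD", re.DOTALL)

# Chunk-by-chunk: each mapped chunk is scanned on its own.
chunk_hits = [
    off + m.start()
    for off, data in sorted(mapped.items())
    for m in pattern.finditer(data)
]

# Padded: the whole vma is scanned as one zero-filled buffer.
padded_hits = [
    vma_start + m.start()
    for m in pattern.finditer(padded_view(mapped, vma_start, vma_end))
]
```

Here `chunk_hits` comes out empty while `padded_hits` contains offset 2, because the pattern only exists once the gap has been filled with 0s. That's the mismatch between what the scanner does and what people expect.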

One other option would be to build a file-like class that reads from the layer as much/as little as yara asks for, letting it handle the buffering. I'll probably try and mock that up later, but for now yarascan (and more importantly, the scanning framework) are staying as they are.
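
A rough mock-up of that file-like idea might look like the following. Everything here is hypothetical: `SparseLayer` is a toy stand-in for a real layer, `LayerFile` is an invented name, and the sketch makes no assumption about whether yara's Python bindings would actually accept such an object.

```python
import io


class SparseLayer:
    """Toy stand-in for a memory layer (hypothetical, not the real API)."""

    def __init__(self, pages, page_size, size):
        self.pages, self.page_size, self.size = pages, page_size, size

    def read(self, offset, length):
        """Return length bytes from offset, zero-filling unmapped pages."""
        out = bytearray()
        end = offset + length
        while offset < end:
            base = offset - offset % self.page_size
            page = self.pages.get(base, b"\x00" * self.page_size)
            take = min(end, base + self.page_size) - offset
            start = offset - base
            out += page[start:start + take]
            offset += take
        return bytes(out)


class LayerFile(io.RawIOBase):
    """Read-only, seekable file view over layer offsets [start, end)."""

    def __init__(self, layer, start, end):
        super().__init__()
        self._layer, self._start, self._end = layer, start, end
        self._pos = start

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos - self._start

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = self._start + offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._end + offset
        else:
            raise ValueError("invalid whence")
        # Clamp so reads never leave the wrapped range.
        self._pos = max(self._start, min(pos, self._end))
        return self.tell()

    def readinto(self, buffer):
        # Only fetch as much as the caller asked for.
        length = min(len(buffer), self._end - self._pos)
        if length <= 0:
            return 0
        buffer[:length] = self._layer.read(self._pos, length)
        self._pos += length
        return length


layer = SparseLayer({0: b"xxAB", 8: b"CDyy"}, page_size=4, size=12)
handle = LayerFile(layer, 0, layer.size)
first = handle.read(6)   # crosses into the unmapped page
handle.seek(8)
rest = handle.read()     # reads to the end of the range
```

Wrapping the object in `io.BufferedReader` would hand the buffering over to the standard library, so the consumer pulls only what it needs and the zero padding is produced lazily rather than for the whole layer up front.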

volatilityfoundation / volatility3

Update vmayarascan with full vma scan #1195