volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/

[Core, plugin]: Virtual mappings dumping and caching #1237

Open Abyss-W4tcher opened 3 months ago

Abyss-W4tcher commented 3 months ago

Hi 👋,

Following an observation made while investigating Windows AArch64 (https://github.com/volatilityfoundation/volatility3/issues/161#issuecomment-2284556649), I noticed that multiple plugins were scanning and physically mapping the entire virtual address space:

https://github.com/volatilityfoundation/volatility3/blob/6b739f6f6cff04021004ef16d21b522eae7a9d07/volatility3/framework/interfaces/layers.py#L235-L238

in order to later do byte searching on the virtual (kernel) layer directly. However, on large memory samples or heavily populated address spaces, this process takes quite some time and needs to be repeated on each plugin run.

As the input (the section to scan) and output (the virtual->physical mappings) are constant for a given memory sample, there is no need to repeat the calculations.
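To make the idea concrete, here is a minimal, hypothetical sketch of what dumping those deterministic mappings could look like, assuming the existing layer.mapping() API; the helper name, the sections format and the on-disk layout are illustrative only, not the PR's actual code:

```python
import json
import lzma

def dump_section_mappings(layer, sections, cache_path):
    """Compute the virtual -> physical mappings once and persist them (illustrative sketch)."""
    mappings = {}
    for start, size in sections:
        # mapping() yields (offset, sublength, mapped_offset, mapped_length, layer_name) tuples
        mappings[f"{start}-{size}"] = list(
            layer.mapping(start, size, ignore_errors=True)
        )
    # xz-compressed JSON, mirroring the virtmapcache.json.xz file shown in the demo below
    with lzma.open(cache_path, "wt") as f:
        json.dump(mappings, f)
```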


This PR introduces a new plugin, windows.virtmapscanner, and a new framework argument, --virtmap-cache-path. Here is a demo:

time vol3.py -f memory_layer.raw -c conf.json windows.filescan > /dev/null 
Formatting...0.00               PDB scanning finished                  
0,02s user 0,04s system 0% cpu 4:52,12 total
# This command can be run once when first analyzing a memory sample
time vol3.py -f memory_layer.raw --save-config conf.json -o virtmapcache_dir/ windows.virtmapscanner
Formatting...0.00               PDB scanning finished                        
Volatility 3 Framework 2.8.0
  |                                 Sections | Virtual mappings cache file output
* | [(18446673704965373952, 70368744177663)] |             virtmapcache.json.xz
0,00s user 0,05s system 0% cpu 5:17,29 total
time vol3.py -f memory_layer.raw -c conf.json --virtmap-cache-path virtmapcache_dir/virtmapcache.json.xz windows.filescan > /dev/null
Formatting...0.00               PDB scanning finished                  
0,00s user 0,03s system 0% cpu 1:54,60 total

The time spent running windows.virtmapscanner already pays for itself! This does not only apply to the windows.filescan plugin, but to any plugin that scans an entire virtual layer (windows.netscan, windows.threads ...).

I'd be glad to hear your thoughts on this idea, and any improvements you can suggest!


Notes:

ikelos commented 3 months ago

Thanks, it looks like an interesting idea. I'm afraid it's another one that will likely take me some time to consider the implications of, and I'm also on holiday next week, but do please nudge me after a month if I haven't added any comments on it by then...

Abyss-W4tcher commented 2 months ago

Thanks for the review :)

The main idea was to provide an optional feature that lets users save time by not regenerating the same deterministic data (the input and logic never change) on every plugin run.

As you pointed out, it is not integrated "automatically" into the framework, as I wanted it to be managed by the user, leaving them the task of selecting the right cache file for the right dump.

This hasn't been tested on a lot of samples yet, but it basically calls the internal framework API and stores the result on disk (like an exported functools.lru_cache()). Since I patched the AArch64 layer scanning, the time spent scanning the kernel space on each plugin run has decreased drastically. However, on huge samples, or when a lot of contiguous memory segments are allocated, it is still worthwhile to do the scanning once and re-import the results afterwards.
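As a rough illustration of that "exported functools.lru_cache()" analogy, a disk-backed memoization wrapper could look like the sketch below; disk_cache is a hypothetical example (assuming JSON-serializable results), not the code used in this PR:

```python
import functools
import hashlib
import json
import os

def disk_cache(cache_dir):
    """Persist a deterministic function's JSON-serializable results across runs (illustrative sketch)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Key the cache on the function name and its arguments
            key = hashlib.sha256(repr(args).encode()).hexdigest()
            path = os.path.join(cache_dir, f"{func.__name__}_{key}.json")
            if os.path.exists(path):
                with open(path) as f:
                    return json.load(f)
            result = func(*args)
            with open(path, "w") as f:
                json.dump(result, f)
            return result
        return wrapper
    return decorator
```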


A great use case to consider might be a developer implementing a scanning plugin who needs a lot of trial and error to find needles in the virtual pages; this lets them save quite some time on each test run.


> Either way, I would definitely need to see repeated runs of this with the results being verified as identical, and with the wrong file being passed in to see if it can detect errors if we decide to use it automatically (such as two paused VM snapshots of the same machine where lots of different processes have been started between the runs).

If this plugin doesn't output the same results twice, wouldn't the problem reside inside the _layer_scan API, and directly affect the "normal" scanning process as well?

As for the wrong file being passed, I am afraid that verifying the correctness of the mappings would require regenerating them, which defeats the initial idea of this feature (saving time and CPU cycles)?


Making a hash of the DTB would work, but not for samples taken from a different runtime kernel :/ . I have turned this problem over in my head a bit, but I couldn't find an easy solution.

Computing a SHA256 of the sample would take too long, and would be needed on each run (storing it in metadata just circles the problem back)...

ikelos commented 2 months ago

Sure, I applaud the idea and I think it could be worthwhile using it all the time, but that would need three things:

Abyss-W4tcher commented 2 months ago

Alright, I am processing all your comments. I am thinking about an automated design, but the core of the problem lies in identifying a memory sample and, more granularly, a layer.

Additionally, instead of storing all the layers (of a sample) inside one cache file, which requires loading it entirely when needed, an approach where each layer cache is written directly to its own file would speed up search and access. Even if layers from different samples end up "mixed" in the cache directory, this shouldn't be a problem if the layer identifier is strong.

For example, we might consider the following sample code:

import hashlib
import json
import os
from volatility3.framework import constants

# Record the full layer stack for context, e.g.:
# ['volatility3.framework.layers.physical.FileLayer.base_layer',
#  'volatility3.framework.layers.crash.WindowsCrashDump64Layer.memory_layer',
#  'volatility3.framework.layers.arm.WindowsIntel32e.layer_name']
stacked_layers = [f"{l.config['class']}.{l.name}" for l in current_layer.context.layers.values()]

# Unique identifier for the layer: DTB, first mapped page ... (note: sha256 expects bytes)
layer_identifier = hashlib.sha256(b"unique identifier")
virtmap_filename = os.path.join(
    constants.CACHE_PATH,
    "virtmap_" + layer_identifier.hexdigest() + ".cache",
)

# layer_mappings format is {"section": _scan_iterator_results}
cache_content = {"metadata": {"stacked_layers": stacked_layers}, "mappings": layer_mappings}
with open(virtmap_filename, "w") as f:
    json.dump(cache_content, f)

I don't know whether any other metadata would be interesting? It could make a cache file easier to recognize if any manual investigation is needed.
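For instance, a hedged sketch of how that metadata could be checked when loading a cache file back, under the same assumptions and hypothetical names as the snippet above:

```python
import json

def load_virtmap_cache(current_layer, virtmap_filename):
    """Load a cached mapping and sanity-check it against the current layer stack (illustrative sketch)."""
    with open(virtmap_filename) as f:
        cache_content = json.load(f)
    expected = [
        f"{l.config['class']}.{l.name}" for l in current_layer.context.layers.values()
    ]
    # The stacked_layers metadata at least catches a cache generated with a different layer stack,
    # although it cannot distinguish two different samples sharing the same stack.
    if cache_content["metadata"]["stacked_layers"] != expected:
        raise ValueError("Cache file does not match the current layer stack")
    return cache_content["mappings"]
```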

The layer_identifier can be computed once on layer class instantiation, and used when:

Determining if a cache file already exists is straightforward, and does not require a direct content scan. Of course, this relies on the cache filenames not being messed with.

As you pointed out, this automatic feature should be easily toggleable (on/off), and should never be enabled if the cache is not available (if #410 gets merged one day).

What are your thoughts on this design? It would drop the plugin and most of the CLI code, concentrating the logic mostly inside layers.py.

ikelos commented 2 months ago

Yeah, so a per-layer design is OK, but identifying the layers is going to be quite a task. It's probably easiest to do it with a hash of a certain number of bytes, but if those aren't spread across the entire layer then things like multiple memory snapshots may come back reporting a layer-cache hit. Honestly, it's going to be really difficult to identify one layer from another reliably. If there's a clever way of doing that, then go for it. Am I right in thinking that these maps will be dependent on the DTB (and therefore a process layer won't have the same sections as a kernel layer), or not?
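One possible reading of the "hash of a certain number of bytes spread across the entire layer" idea, as a hedged sketch only; the function name, sample count and chunk size are assumptions, not an agreed design:

```python
import hashlib

def sample_layer_fingerprint(layer, samples=64, chunk=4096):
    """Hash evenly spaced chunks across a layer to build a cheap identifier (illustrative sketch)."""
    digest = hashlib.sha256()
    start, end = layer.minimum_address, layer.maximum_address
    stride = max((end - start) // samples, chunk)
    for offset in range(start, end, stride):
        try:
            # pad=True zero-fills unreadable bytes instead of raising
            digest.update(layer.read(offset, chunk, pad=True))
        except Exception:
            continue  # skip regions that cannot be read at all
    return digest.hexdigest()
```

Even with the samples spread across the whole layer, two snapshots of the same machine could still collide if the sampled pages happen to be identical, which is exactly the failure mode described above.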

Abyss-W4tcher commented 2 months ago

These maps depend on the DTB (per-process and kernel), and the number of sections depends on the virtual pages allocated to the layer.

This is a difficult task, because the highest-level mappings might look the same between two samples, while PTEs could have been mapped and unmapped in between, resulting in small skipped sections. Apart from computing a SHA256 of the entire sample on each Volatility run, relying explicitly on the sample filepath (like the config?), or trying sketchy things like gathering system clocks from the upper layer, there is for now no efficient way to uniquely identify a layer.

Initially, I designed this feature to work the same way as the config, which mostly relies on the user not swapping one memory sample filepath for another. However, a wrong config won't get the stacking very far, whereas a wrong layer cache will silently return inaccurate data 🤔.

In its current state, I am afraid the automatic path would introduce more issues than benefits... So I guess it should be left as an optional side feature for now, as automatically increasing the risk of returning inaccurate results is exactly what we want to avoid?