Closed cecio closed 4 years ago
Hello @cecio ,
Thank you for taking the time to write a proper ticket.
Being able to find the virtual address for a process given its physical offset is indeed important as rootkits that unlink their process will not be present in the process list (and potentially other sources in the kernel), and having the virtual address is required for any deep per-process analysis like list of handles, memory maps, and so on.
To accomplish this task, Volatility 2 implemented a function named virtual_process_from_physical_offset, which can be found here:
Volatility 2's support for WinXP-8.1 relies on this bounce as those OS versions are scanned at the physical address layer (similar to what @ikelos is quoted as saying). Win10 scans the kernel virtual address space though, which is why the current Vol3 code works on your Windows 10 samples, but not any others.
If you would like to try adding this capability yourself, then you would want to look at these places:
1) how pslist dumps processes: https://github.com/volatilityfoundation/volatility3/blob/master/volatility/framework/plugins/windows/pslist.py#L187
2) where psscan gets its result: https://github.com/volatilityfoundation/volatility3/blob/master/volatility/framework/plugins/windows/psscan.py#L46
If you ported the virtual_process_from_physical_offset function to Vol3 then you should be able to call pslist.process_dump with your process found through the bounce. You could do this in the existing psscan loop. One thing to be aware of though is that psscan will find terminated process structures, and these cannot be dumped as the page tables are no longer intact. The bounce can also fail due to smear during acquisition. So your code would need to catch any failed bounce attempts like Volatility 2 does.
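The "catch any failed bounce attempts" advice could look roughly like the loop below. This is a minimal, framework-free sketch: `bounce` and `dump` are hypothetical callables standing in for a ported virtual_process_from_physical_offset and for pslist's process_dump, and `BounceError` is a made-up exception, not a Volatility 3 class.

```python
from typing import Callable, Iterable, List, Optional, Tuple


class BounceError(Exception):
    """Hypothetical error raised when a physical process cannot be bounced."""


def dump_scanned_processes(
    scanned: Iterable[object],
    bounce: Callable[[object], Optional[object]],
    dump: Callable[[object], str],
) -> Tuple[List[str], List[object]]:
    """Try to dump every scanned process and collect the ones that fail.

    `bounce` maps a physically addressed process to a virtual one (or None);
    `dump` writes a virtual process out and returns a result string.
    """
    dumped: List[str] = []
    skipped: List[object] = []
    for phys_proc in scanned:
        try:
            virt_proc = bounce(phys_proc)
        except BounceError:
            # Terminated process or smeared page tables: nothing to dump.
            virt_proc = None
        if virt_proc is None:
            skipped.append(phys_proc)
            continue
        dumped.append(dump(virt_proc))
    return dumped, skipped
```

Keeping the skipped processes in a separate list means the plugin can still report them in its output instead of silently dropping them.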
If you decide to add this yourself then please let us know if you have any questions along the way :)
Yes, sure, I'll look into this. I'll let you know if I have questions...thanks! :)
Following what the "virtual_process_from_physical_offset" is doing, I was trying to figure out how to directly extract the _EPROCESS struct for the given physical address, without going through the pool scanner. Snooping around, I was trying something like
virtual_layer_name = self.config['primary']
kvo = self.context.layers[virtual_layer_name].config['kernel_virtual_offset']
ntkrnlmp = self.context.module(self.config["nt_symbols"], layer_name = virtual_layer_name, offset = kvo)
proc = self.context.object(object_type = "_EPROCESS", offset = proc.vol.offset, absolute = True)
The offset is coming from the pool scanner. If I use the same offset with vol2, it works fine. In this case the object returned (proc) seems to be invalid (for example the ThreadListHead is not pointing to anything).
Obviously I'm missing something here...
I tried to switch to memory_layer for the virtual_layer_name (being on the physical space), but in this case I think that "module building" should be changed...
Any thoughts? :-) Thanks!
Looking at the psscan code, it seems like you should already have the process object created in the physical translation layer:
the "mem_object" object there should be of type EPROCESS.
Yes, you are completely right.
I was just trying to build a more generic function (like the virtual_process_from_physical_offset in vol2) that relied only on the offset, rather than on already having the object.
But I can start by using the mem_object for the time being, then check if it's possible to go with the other solution in a second iteration.
I think I'm pretty close to a working version.
I'm implementing the sanity checks. One question: I tried to look for a helper function to translate a virtual address to a physical one (like vtop in vol2). I didn't find it for Windows (I saw something for Linux and Mac). Maybe I'm looking in the wrong place.
If it doesn't actually exist, does it make sense to port it from vol2?
From what I remember + a quick grep search, the only places that do this for Windows processes in Vol2 are psscan and volshell, so it seems like sending in an already created object makes sense, especially since psscan gets back an object from the poolscanning API.
What did you find for Linux and Mac? All the OSes should use the same APIs so I would like to make sure these are consistent.
There is the translate() function in Vol3 that can give you the same data as vtop.
There is also a function called mapping that can handle translating more than one page at once and returning the physical offset for each. You can see that in use here:
Well, maybe I just got it wrong. I was grepping for something like physical and I saw this:
volatility/framework/automagic/linux.py: def virtual_to_physical_address(cls, addr: int) -> int:
volatility/framework/automagic/mac.py: def virtual_to_physical_address(cls, addr: int) -> int:
but maybe they are referring to something different...
Great! I'll look into these....thanks a lot!!!
Ok, I think I have a working version:
@classmethod
def virtual_process_from_physical(cls,
                                  context: interfaces.context.ContextInterface,
                                  layer_name: str,
                                  symbol_table: str,
                                  proc: interfaces.objects.ObjectInterface) -> \
        Iterable[interfaces.objects.ObjectInterface]:
    """ Returns a virtual process from a physical addressed one
    """

    # We'll use the first thread to bounce back to the virtual process
    kvo = context.layers[layer_name].config['kernel_virtual_offset']
    ntkrnlmp = context.module(symbol_table,
                              layer_name = layer_name, offset = kvo)

    tleoffset = ntkrnlmp.get_type("_ETHREAD").relative_child_offset("ThreadListEntry")
    # Start out with the member offset given to us from the profile
    offsets = [tleoffset]

    # If (and only if) we're dealing with 64-bit Windows 7 SP1
    # then add the other commonly seen member offset to the list
    kuser = info.Info.get_kuser_structure(context, layer_name, symbol_table)
    nt_major_version = int(kuser.NtMajorVersion)
    nt_minor_version = int(kuser.NtMinorVersion)
    vers = info.Info.get_version_structure(context, layer_name, symbol_table)
    build = vers.MinorVersion
    bits = context.layers[layer_name].bits_per_register

    version = (nt_major_version, nt_minor_version, build)
    if version == (6, 1, 7601) and bits == 64:
        offsets.append(tleoffset + 8)

    # Now we can try to bounce back
    for ofs in offsets:
        ethread = ntkrnlmp.object(object_type = "_ETHREAD",
                                  offset = proc.ThreadListHead.Flink - ofs,
                                  absolute = True)

        # Ask for the thread's process to get an _EPROCESS with a virtual address space
        virtual_process = ethread.owning_process()
        # Sanity check the bounce.
        # This compares the original offset with the new one (translated from virtual space)
        ph_offset, _, _ = context.layers[layer_name]._translate(virtual_process.vol.offset)
        if virtual_process and \
                proc.vol.offset == ph_offset:
            return virtual_process
It works, at least I can use the PsList process_dump without errors. But it needs more testing: I'm trying to compare some dumps done with this and the same with Vol2, and I see some differences sometimes...I need to investigate a bit more.
In the meantime, if you can have a look and you see something obviously wrong...thanks!
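For reference, the sanity check at the end of that function can be isolated as a small helper. This is a minimal sketch, not framework code: `bounce_is_consistent` and the `translate` callable are hypothetical, and it assumes a failed translation raises a `LookupError`, whereas Volatility 3 raises its own invalid-address exceptions. It also confirms that a virtual process was found before trying to translate its offset.

```python
from typing import Callable, Optional


def bounce_is_consistent(
    physical_offset: int,
    virtual_offset: Optional[int],
    translate: Callable[[int], int],
) -> bool:
    """Return True when the virtual offset translates back to the physical one."""
    if virtual_offset is None:
        # No owning process was found, so there is nothing to translate.
        return False
    try:
        return translate(virtual_offset) == physical_offset
    except LookupError:
        # Assumption: an unmapped address surfaces as a lookup failure.
        return False
```

Treating a translation failure as a failed bounce keeps terminated or smeared processes from aborting the whole scan.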
This looks overall reasonable to me. One question though: does translate() work the same here as opposed to _translate()? The underscore name is usually a marker meaning for internal use, so if the non-underscore version is accessible then the code will be a bit cleaner.
Going to tag @ikelos on two questions though:
1) Is there a more direct way to get the major/minor/build?
2) Is using _translate (or translate) okay, or do you want .mapping() used in all instances like this, even for one page?
Yes, I switched to .translate, it works fine and it's cleaner as you said. Thanks!
@atcuno
Well, theoretically it would be stored in the metadata of the symbol file (and we've already found the kernel). Unfortunately, most of our generated files don't contain this data (because it's not in the PDB, I think it needs extracting from the original PE). So this method is probably more accurate, but we could theoretically parse the PE for version info once we've found it when getting the PDB GUID, so freshly generated ones could contain it. The difficulty is, unless it's easy to get, we can't always guarantee it'll be there, and if we can't guarantee it'll be there, the fallback becomes the de facto (hence all the os_distinguisher stuff). 5:S
translate(layer, offset) should effectively equate to mapping(layer, offset, 0) and then the result is mangled for a little easier handling (0 length doesn't technically make sense, but I believe it eases things over a length of 1). So it should be ok, but it should be noted translate isn't part of the TranslationLayerInterface, it's strictly part of LinearlyMappedLayer, which is almost certainly fine for the intel layer, but may not be the case when a non-linearly mapped layer (such as compressed memory) enters the scene, so you just need to be a little bit careful about it. As long as everyone's aware of those limitations on translate then I'm ok using it, but for it to work in all cases, it would be better to use mapping and deal with all possible responses.
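To illustrate what "deal with all possible responses" can mean, here is a minimal sketch built on a toy layer rather than the real Volatility 3 classes. `ToyLayer`, the shape of the tuples its `mapping()` yields, and `to_physical` are all assumptions made for illustration; the real mapping() return format should be checked against the framework source.

```python
from typing import Dict, Iterator, List, Tuple

PAGE_SIZE = 0x1000


class ToyLayer:
    """Hypothetical stand-in for a translation layer (not Volatility 3 code)."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        # Maps virtual page number -> physical page number.
        self._page_table = page_table

    def mapping(self, offset: int, length: int) -> Iterator[Tuple[int, int, int]]:
        """Yield (virtual_offset, chunk_length, physical_offset) per page touched."""
        if length <= 0:
            raise ValueError("length must be positive in this toy model")
        end = offset + length
        while offset < end:
            page, in_page = divmod(offset, PAGE_SIZE)
            if page not in self._page_table:
                raise KeyError(f"page fault at {offset:#x}")
            chunk = min(PAGE_SIZE - in_page, end - offset)
            yield offset, chunk, self._page_table[page] * PAGE_SIZE + in_page
            offset += chunk


def to_physical(layer: ToyLayer, offset: int, length: int) -> List[Tuple[int, int]]:
    """Return every (physical_offset, chunk_length) backing the virtual range."""
    return [(phys, chunk) for _, chunk, phys in layer.mapping(offset, length)]
```

A range that straddles a page boundary comes back as two chunks that need not be adjacent in physical memory, which is something a single-offset translate result cannot express.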
I'm running some tests to see if I have consistent values. I found a case where I see strange behavior. I used this memory dump sample, with a hidden process (Windows XP SP3 x86)
http://amnesia.gtisc.gatech.edu/~moyix/ds_fuzz_hidden_proc.img.bz2
I tried to extract the hidden process (PID 1696) with both vol2 and vol3 with the "new" psscan dump. The result is similar but not equal: the file extracted with vol3 seems to have the sections a bit mixed...I see some zeroed areas for example at the beginning of the .text section and then some of the data are "slipping" into the .rdata section.
I tried to trace where this is happening, and in the reconstruct method called by process_dump I see the call
raw_data = read_layer.read(self.vol.offset, nt_header.OptionalHeader.SizeOfImage, pad = True)
The read method returns these "zeroed" areas, I guess because of the "pad" option being set. If I try to call read with pad = False I see an error
*** volatility.framework.exceptions.PagedInvalidAddressException: Page Fault at entry 0xe10ff7f800000400 in page entry
Since I'm not sure if this is happening because of some problem in virtual_process_from_physical or is due to something else, I tried some other processes in the same image.
For example PID 620, which is not hidden:
With psscan (new version with the dump) and pslist (original version) I have the same result (the two files are identical), but both have these "additional" zeroed areas. Maybe there is something to check in the read method, or am I missing something?
Thanks :)
So by way of explanation, the pad option prevents errors, and returns 0 where a byte could not be read. This typically affects whole pages, and happens when a page lookup can't occur. It sounds as though in volatility 2, these were simply being ignored and not output. We could achieve the same in volatility 3 by not including the pad option and then catching any InvalidAddressExceptions and not outputting (it will then also require additional code to try starting it back up after the bad page until it's completed everything it should have tried to read). I don't know which is better. I feel the zeroed data gives a more accurate indication of the size of the various areas, but if we're already truncating the full memory to get down to this, perhaps it would be better to simply ignore it?
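The two behaviours can be compared with a small toy model. This is only a sketch under stated assumptions: `InvalidAddress`, `read_page` and `read_range` are hypothetical names, memory is modelled as a dict of resident pages, and none of it is the framework's actual read() implementation.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 0x1000


class InvalidAddress(Exception):
    """Hypothetical stand-in for the framework's InvalidAddressException."""


def read_page(pages: Dict[int, bytes], page_number: int) -> bytes:
    """Return one page of toy memory or raise if it is not resident."""
    try:
        return pages[page_number]
    except KeyError:
        raise InvalidAddress(f"page {page_number:#x} unavailable") from None


def read_range(
    pages: Dict[int, bytes], first: int, count: int, pad: bool
) -> Tuple[bytes, List[int]]:
    """Read `count` pages starting at page `first`.

    With pad=True unreadable pages become zero bytes (layout preserved);
    with pad=False they are skipped and reading resumes at the next page.
    """
    out = bytearray()
    missing: List[int] = []
    for page_number in range(first, first + count):
        try:
            out += read_page(pages, page_number)
        except InvalidAddress:
            missing.append(page_number)
            if pad:
                out += bytes(PAGE_SIZE)
    return bytes(out), missing
```

With padding the output keeps every later byte at its original offset, while skipping produces a shorter result in which later data shifts forward. Returning the list of missing pages lets a caller report the gaps either way.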
If there's just additional padding (null bytes), then that's fine, but if there's data missing in vol3 that's present in vol2, then I'm slightly concerned, and that definitely needs investigating. Is the image you're using private? If not, would you be able to get us a copy and clearly provide both commands you use for vol2 and vol3 so we can look into it further please?
The image is available on the Internet, it is a sample I found googling:
http://amnesia.gtisc.gatech.edu/~moyix/ds_fuzz_hidden_proc.img.bz2
Here are the commands I'm using:
Vol3
python3 vol.py -f /tmp/ds_fuzz_hidden_proc.img windows.pslist.PsList --pid 620 --dump
Vol2
python vol.py -f /tmp/ds_fuzz_hidden_proc.img --profile=WinXPSP3x86 procdump -p 620 -D /tmp
Then I also dumped the same process (620) with the "new" PsScan I implemented
python3 vol.py -f /tmp/ds_fuzz_hidden_proc.img windows.psscan.PsScan --pid 620 --dump
and it returns exactly the same image dumped by the PsList command above.
Ok, so we're not sure how best to communicate this, but the default (and currently only) mode of dumping for processes is from memory. If you add --memory to the vol2 command that you used, you'll get identical files for the process. We're not sure when or whether the old functionality for attempting to reconstruct an executable file will be added, but at least I'm assured now that we're not really doing anything differently. This particular difference has already been flagged in #272 if you want to follow along there.
Great! I'll adjust a couple of things and I'll make a pull request for the dump of psscan.
Hello.
As agreed, I'm going to open the issue:
I was trying to add to psscan the same functionality present in pslist (filters and dump). I did it by importing the plugin, and everything looks to be working fine...at least on win10 images. I noticed that the "process_dump" method (used by instantiating the pslist plugin) works fine if I use it on some Win10 images, but it fails if used with WinXP or Win7 images. The part failing is the
proc_layer_name = proc.add_process_layer()
and the error is "Parent layer is not a translation layer, unable to construct process layer".
As @ikelos reported, the problem is due to the fact that
I'm available for any testing or help on this if you'd like. Thanks a lot!