volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/

Missing bytes from symbol address #883

Closed ShellCode33 closed 1 year ago

ShellCode33 commented 1 year ago

Hello there,

In volatility2, the following code:

ns_addr = self.addr_space.profile.get_symbol("init_pid_ns")
print("ns_addr: " + hex(ns_addr))

gives the following output:

ns_addr: 0xffffffff96653c80L

Whereas in volatility3, the following code:

aslr_shift = kernel_layer.config['kernel_virtual_offset']

ns_addr = vmlinux.get_symbol("init_pid_ns").address + aslr_shift
print("ns_addr (w/ aslr shift):", hex(ns_addr))
ns_addr = vmlinux.get_symbol("init_pid_ns").address
print("ns_addr (no aslr shift):", hex(ns_addr))
ns_addr |= 0xFFFF_0000_0000_0000  # FIXME: why did volatility remove the leading \xff bytes from the address?
print("ns_addr (dirty fix):", hex(ns_addr))

pid_namespace = vmlinux.object(object_type='list_head', offset=ns_addr)
print("pid_namespace:", pid_namespace)

gives the following output:

ns_addr (w/ aslr shift): 0xffff96653c80
ns_addr (no aslr shift): 0xffff82653c80
ns_addr (dirty fix): 0xffffffff82653c80
pid_namespace: <list_head symbol_table_name1!list_head: layer_name @ 0xffff96653c80 #16>

Apparently vmlinux.object() already handles the ASLR shift. However, where are the two leading \xff bytes we can see in the volatility2 output?

In the JSON file containing the symbols, I have the following:

    "init_pid_ns": {
      "type": {
        "kind": "struct",
        "name": "pid_namespace"
      },
      "address": 18446744071602257024
    },

18446744071602257024 == 0xffffffff82653c80
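
For reference, here is a quick check in plain Python (no volatility involved) showing that the value I get back is simply the JSON address with its top 16 bits dropped:

json_addr = 18446744071602257024
assert hex(json_addr) == "0xffffffff82653c80"
# Dropping the top 16 bits reproduces the "no aslr shift" value printed above:
assert hex(json_addr & 0xFFFF_FFFF_FFFF) == "0xffff82653c80"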

I had a case where I had to look for occurrences of an address in memory (using the BytesScanner) and I couldn't find any because of the missing \xff\xff: I was doing a to_bytes(8, "little"), which put \x00\x00 where the \xff\xff I was looking for should have been.
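
Here is a minimal reproduction of that mismatch in plain Python; the constants just mirror the outputs above:

truncated = 0xFFFF96653C80        # address returned by volatility3 (with the ASLR shift)
canonical = 0xFFFFFFFF96653C80    # address reported by volatility2

print(truncated.to_bytes(8, "little").hex())   # 803c6596ffff0000 -> trailing \x00\x00
print(canonical.to_bytes(8, "little").hex())   # 803c6596ffffffff -> trailing \xff\xff

# Only the second byte pattern actually appears in memory, which is why the scan found nothing.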

ikelos commented 1 year ago

Hiya, the symbol is interpreted as an address. Current processors only use the first 48 bits of a virtual address, and the remaining bits are sign-extended to form a canonical address these days. However, there was some discrepancy between the Intel and AMD specifications on this in the past, and it was not the case before 64-bit systems (PAE, for example). In earlier work (specifically our encounters with PAE and similar in volatility 2), this led to equivalence tests failing because the high bits of pointers were ignored. For this reason, during the design of volatility 3, we capped pointers to their maximum virtual addresses (across all Intel paging systems, which share a common root) so that logical equivalence tests succeed. On 64-bit systems, however, this makes physical comparisons more difficult.

When the bytes were converted, this appears not to have been taken into account, and hence the value did not match. You may also be interested in #699. This is not a bug, but an intricacy of dealing with physical and logical addresses across multiple versions of a processor.

We are investigating ways of improving this, from adding a canonicalization function (to allow logical addresses to be converted to their physical form on 64-bit systems) through to overhauling the internal workings of the Intel look-up mechanism, but these take a lot of consideration and are not high priority, hence it's taking so long to move forward on this. I hope that explains the situation somewhat?
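
As a rough illustration of the two representations described above (assuming 4-level paging, i.e. 48-bit virtual addresses; these helpers are only a sketch, not the volatility3 API):

VA_BITS = 48
SIGN_BIT = 1 << (VA_BITS - 1)
HIGH_BITS = ((1 << 64) - 1) ^ ((1 << VA_BITS) - 1)   # 0xffff000000000000

def canonicalize(addr: int) -> int:
    # Sign-extend bit 47 into the top 16 bits, as the hardware expects.
    return (addr | HIGH_BITS) if addr & SIGN_BIT else addr

def decanonicalize(addr: int) -> int:
    # Cap the address to the 48 architectural bits (the capped form volatility3 keeps).
    return addr & ((1 << VA_BITS) - 1)

assert canonicalize(0xFFFF96653C80) == 0xFFFFFFFF96653C80
assert decanonicalize(0xFFFFFFFF96653C80) == 0xFFFF96653C80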

ShellCode33 commented 1 year ago

That makes it clearer, thanks!

What would you say is the best way to handle such cases then, considering I only want to support Intel architectures for now?

What do you think of the following?

kernel_layer = context.layers[vmlinux.layer_name]

if kernel_layer.metadata.get("architecture") == "Intel64":
    sym_addr |= 0xFFFF_0000_0000_0000

Will the 0xFFFF_0000_0000_0000 mask be the same for all 64-bit Intel CPUs?

paulkermann commented 1 year ago

Will the 0xFFFF_0000_0000_0000 mask be the same for all 64-bit Intel CPUs?

No, this mask is not entirely correct, for two reasons:

1) Only 48 bits of the address are used, so the correct mask for the sign extension on PML4 would be 0xffff800000000000.
2) With LA57 (which is supported in Linux) this mask will also be invalid; the correct mask for LA57 is 0xff00000000000000.

The masks above are for "making the address a kernel address".
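
A small sketch of how both masks fall out of the linear-address width (plain Python, not a volatility API; 48 bits for 4-level paging, 57 bits for LA57):

def kernel_mask(va_bits: int) -> int:
    # Everything from the sign bit upwards set: OR-ing this in makes a kernel address.
    return ((1 << 64) - 1) ^ ((1 << (va_bits - 1)) - 1)

assert hex(kernel_mask(48)) == "0xffff800000000000"   # PML4 / 4-level paging
assert hex(kernel_mask(57)) == "0xff00000000000000"   # LA57 / 5-level paging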

There was a PR implementing canonical addressing, however it was closed (#702). The claims about slower scanning were disputed after extra testing; @ikelos can elaborate further.

ShellCode33 commented 1 year ago

Thanks for your reply! Do you know of any way in volatility to detect whether PML4 or LA57 is in use? (As a temporary workaround.)

paulkermann commented 1 year ago

Currently volatility (2 or 3) does not support 5-level paging at all. There is this branch which adds support for the translation layer for 5-level paging.

ShellCode33 commented 1 year ago

Ok thanks, so |= 0xFFFF_8000_0000_0000 it is
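
Putting the pieces from this thread together, my workaround now looks roughly like this (4-level paging assumed, reusing the variables from my earlier snippets):

kernel_layer = context.layers[vmlinux.layer_name]
sym_addr = vmlinux.get_symbol("init_pid_ns").address + aslr_shift

if kernel_layer.metadata.get("architecture") == "Intel64":
    # Sign-extend into a canonical kernel address before scanning for its bytes.
    sym_addr |= 0xFFFF_8000_0000_0000

needle = sym_addr.to_bytes(8, "little")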

ikelos commented 1 year ago

Hi, please take a look at the feature/canonical-helper branch. It adds two methods to the Intel layer, to canonicalize and decanonicalize addresses, which can be used to carry out the alterations to the address you're interested in.
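
Roughly, usage would look like this (a sketch only; the method names are taken from the description above, reusing the variables from the earlier snippets):

kernel_layer = context.layers[vmlinux.layer_name]

sym_addr = vmlinux.get_symbol("init_pid_ns").address + aslr_shift
canonical = kernel_layer.canonicalize(sym_addr)     # e.g. 0xffffffff96653c80
capped = kernel_layer.decanonicalize(canonical)     # back to 0xffff96653c80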

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 200 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 60 days since being marked as stale.