volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/
Other
2.72k stars 461 forks source link

Add new pe_symbols API, debug registers plugin, unhooked system calls… #1250

Closed atcuno closed 2 months ago

atcuno commented 2 months ago

… plugin

This is the final form of the PE symbols plugin and API. The API is very flexible and supports plugins calling it with any number of modules (DLLs or kernel drivers) and resolving symbol names to addresses or doing a reverse lookup for addresses to symbols. A built in cache makes it to where repeated calls for the same data do not require any further searching. The API is also made to generically support new methods of symbol resolution through new implementations of PESymbolFinder, for which there will be more in the future. The API is extremely powerful as, instead of how existing plugins only look for PDBs and/or exports in their own process, the API uses every process to attempt to get the symbol information - so even if an infected process has the export table or PDB page smeared, some other process will still hold the data that is needed. The same concept happens in the kernel by walking every session layer.

To show the API in use in both directions (names->address and address->name), I have included two plugins from DEFCON that use the final API form.

Example output of it in an infected sample:

$ python3 vol.py  -r pretty -f /mnt/samples/Sample1.lime windows.unhooked 
Volatility 3 Framework 2.7.2

  |                Function |                          Distinct Implementations | Total Implementations
* |          NtCreateThread |                                                   |                    28
* |     NtReadVirtualMemory |                                                   |                    28
* |             NtWriteFile |                                                   |                    28
* | NtAllocateVirtualMemory |                                                   |                    28
* |     NtAcceptConnectPort |                                                   |                    28
* | NtAdjustPrivilegesToken |                                                   |                    28
* |                 NtClose | 668:services.exe, 940:svchost.exe, 1928:lsass.exe |                    25
* |          NtCreateMutant |                                                   |                    28
* |      NtOpenIoCompletion |                                                   |                    28
* |               NtOpenKey |                                                   |                    28
* |            NtOpenThread |                                                   |                    28
* |     NtOpenThreadTokenEx |                                                   |                    28
* |            NtTraceEvent |                                                   |                    28
* |          NtUnloadDriver |                                                   |                    28
* |           NtUnloadKeyEx |                                                   |                    28
* |         NtCreateSection | 668:services.exe, 940:svchost.exe, 1928:lsass.exe |                    25
* |        NtDeleteValueKey |                                                   |                    28
* |         NtQueryValueKey |                                                   |                    28
* |  NtRequestWaitReplyPort |                                                   |                    28
* |      NtSetContextThread |                                                   |                    28
* |           NtSetValueKey |                                                   |                    28
* |      NtTerminateProcess |                                                   |                    28
* |  NtProtectVirtualMemory |                                                   |                    28
* |    NtQueryVirtualMemory |                                                   |                    28
* |            NtCreateFile |                                                   |                    28
* |         NtOpenJobObject |                                                   |                    28
* |       NtOpenThreadToken |                                                   |                    28
* |     NtTranslateFilePath |                                                   |                    28
* |             NtUnloadKey |                                                   |                    28
* |             NtCreateKey |                                                   |                    28
* |       NtDuplicateObject |                                                   |                    28
* |            NtRestoreKey |                                                   |                    28
* |    NtSystemDebugControl |                                                   |                    28
* |    NtWriteVirtualMemory |                                                   |                    28
* |     NtSetSecurityObject |                                                   |                    28
* |           NtOpenProcess |                                                   |                    28
* |              NtOpenFile | 668:services.exe, 940:svchost.exe, 1928:lsass.exe |                    25
* |             NtDeleteKey |                                                   |                    28
* |            NtReplaceKey |                                                   |                    28

The processes with distinct implementations listed are hooked by the malware.

The second plugin, debug registers, uses the reverse lookup to map an address that is being monitored to its symbol. Note that the plugin can automatically determine that AmsiScanBuffer is being manipulated by this malware.

$ python3 vol.py -r pretty -f data.lime windows.debugregisters
  |        Process |  PID |  TID | State |  Dr7 |            Dr0 |                     Range0 |        Symbol0 | Dr1 | Range1 | Symbol1 | Dr2 | Range2 | Symbol2 | Dr3 | Range3 | Symbol3
* | patchless_amsi | 3520 | 8512 |     5 | 1025 | 0x7ffa648b3860 | \Windows\System32\amsi.dll | AmsiScanBuffer | 0x0 |    N/A |     N/A | 0x0 |    N/A |     N/A | 0x0 |    N/A |     N/A

There are many more plugins that will be using this API once its merged, but I included two only now to show the API in real plugin usage.

All plugins pass:

I also updated vadinfo.py to fully pass these checks along with the needed annotations.

atcuno commented 2 months ago

@ikelos I am working on all the args/returns docs for this. Is there a standard line in an existing plugin for the often repeated arguments (context, config, config_path, etc.). I would like to use the standard one and seems like all plugins should.

atcuno commented 2 months ago

@ikelos the new list requirement form is really cool

$ python3 vol.py -f /mnt/samples/Sample1.lime windows.pe_symbols --source processes --module ntdll.dll --symbols NtOpenThread NtCreateFile
Volatility 3 Framework 2.9.0
Module  Symbol  Address
ntdll.dll   NtOpenThread    0x7c90d640
ntdll.dll   NtCreateFile    0x7c90d090
atcuno commented 2 months ago
$ python3 vol.py -f /mnt/samples/Sample1.lime windows.pe_symbols --source processes --module ntdll.dll --addresses 0x7c90d640 0x7c90d090
Volatility 3 Framework 2.9.0
 Module Symbol  Address

ntdll.dll   NtOpenThread    0x7c90d640
ntdll.dll   NtCreateFile    0x7c90d090
atcuno commented 2 months ago

@ikelos this is ready again. I left a few conversations unresolved as I either had questions for you to answer or for you to see then resolve. Also, this needs to be prioritized still as every remaining Windows plugin for parity is using this API.

ikelos commented 2 months ago

@ikelos I am working on all the args/returns docs for this. Is there a standard line in an existing plugin for the often repeated arguments (context, config, config_path, etc.). I would like to use the standard one and seems like all plugins should.

Seems to be "The context to retrieve required elements (layers, symbol tables) from" for most linux/mac plugins...

ikelos commented 2 months ago

Ok, think I've caught most of the responses. I've reopened some because modifying input parameters rather than creating new output parameters really makes my skin crawl. The docstrings saying that the value gets changed do help, but it kinda in little letters at the end rather than something obvious to spot. I'm still not keen on it and I think I need to re-read it again to see where the generator is used and then whether people are expect to interact with it. I think I'd still prefer the generator yielding it as intermediate state (so each found module is part of a tuple containing the list of missed ones, or all those not-yet-resolved, or just something). I'm still looking for a happy compromise we can both agree on, but I'm not yet accepting that there's just a changing variable which you have to know will be altered as you're going around...

Ok, I figured out why I don't like it so much! Almost all the rest of python (not just our code) either returns or mutates, but not both. That's why it's so jarring. That's what my problem with it is (moreso, being in the middle of a generator, it means that you might pass the generator 8 calls away, and then start iterating it, and it will suddenly start changing another variable). I really, really don't like it. There must be another way of achieving the same goal? https://stackoverflow.com/questions/26027694/correct-style-for-python-functions-that-mutate-the-argument

Otherwise it's looking pretty tidy, I think that's the last bit to get sorted...

atcuno commented 2 months ago

Ok, think I've caught most of the responses. I've reopened some because modifying input parameters rather than creating new output parameters really makes my skin crawl. The docstrings saying that the value gets changed do help, but it kinda in little letters at the end rather than something obvious to spot. I'm still not keen on it and I think I need to re-read it again to see where the generator is used and then whether people are expect to interact with it. I think I'd still prefer the generator yielding it as intermediate state (so each found module is part of a tuple containing the list of missed ones, or all those not-yet-resolved, or just something). I'm still looking for a happy compromise we can both agree on, but I'm not yet accepting that there's just a changing variable which you have to know will be altered as you're going around...

Ok, I figured out why I don't like it so much! Almost all the rest of python (not just our code) either returns or mutates, but not both. That's why it's so jarring. That's what my problem with it is (moreso, being in the middle of a generator, it means that you might pass the generator 8 calls away, and then start iterating it, and it will suddenly start changing another variable). I really, really don't like it. There must be another way of achieving the same goal? https://stackoverflow.com/questions/26027694/correct-style-for-python-functions-that-mutate-the-argument

Otherwise it's looking pretty tidy, I think that's the last bit to get sorted...

I put thoughts on this here:

https://github.com/volatilityfoundation/volatility3/pull/1250#discussion_r1752763561

My propsoal then is to copy.deepcopy() the parameter so we can remove from the copy, this will keep the parameter in tact. The output parameter (found_symbols) is created from scratch and not a mutation of the input parameter. Does this work? We need to be able to track when we parsed all of the symbols for reasons in my linked comment.

ikelos commented 2 months ago

Yep, that sounds ideal. It can be deepcopied (can probably be normal copied since all we really care is that the original list doesn't change and if they're Volatility objects it might be quite slow?) and then updated and fed out on each generator call.

ikelos commented 2 months ago

When you say the output parameter of found symbols, I imagined you'd return a tuple of found_symbols, remaining and then remaining would just be the updated copy of the input symbols but with the found ones removed?

atcuno commented 2 months ago

When you say the output parameter of found symbols, I imagined you'd return a tuple of found_symbols, remaining and then remaining would just be the updated copy of the input symbols but with the found ones removed?

It doesn't do this now, but I can add that in easily.

atcuno commented 2 months ago

@ikelos I got it working in the way you wanted - the copy of the input dictionary being changed only in the same function, no more generator internally, and the unresolved (missing symbols) returned back to the caller. Black/mypy/pylint all passed plus all plugin forms retested. I think we are good now :)

ikelos commented 2 months ago

Much happier with that thanks. Hopefully the deep copy isn't too slow, you might be better off just repopulating the lists, but we can worry about efficiency if it proves to be a problem...

The naming of the types is still a little irk, but not a showstopper by any means.

Last the remaining is the big if statements being replaced by or NotAvailableValue()... I've checked through unhooked_system_calls and that looks good, so... nearly there!

atcuno commented 2 months ago

Much happier with that thanks. Hopefully the deep copy isn't too slow, you might be better off just repopulating the lists, but we can worry about efficiency if it proves to be a problem...

The naming of the types is still a little irk, but not a showstopper by any means.

Last the remaining is the big if statements being replaced by or NotAvailableValue()... I've checked through unhooked_system_calls and that looks good, so... nearly there!

This should all be done now. The returning None and using "or ..." is here:

https://github.com/volatilityfoundation/volatility3/pull/1250/commits/79b8ff7d05b56316d90be682f56db22401f2c265#diff-4a5f13df514b51380c6283431d0487787d9c48df75cc725fb255f1cd3ce50706R151

atcuno commented 2 months ago

@ikelos everything should be addressed now.