volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/

Issue with windows.strings plugin #876

Open crocodile-on-the-nile opened 1 year ago

crocodile-on-the-nile commented 1 year ago

Describe the bug

The windows.strings plugin does not display a message when a specific string is identified in the memory of a process.

Context

Volatility Version: Volatility 3 Framework 2.4.1
Operating System: Windows 10
Python Version: 3.9.9
Suspected Operating System: Windows 10 Build 1809
Command: python vol.py -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.strings --strings-file=..\extracted_strings.txt

Details of experiment

Step 1: Acquiring the memory dump

On a Windows 10 computer, a text file was opened in notepad.exe. The text file contained the target string mysupersecretfancypassword. The process ID of notepad.exe is 6896. Another text file opened in wordpad.exe contained the same target string. For now, the focus is only on notepad.

Memory was acquired from this computer as DESKTOP-N81KBM0-20221121-235651.dmp using DumpIt.exe.

Step 2: Extracting target string with offset from the memory dump

Strings.exe was used to extract all the strings from this memory dump into a text file. The command used was:

strings64.exe -o DESKTOP-N81KBM0-20221121-235651.dmp > strings_from_dump.txt

Four lines from strings_from_dump.txt that contained the target string and its offset were copied into extracted_strings.txt.

The contents of extracted_strings.txt are as follows:

type ..\extracted_strings.txt
2065948033:mysupersecretfancypassword\par
2201153780:mysupersecretfancypassword
2443052438:mysupersecretfancypassword
2444601355:mysupersecretfancypassword

Step 3: Attempting to find context for strings in extracted_strings.txt

The goal was to use the windows.strings plugin to prove that the strings in extracted_strings.txt can be found within the memory of notepad.exe with PID 6896.

The following command took a long time to complete:

python vol.py -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.strings --strings-file=..\extracted_strings.txt

To narrow the search, the PID of notepad.exe was added to the command:

python vol.py -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.strings --pid 6896 --strings-file=..\extracted_strings.txt

Expected behavior

The expected output was a message like 'String found'. However, there was only a message indicating that the string search is progressing.

Example output

Output 1:

python vol.py -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.strings --pid 6896 --strings-file=..\extracted_strings.txt
Volatility 3 Framework 2.4.1
Progress:  100.00               PDB scanning finished
String  Physical Address        Result
Progress:    0.00               Creating mapping for task 6896
mysupersecretfancypassword\par
Progress:   25.00               Matching strings in memory
mysupersecretfancypassword
Progress:   50.00               Matching strings in memory
mysupersecretfancypassword
Progress:   75.00               Matching strings in memory
Progress:  100.00               Matching strings in memory

Output 2:

python vol.py -vv -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.strings --pid 6896 --strings-file=..\extracted_strings.txt

Volatility 3 Framework 2.4.1
INFO     volatility3.cli: Volatility plugins path: ['C:\Users\ten\Documents\volatility3\volatility3\plugins', 'C:\Users\ten\Documents\volatility3\volatility3\framework\plugins']
INFO     volatility3.cli: Volatility symbols path: ['C:\Users\ten\Documents\volatility3\volatility3\symbols', 'C:\Users\ten\Documents\volatility3\volatility3\framework\symbols']
INFO     volatility3.framework.automagic: Detected a windows category plugin
INFO     volatility3.framework.automagic: Running automagic: ConstructionMagic
INFO     volatility3.framework.automagic: Running automagic: SymbolCacheMagic
INFO     volatility3.framework.automagic: Running automagic: LayerStacker
INFO     volatility3.schemas: Dependency for validation unavailable: jsonschema
DEBUG    volatility3.schemas: All validations will report success, even with malformed input
INFO     volatility3.schemas: Dependency for validation unavailable: jsonschema
DEBUG    volatility3.schemas: All validations will report success, even with malformed input
DEBUG    volatility3.framework.automagic.windows: Detecting Self-referential pointer for recent windows
DEBUG    volatility3.framework.automagic.windows: DtbSelfRef64bit test succeeded at 0x1aa000
DEBUG    volatility3.framework.automagic.windows: DTB was found at: 0x1aa000
INFO     volatility3.schemas: Dependency for validation unavailable: jsonschema
DEBUG    volatility3.schemas: All validations will report success, even with malformed input
INFO     volatility3.schemas: Dependency for validation unavailable: jsonschema
DEBUG    volatility3.schemas: All validations will report success, even with malformed input
DEBUG    volatility3.framework.automagic.stacker: Stacked layers: ['IntelLayer', 'WindowsCrashDump64Layer', 'FileLayer']
INFO     volatility3.framework.automagic: Running automagic: WinSwapLayers
INFO     volatility3.framework.automagic: Running automagic: KernelPDBScanner
DEBUG    volatility3.framework.automagic.pdbscan: Kernel base determination - searching layer module list structure
DEBUG    volatility3.framework.automagic.pdbscan: Setting kernel_virtual_offset to 0xf80767a1f000
DEBUG    volatility3.framework.symbols.windows.pdbutil: Using symbol library: ntkrnlmp.pdb\99DE394F56795BA4DDAEBA33444A9F1A-1
INFO     volatility3.schemas: Dependency for validation unavailable: jsonschema
DEBUG    volatility3.schemas: All validations will report success, even with malformed input
INFO     volatility3.framework.automagic: Running automagic: SymbolFinder
INFO     volatility3.framework.automagic: Running automagic: KernelModule

String  Physical Address        Result
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_PO_PROCESS_ENERGY_CONTEXT
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_EPROCESS_QUOTA_BLOCK
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_PAGEFAULT_HISTORY
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_JOB_ACCESS_STATE
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_JOB_CPU_RATE_CONTROL
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_JOB_NET_RATE_CONTROL
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_JOB_NOTIFICATION_INFORMATION
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_PSP_STORAGE
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ACTIVATION_CONTEXT_DATA
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_FLS_CALLBACK_INFO
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ASSEMBLY_STORAGE_MAP
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_DBGKP_ERROR_PORT
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_CI_NGEN_PATHS
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_WNF_SCOPE_MAP
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_EX_WNF_SUBSCRIPTION
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ETW_EVENT_CALLBACK_CONTEXT
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_EX_TIMER
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ETW_SOFT_RESTART_CONTEXT
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ETW_STACK_CACHE
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_ETW_PERFECT_HASH_FUNCTION
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_HAL_PMC_COUNTERS
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_DEVICE_NODE_IOMMU_EXTENSION
DEBUG    volatility3.framework.symbols: Unresolved reference: symbol_table_name1!_SCSI_REQUEST_BLOCK
Progress:    0.00               Creating mapping for task 6896
mysupersecretfancypassword\par
Progress:   25.00               Matching strings in memory
mysupersecretfancypassword
Progress:   50.00               Matching strings in memory
mysupersecretfancypassword
Progress:   75.00               Matching strings in memory
Progress:  100.00               Matching strings in memory

Additional information

All the pages in memory associated with notepad.exe were dumped using windows.memmap:

python vol.py -f ..\DESKTOP-N81KBM0-20221121-235651.dmp windows.memmap --pid 6896 --dump

Strings.exe was used against the dumped process memory, and the extracted strings were stored in strings_from_dumped_process.txt:

strings64.exe pid.6896.dmp > strings_from_dumped_process.txt

A simple find operation in Notepad++ on strings_from_dumped_process.txt locates an instance of mysupersecretfancypassword.

[image: strings1]

The target string mysupersecretfancypassword is present within the memory of notepad.exe, but is not identified by the windows.strings plugin.

Could you please look into this?

ikelos commented 1 year ago

Your expected output doesn't match what the plugin does. All plugins within Volatility 3 return a table of results, usually in a form that could be used by another tool. The plugin found three hits at three of the four offsets: as you can see in example 1, they're interspersed between the progress callbacks, and one is directly after Creating mapping for task 6896. What's strange is that there are no offsets or processes output alongside them, which I can't explain.

The plugin yields these values here and offset is part of the loop it's traversing so I can't see how that value isn't being output, but we can investigate further, thanks for letting us know...

kookiecrack commented 1 year ago

Hi, on a related topic, can I suggest that the progress status doesn't interrupt the results? I piped stderr to a separate file as a workaround. Thanks! [image: Screenshot-1]

ikelos commented 1 year ago

Hi @kookiecrack, the default output renderer is the quick output, which doesn't gather the data but returns it as quickly as possible, so it may return results whilst the scans are still ongoing. If you want the output to be uninterrupted, you can choose a different renderer: -r pretty prints the output after all searching has happened. Alternatively, use --quiet or -q to silence the progress output. If this isn't sufficient please feel free to file a new ticket to discuss the issue... 5:)

kookiecrack commented 1 year ago

thank you! That's useful to know, i will test them out.

gitter-sudo commented 1 year ago

Hi, I'm experiencing an issue related to the windows.strings plugin. On a VirtualBox VM (Windows 10) I captured RAM using both the FTK tool and the VBoxManage debugvm dumpvmcore command. The Sysinternals strings tool finds the string I search for, but the windows.strings plugin doesn't indicate which process(es) this string belongs to.

Here is the command I used:

python3 vol.py -f mem_file -r pretty windows.strings.Strings --strings-file 'strings_file' > outputfile

The output always shows FREE MEMORY instead of the PID of the related process(es).

any idea? thanks

ikelos commented 1 year ago

Hiya, probably worth checking the output of pslist to make sure volatility can see the processes (if there's no list there, volatility won't be able to figure it out). Other than that I'd suggest checking the debugging information, but really the filter option is the only thing that could interfere with it, and that has code to explicitly not filter if the pid list is empty?

gitter-sudo commented 1 year ago

Thanks @ikelos, I tried using pslist and I confirm it recognizes the PIDs of the processes. About your tip: how can I check the debugging information to troubleshoot the issue?

ikelos commented 1 year ago

Hiya, given that pslist does return, you'd probably need to modify the python to see why pidlist is empty or not being hit in the filter... The debugging information I mentioned was by running vol.py -vvvvvvv rather than just vol.py. I don't think that'll provide anything useful extra though.

Theoretically you could change this line to just return filter_func, which should essentially disable all pid filtering (but that's kinda what should be happening already). If you give those a try and let us know how you get on, we can try to figure out what's going on?

gitter-sudo commented 1 year ago

Hi, this is how I changed pslist.py following your tip:

    ...
    filter_func = lambda _: False
    return filter_func
    # FIXME: mypy #4973 or #2608
    pid_list = pid_list or []
    ...

Then I used the command

python3 vol.py -f mem_file -r pretty windows.strings.Strings --strings-file 'strings_file' > outputfile

Same result: the output always shows FREE MEMORY instead of the PID of the related process(es), and nothing else.

ikelos commented 1 year ago

Hmmm, then it gets more tricky... Since we basically disabled the filter function (or made it always return False), this line should mean that the process list is traversed completely. And you're not even getting a process id of unknown, which suggests that the list of process entries doesn't contain anything, or an exception happens before it gets there, I suppose?

Could you please attach the output from running your command without the redirect (or making sure to redirect stderr as well) and with vol.py -vvvvvvv instead of just vol.py?

gitter-sudo commented 1 year ago

volatility-debug.txt

Here is the output; I hope it helps.

gitter-sudo commented 1 year ago

Hi @ikelos, did you find any useful information?

ikelos commented 1 year ago

I'm afraid not, there's nothing unusual in the debug output, so it's possible the contents of the strings file don't accurately reflect the correct offsets?

It might also be worth checking that the offsets in the strings text file are in decimal?

gitter-sudo commented 1 year ago

Before using the command:

python3 vol.py -f mem_file -r pretty windows.strings.Strings --strings-file 'strings_file' > output_file

I put the output of the Sysinternals strings tool into 'strings_file' in the form offset:string.

I checked the strings tool docs for an option to achieve what you suggested, but found no way to switch to decimal or any other offset format. Maybe I don't understand what you are suggesting?

ikelos commented 1 year ago

I was concerned that strings might output hex offsets, or something, but they need to be in decimal. I think strings takes -td to make sure it's in decimal. Otherwise I can't think what the issue would be and you'd need to do a lot of digging in volshell to check that the mapping lines up and see why the data isn't in the process address space. Sorry, I'm not really sure what else to suggest. 5:S
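As a rough illustration of why hex offsets would matter, the sketch below runs two sample lines through the strings_pattern regular expression declared in strings.py. The helper name check_strings_lines and the sample lines are hypothetical, and the sketch assumes the captured digits are then read as a decimal integer.

```python
import re

# Pattern copied from the Strings plugin (strings_pattern in strings.py).
STRINGS_PATTERN = re.compile(rb"^(?:\W*)([0-9]+)(?:\W*)(\w[\w\W]+)\n?")


def check_strings_lines(lines):
    """Hypothetical helper: return (offset, string) for each accepted line.

    Assumption: the captured digits are interpreted as a decimal integer.
    """
    parsed = []
    for line in lines:
        match = STRINGS_PATTERN.match(line)
        if match is None:
            continue
        offset, string = match.groups()
        parsed.append((int(offset), string))
    return parsed


sample = [
    b"7103389488:teststring\n",  # decimal offset
    b"1a7651f30:teststring\n",   # the same offset written in hex
]
result = check_strings_lines(sample)
for offset, string in result:
    print(offset, string)
```

With this pattern the hex line is not rejected: only the leading digit is captured as the offset and the rest is treated as part of the string, so a hex offset would silently point at the wrong physical address.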

gitter-sudo commented 1 year ago

Thanks @ikelos for your tip. I confirm that the offsets I used were in decimal.

I've spent the last three days stressing Volatility and myself!!! ;)

PID 8012 is the process I used for the string search test.

7103389488, 7211303872, 16338923312 and 16526754752 are the decimal offsets that the strings command found in the RAM dump for the test string.

0x1a7651f30, 0x1add3c3c0, 0x3cde02f30 and 0x3d91243c0 are the same offsets in hex.

Using the windows.strings.Strings plugin, the test string does not appear to be associated with PID 8012.

Using windows.memmap.Memmap, I dumped the memory of PID 8012 and confirmed that the test string is in this dump (the strings command found it twice).

Based on the output of the memmap dump in the previous step, I put all physical memory addresses related to PID 8012 into a spreadsheet. After checking, I confirm that two of the offset addresses found by the strings command (0x1a7651f30, 0x1add3c3c0, 0x3cde02f30, 0x3d91243c0) are related to PID 8012:

[image: mem-addresses]

Based on these findings, the offset-to-address association looks right even though the windows.strings.Strings plugin doesn't give the expected output. It looks really weird to me.
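The decimal and hex offsets quoted in this comment can be cross-checked with a few lines of Python. This is only an arithmetic check; the variable names are illustrative.

```python
# Decimal offsets reported by the strings command for the test string.
offsets_decimal = [7103389488, 7211303872, 16338923312, 16526754752]

# Hex form of each offset, plus the 4 KiB page base and in-page offset.
offsets_hex = [hex(o) for o in offsets_decimal]
page_bases = [hex(o & ~0xFFF) for o in offsets_decimal]
in_page = [hex(o & 0xFFF) for o in offsets_decimal]

for dec, hx, base, rest in zip(offsets_decimal, offsets_hex, page_bases, in_page):
    print(dec, hx, base, rest)
```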

gitter-sudo commented 1 year ago

Hi all, any good news?

kevthehermit commented 1 year ago

OK, Think I have the solution to this one. Or at least part of it.

If I check using Vol 2.6 I get one match and two misses, which is what I expect.

❯ volatility -f /dumps/command-dump-3.raw --profile=Win10x64_19041 strings --string-file /dumps/test.txt --pid 7400
Volatility Foundation Volatility Framework 2.6.1
3128604768 [FREE MEMORY:-1] net user /add admin443 P@ssw0rd!
3128604192 [FREE MEMORY:-1] net user /add admin443 P@ssw0rd!
1963826768 [7400:13f54521e50] net user /add admin443 P@ssw0rd!

In Vol 3 I get

❯ python3 vol.py -f /mnt/d/Projects/command-dump-3.raw windows.strings --strings-file /mnt/d/Projects/test.txt --pid 7400
Volatility 3 Framework 2.5.2
Progress:  100.00               PDB scanning finished                        
String  Physical Address        Result
Progress:    0.00               Creating mapping for task 7400               
net user /add admin443 P@ssw0rd!
Progress:   33.330      FREE MEMMatching strings in memory                   
net user /add admin443 P@ssw0rd!
Progress:   66.670      FREE MEMMatching strings in memory                   
net user /add admin443 P@ssw0rd!
Progress:  100.000      FREE MEMMatching strings in memory   

The progress bar gets in the way a lot here. Switching to pretty gives me a better view, but I'm still not getting the results I expect.

❯ python3 vol.py -r pretty -f /mnt/d/Projects/command-dump-3.raw windows.strings --strings-file /mnt/d/Projects/test.txt --pid 7400
Volatility 3 Framework 2.5.2
Formatting...0.00               PDB scanning finished                        
  |                           String | Physical Address |      Result        
* | net user /add admin443 P@ssw0rd! |       0xba7ab860 | FREE MEMORY
  |                                  |                  |            
* | net user /add admin443 P@ssw0rd! |       0xba7ab620 | FREE MEMORY
  |                                  |                  |            
* | net user /add admin443 P@ssw0rd! |       0x750d9e50 | FREE MEMORY
  |                                  |                  |   

Dumping the revmap object that is being created (extract below)

    },
    1371509227520: {('Process 7400',
        2486009856)
    },
    1371509231616: {('Process 7400',
        1963823104)
    },
    1371509235712: {('Process 7400',
        247308288)
    },
    1371509239808: {('Process 7400',
        2285744128)
    },
    1371509243904: {('Process 7400',
        4464627712)
    },
    1371509248000: {('Process 7400',
        2530074624)
    },

Looking at https://github.com/volatilityfoundation/volatility3/blob/b050b108e4ff5d709801fe7291c952f4dac2a21c/volatility3/framework/plugins/windows/strings.py#L88

I convert the known offset

>>> 1963826768 >> 12
479449

and can not find it anywhere in the mapping.

If I convert the offset using offset & 0xFFFFFFFFFFFFF000, which is how it's done in Vol 2.6 https://github.com/volatilityfoundation/volatility/blob/a438e768194a9e05eb4d9ee9338b881c0fa25937/volatility/plugins/strings.py#L162

>>> 1963826768 & 0xFFFFFFFFFFFFF000
1963823104

I find my offset in the revmap, but it's not the key; it's in the tuple.
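To make the arithmetic concrete, here is a small sketch using only the numbers quoted in this comment. The dictionary is a three-entry copy of the revmap extract; the variable names are illustrative and not taken from the plugin.

```python
PAGE_SHIFT = 12
PAGE_MASK = 0xFFFFFFFFFFFFF000

physical_offset = 1963826768  # 0x750d9e50, taken from the strings file

page_number = physical_offset >> PAGE_SHIFT   # lookup key used in Vol 3
page_base = physical_offset & PAGE_MASK       # lookup key used in Vol 2.6
offset_within_page = physical_offset & 0xFFF

# Extract of the dumped revmap: the key is a virtual page address and the
# physical page address sits inside the tuple.
revmap_extract = {
    1371509227520: {("Process 7400", 2486009856)},
    1371509231616: {("Process 7400", 1963823104)},
    1371509235712: {("Process 7400", 247308288)},
}

# Search the tuple values (not the keys) for the physical page.
matches = [
    (name, virt_page + offset_within_page)
    for virt_page, entries in revmap_extract.items()
    for name, phys_page in entries
    if phys_page == page_base
]

print(page_number, page_base, hex(offset_within_page))
print([(name, hex(virt)) for name, virt in matches])
```

Adding the in-page offset to the matching virtual page gives 0x13f54521e50, the same virtual address that Vol 2.6 reports for this string.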

So I made a couple of changes,

The result is

❯ python3 vol.py -r pretty -f /mnt/d/Projects/command-dump-3.raw windows.strings --strings-file /mnt/d/Projects/test.txt --pid 7400
Volatility 3 Framework 2.5.2
Formatting...0.00               PDB scanning finished                        
  |                           String | Physical Address |                     Result
* | net user /add admin443 P@ssw0rd! |       0xba7ab860 |                FREE MEMORY
  |                                  |                  |                           
* | net user /add admin443 P@ssw0rd! |       0xba7ab620 |                FREE MEMORY
  |                                  |                  |                           
* | net user /add admin443 P@ssw0rd! |       0x750d9e50 | Process 7400:0x13f54521000
  |                                  |                  |                           

This is a lot closer: it's giving me the correct page for the virtual address. I need to get a Vol 2.6 dev instance up and running so I can check what the revmap in Vol 2 actually looks like.

kevthehermit commented 1 year ago

Ok revmap in Vol 2.6 is pretty different

    ],
    3210088448L: [True, ('kernel', None,
        227660583002112)
    ],
    1963823104L: [False, ( [unsigned int
        ]: 7400,
        [String ImageFileName
        ] @ 0xFFFF910B7AED6668,
        1371509231616)
    ],
    4423434240L: [True, ('kernel', None,
        227657136640000)
    ],
    54816768L: [True, ('kernel', None,
        272680769318912)
    ],

kevthehermit commented 1 year ago

I'll open a PR, but I now have a functional windows.strings. Here it is compared to the Vol 2.6 version.

Vol 2.6

thehermit@Aurora:~/volatility$ python vol.py -vvvv -f /mnt/d/Projects/command-dump-3.raw --profile=Win10x64_19041 strings --string-file /mnt/d/Projects/test.txt --pid 7400
Volatility Foundation Volatility Framework 2.6.1
1456088752 [7400:13f547ea6b0] ping techanarchy.net
1963826768 [7400:13f54521e50] net user /add admin443 P@ssw0rd!
247310672 [7400:13f54522950] echo "hello world" > hello.txt
2457871104 [FREE MEMORY:-1] net localgroup administrators admin443 /add
1716668544 [7400:13f5478d880] echo "randomthings" > c:\programdata\AnyDesk\
1716668768 [7400:13f5478d960] CertUtil: -encode command completed successfully.e
1716669216 [7400:13f5478db20] net localgroup administrators admin443 /add
807012064 [FREE MEMORY:-1] net localgroup administrators admin443 /add

Vol 3

❯ python3 vol.py -r pretty -f /mnt/d/Projects/command-dump-3.raw windows.strings --strings-file /mnt/d/Projects/test.txt --pid 7400
Volatility 3 Framework 2.5.2
Formatting...0.00               PDB scanning finished                        
  |                                             String |            Region |  PID | Physical Address | Virtual Address
* |                               ping techanarchy.net |           Process | 7400 |       0x56ca26b0 |   0x13f547ea6b0
* |                   net user /add admin443 P@ssw0rd! |           Process | 7400 |       0x750d9e50 |   0x13f54521e50        
* |                     echo "hello world" > hello.txt |           Process | 7400 |        0xebda950 |   0x13f54522950   
* |        net localgroup administrators admin443 /add | Unallocated Space |   -1 |       0x92802300 |           0x00            
* |      echo "randomthings" > c:\programdata\AnyDesk\ |           Process | 7400 |       0x66524880 |   0x13f5478d880          
* | CertUtil: -encode command completed successfully.e |           Process | 7400 |       0x66524960 |   0x13f5478d960          
* |        net localgroup administrators admin443 /add |           Process | 7400 |       0x66524b20 |   0x13f5478db20       
* |        net localgroup administrators admin443 /add | Unallocated Space |   -1 |       0x301a06e0 |           0x00

ikelos commented 1 year ago

I've given it a review and it needs a couple of things smoothing over, but just to add here. Matching 2.6 isn't necessarily the goal, the goal is to get something accurate. If 2.6 is accurate, that's fine, but striving to get identical results isn't what we're after if there's something wrong with what 2.6 was doing. 2.6 may be perfect, in which case matching it is awesome, but I don't want to forget that the goal is returning accurate results, not necessarily exactly what 2.6 returns... 5:)

eve-mem commented 12 months ago

Hello all,

I remember looking into this a while ago while trying to make a generic strings plugin and a linux version. I haven't quite finished, but I did come across something similar to this issue. It looks like I didn't make quite enough notes at the time, so I think there is still more to test. I'd normally do more testing before commenting, but given the discussion I thought it might be useful.

I think the issue might be here:

for mapval in proc_layer.mapping(
    0x0, proc_layer.maximum_address, ignore_errors=True
):
    mapped_offset, _, offset, mapped_size, maplayer = mapval
    for val in range(
        mapped_offset, mapped_offset + mapped_size, 0x1000
    ):
        cur_set = reverse_map.get(mapped_offset >> 12, set())
        cur_set.add(
            (f"Process {process.UniqueProcessId}", offset)
        )
    reverse_map[mapped_offset >> 12] = cur_set

Notice that val is not used in the inner loop, but offset is added every time. That means that across the range from mapped_offset to mapped_offset + mapped_size only one address is added rather than one per page. So if the string sits at the start of the mapped range it will be found, but if it is on the second page it won't, because the address of the second page didn't make it into the revmap.
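The effect of the unused loop variable can be reproduced in isolation. The sketch below uses one made-up two-page mapping tuple in the order documented for mapping(), and two hypothetical helper functions; it isolates only the unused-variable problem and is not the plugin's actual code.

```python
PAGE_SIZE = 0x1000

# One hypothetical two-page mapping, in the documented order:
# (offset, sublength, mapped_offset, mapped_length, layer)
mappings = [(0x7000, 0x2000, 0x30000, 0x2000, "memory_layer")]


def revmap_first_page_only(mappings, name):
    """Loop variable unused: every iteration rewrites the same key."""
    reverse_map = {}
    for virt, _, phys, phys_size, _layer in mappings:
        for _val in range(phys, phys + phys_size, PAGE_SIZE):
            cur_set = reverse_map.get(phys >> 12, set())
            cur_set.add((name, virt))
            reverse_map[phys >> 12] = cur_set
    return reverse_map


def revmap_every_page(mappings, name):
    """One entry per physical page, paired with its own virtual page."""
    reverse_map = {}
    for virt, _, phys, phys_size, _layer in mappings:
        for delta in range(0, phys_size, PAGE_SIZE):
            key = (phys + delta) >> 12
            reverse_map.setdefault(key, set()).add((name, virt + delta))
    return reverse_map


first_only = revmap_first_page_only(mappings, "Process 1")
every_page = revmap_every_page(mappings, "Process 1")

# A string on the second physical page of the mapping.
second_page = (0x30000 + PAGE_SIZE) >> 12
print(second_page in first_only, second_page in every_page)
```

In the first variant a lookup for the second page raises KeyError, which the plugin would report as FREE MEMORY; in the second variant the page resolves to virtual address 0x8000.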

In my work-in-progress (probably buggy) generic strings plugin I had this. I'm not sure if this actually fixes things properly, as my notes aren't good enough, but this is where I got to when looking into this.

for mapval in layer.mapping(0x0, layer.maximum_address, ignore_errors=True):
    # print(mapval)
    offset, _, mapped_offset, mapped_size, maplayer = mapval
    for val in range(mapped_offset, mapped_offset + mapped_size, 0x1000):
        # print(val)
        cur_set = reverse_map.get(val >> 12, set())
        cur_set.add((display_name, offset))
        reverse_map[val >> 12] = cur_set

I also renamed the variables returned from the .mapping function to match what it returns, rather than being swapped:

def mapping(
    self, offset: int, length: int, ignore_errors: bool = False
) -> Iterable[Tuple[int, int, int, int, str]]:
    """Returns a sorted iterable of (offset, sublength, mapped_offset, mapped_length, layer)
    mappings.

I'm not 100% sure this is the issue, but hopefully these notes might be useful. I'll still chip away here to see if I can finish my testing and try to remember what I'd worked out at the time. Again I'm not sure if my "fix" is correct - but I think that it's not using val in the inner loop that might be related to the problem.

🦊 just a random internet vol user

eve-mem commented 11 months ago

Hello @kevthehermit and @ikelos,

I've been digging into why the revmap doesn't seem right, and I think this patch here is all that's needed.

Here is a worked example using the win-xp-laptop-2005-06-25.img sample (SHA1: 31cd2e8f1e4f7754da6477f102cbfac2052f878e). As you can see below, the string The QuickTime Plugin is found at virtual offset 0x1024d20 in PID 2160, and it's at the physical offset 0x1e64cd20.

(layer_name) >>> cp(2160)
(layer_name_Process2160) >>> db(0x1024d20, 20) 
0x1024d20    54 68 65 20 51 75 69 63 6b 54 69 6d 65 20 50 6c    The.QuickTime.Pl
0x1024d30    75 67 69 6e                                        ugin
(layer_name_Process2160) >>> context.layers['layer_name_Process2160'].translate(0x1024d20)
(509922592, 'memory_layer')
(layer_name_Process2160) >>> cl('memory_layer')
(memory_layer) >>> db(509922592, 20) 
0x1e64cd20    54 68 65 20 51 75 69 63 6b 54 69 6d 65 20 50 6c    The.QuickTime.Pl
0x1e64cd30    75 67 69 6e                                        ugin

The strings file used in this example is:

509922592:The QuickTime Plugin

However as you can see it's not found by the strings plugin as it currently is:

$ python vol.py -r pretty -f win-xp-laptop-2005-06-25.img windows.strings --strings-file win.txt
Volatility 3 Framework 2.4.2
Formatting...0.00               PDB scanning finished
  |               String | Physical Address |      Result
* | The QuickTime Plugin |       0x1e64cd20 | FREE MEMORY

When applying the patch below it now shows the correct location:

$ python vol.py -r pretty -f win-xp-laptop-2005-06-25.img windows.strings --strings-file win.txt
Volatility 3 Framework 2.4.2
Formatting...0.00               PDB scanning finished
  |               String | Physical Address |                 Result
* | The QuickTime Plugin |       0x1e64cd20 | Process 2160:0x1024d20

I tried to add comments to explain what is happening and I renamed the values returned from the mapping call - just to make it easier to see what exactly they are. Including naming the ones that aren't used.

I think the problem was that later pages in a mapping were not making it into the revmap, so they were not found later on, hence a lot of 'FREE MEMORY' results. Then, when the results were displayed to the user, the plugin didn't add the offset within the page to the virtual address and would just show the address of the start of the page where the result was.
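The display half of the fix can be sketched with the worked example above. The function name describe is hypothetical, and the single revmap entry is inferred from the translate() result shown earlier (virtual page 0x1024000 backed by physical page 0x1e64c000); it is not a dump of the real revmap.

```python
def describe(physical_offset, revmap):
    """Hypothetical lookup: map a physical offset to 'name:virtual address'."""
    offset_within_page = physical_offset & 0xFFF
    try:
        return [
            f"{name}:{hex(virt_page + offset_within_page)}"
            for name, virt_page in sorted(revmap[physical_offset >> 12])
        ]
    except KeyError:
        return ["FREE MEMORY"]


# Assumed revmap entry: physical page number -> {(name, virtual page address)}
revmap = {0x1E64CD20 >> 12: {("Process 2160", 0x1024000)}}

found = describe(509922592, revmap)             # offset from the strings file
missing = describe(509922592 + 0x1000, revmap)  # next page, not in the revmap
print(found, missing)
```

Without the offset_within_page term, the first lookup would report 0x1024000, the start of the page, instead of 0x1024d20.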

Let me know what you think of this. Does this also work on your sample @kevthehermit ?

Once we do come up with a working solution for this issue I'd like to take it and make strings plugins that work without an OS, and then Linux and Mac versions, just like the yarascan plugins. So it would be good to fix this for the long term, either with something like this or https://github.com/volatilityfoundation/volatility3/pull/1043. I personally don't mind what approach we take. I can make a PR with these changes if it is useful, but I don't want to take away from all the great work that @kevthehermit has put into this!

🦊 just a random internet vol user

diff --git a/volatility3/framework/plugins/windows/strings.py b/volatility3/framework/plugins/windows/strings.py
index 32f3df4c..b8a9d7f8 100644
--- a/volatility3/framework/plugins/windows/strings.py
+++ b/volatility3/framework/plugins/windows/strings.py
@@ -18,7 +18,7 @@ vollog = logging.getLogger(__name__)
 class Strings(interfaces.plugins.PluginInterface):
     """Reads output from the strings command and indicates which process(es) each string belongs to."""

-    _version = (1, 2, 0)
+    _version = (2, 0, 0)
     _required_framework_version = (2, 0, 0)
     strings_pattern = re.compile(rb"^(?:\W*)([0-9]+)(?:\W*)(\w[\w\W]+)\n?")

@@ -84,8 +84,18 @@ class Strings(interfaces.plugins.PluginInterface):
         for offset, string in string_list:
             line_count += 1
             try:
+                # calculate the offset for this string within a 4096 page so
+                # that this offset can be added to mappings which are all
+                # page aligned. This ensures that a string located at phy
+                # add 0x1e64cd20 would carry the 0xd20 to the virtual offsets
+                # displayed in the plugin output. Without this it would show
+                # only the page that the string was found, rather than the
+                # actually addr. 0xFFF is 4095 e.g. all lower bits set.
+                offset_within_page = offset & 0xFFF
+
                 revmap_list = [
-                    name + ":" + hex(offset) for (name, offset) in revmap[offset >> 12]
+                    name + ":" + hex(virt_offset + offset_within_page)
+                    for (name, virt_offset) in revmap[offset >> 12]
                 ]
             except (IndexError, KeyError):
                 revmap_list = ["FREE MEMORY"]
@@ -147,14 +157,40 @@ class Strings(interfaces.plugins.PluginInterface):
         if isinstance(layer, intel.Intel):
             # We don't care about errors, we just wanted chunks that map correctly
             for mapval in layer.mapping(0x0, layer.maximum_address, ignore_errors=True):
-                offset, _, mapped_offset, mapped_size, maplayer = mapval
-                for val in range(mapped_offset, mapped_offset + mapped_size, 0x1000):
-                    cur_set = reverse_map.get(mapped_offset >> 12, set())
-                    cur_set.add(("kernel", offset))
-                    reverse_map[mapped_offset >> 12] = cur_set
+                (
+                    virt_offset,
+                    _virt_size,
+                    phy_offset,
+                    phy_mapping_size,
+                    _phy_layer_name,
+                ) = mapval
+
+                # for each page within the mapping we need to store the phy_offset and
+                # the matching virt_offset
+                for offset_to_page_within_mapping in range(0, phy_mapping_size, 0x1000):
+                    # calculate the page number for this phy_offset, e.g. the ">> 12"
+                    # drops the bits that would address an offset within the page.
+                    # This means that all offsets within the same page get the same
+                    # physical_page number.
+                    physical_page = (
+                        phy_offset + offset_to_page_within_mapping
+                    ) >> 12
+
+                    # get the existing mappings for this physical page from the
+                    # reverse map set.
+                    cur_set = reverse_map.get(physical_page, set())
+
+                    # add a mapping for this virtual offset, taking care to add the
+                    # offset_to_page_within_mapping to ensure that all pages match correctly.
+                    # Without this the 2nd, 3rd etc pages would all incorrectly map to the same
+                    # virtual offset.
+                    cur_set.add(("kernel", virt_offset + offset_to_page_within_mapping))
+
+                    # store these results back in the reverse_map
+                    reverse_map[physical_page] = cur_set
                 if progress_callback:
                     progress_callback(
-                        (offset * 100) / layer.maximum_address,
+                        (virt_offset * 100) / layer.maximum_address,
                         "Creating reverse kernel map",
                     )

@@ -178,22 +214,35 @@ class Strings(interfaces.plugins.PluginInterface):

                     proc_layer = context.layers[proc_layer_name]
                     if isinstance(proc_layer, linear.LinearlyMappedLayer):
+                        # this follows the same pattern as the kernel mappings above.
                         for mapval in proc_layer.mapping(
                             0x0, proc_layer.maximum_address, ignore_errors=True
                         ):
-                            mapped_offset, _, offset, mapped_size, maplayer = mapval
-                            for val in range(
-                                mapped_offset, mapped_offset + mapped_size, 0x1000
+                            (
+                                virt_offset,
+                                _virt_size,
+                                phy_offset,
+                                phy_mapping_size,
+                                _phy_layer_name,
+                            ) = mapval
+                            for offset_to_page_within_mapping in range(
+                                0, phy_mapping_size, 0x1000
                             ):
-                                cur_set = reverse_map.get(mapped_offset >> 12, set())
+                                physical_page = (
+                                    phy_offset + offset_to_page_within_mapping
+                                ) >> 12
+                                cur_set = reverse_map.get(physical_page, set())
                                 cur_set.add(
-                                    (f"Process {process.UniqueProcessId}", offset)
+                                    (
+                                        f"Process {process.UniqueProcessId}",
+                                        virt_offset + offset_to_page_within_mapping,
+                                    )
                                 )
-                                reverse_map[mapped_offset >> 12] = cur_set
+                                reverse_map[physical_page] = cur_set
                             # FIXME: make the progress for all processes, rather than per-process
                             if progress_callback:
                                 progress_callback(
-                                    (offset * 100) / layer.maximum_address,
+                                    (virt_offset * 100) / proc_layer.maximum_address,
                                     f"Creating mapping for task {process.UniqueProcessId}",
                                 )
kevthehermit commented 11 months ago

Nice work 😁 I can test your changes against my samples when I get home but testing a random sample I have here looks accurate to me.

I can add those changes in to my PR, or if you want to open your own, my only suggestion would be to add some extra cols to the output so we have String, Location (Process | Kernel | Free), PID, Virt, Physical instead of a concatenated string like Process 2160:0x1024d20

I like the idea of a global strings plugin that is OS-independent as well

eve-mem commented 11 months ago

Yes - I'm easy on changing the output columns - either works for me. If other plugins want to, they can work from the generate_mapping classmethod so they can work off the dictionary rather than the pretty string that is prepped for display. I don't mind (nor do I think my view matters too much) which columns we have in the results, so long as everyone gets the information they need.

Looking over your PR more closely @kevthehermit it looks like we've basically ended up fixing the same bits in different, but similar, ways really.

If it's helpful I figured I'd make a diagram and explanation as to how rev_map is working.

Imagine this mini mem sample with two processes.

          Physical Mem                                                                                                                                                                      
          0x0                  0x1000               0x2000               0x3000               0x4000               0x5000               0x6000               0x7000               0x8000    
          +--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+         
          |  Physical Page 0   |  Physical Page 1   |  Physical Page 2   |  Physical Page 3   |  Physical Page 4   |  Physical Page 5   |  Physical Page 6   |  Physical Page 7   |         
          +--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+--------------------+         
                |                  |                                           |                    |         |                                       |                   |                 
                |                  |                                           |                    |         |                                       |                   |                 
                |                  |                        +------------------+  +-----------------+         +------------------+                    |                   |                 
                |                  |                        |                     |                                              |                    |                   |                 
 Process A      |                  |                        |                     |                                  Process B   |                    |                   |                 
 0xf0000        |     0xf1000      |       0xf2000          |   0xf3000           |   0xf4000                        0xa0000     |        0xa1000     |        0xa2000    |         0xa3000 
 +--------------------+--------------------+--------------------+--------------------+                               +--------------------+--------------------+--------------------+       
 |   Virtual Page 0   |   Virtual Page 1   |   Virtual Page 2   |   Virtual Page 3   |                               |   Virtual Page 0   |   Virtual Page 1   |   Virtual Page 2   |       
 +--------------------+--------------------+--------------------+--------------------+                               +--------------------+--------------------+--------------------+       

That works out to this mapping:

Physical page 0 = Process A page 0
Physical page 1 = Process A page 1
Physical page 2 = Free Memory
Physical page 3 = Process A page 2
Physical page 4 = Process A page 3, and Process B page 0
Physical page 5 = Free Memory
Physical page 6 = Process B page 1
Physical page 7 = Process B page 2

So a call to layer.mapping on Process A would give you: [(0xf0000, 0x2000, 0x0, 0x2000, 'memory_layer'), (0xf2000, 0x2000, 0x3000, 0x2000, 'memory_layer')]

and a call for Process B would give you: [(0xa0000, 0x1000, 0x4000, 0x1000, 'memory_layer'), (0xa1000, 0x2000, 0x6000, 0x2000, 'memory_layer')]

The "bug" is that, when the reverse map was built in the past, a mapping that spanned multiple pages was only recorded under its first physical page, so the rev map had no information for the remaining pages.

So under Physical page 0 in the rev map you'd get Process A page 0, but under Physical page 1 you would not have anything.

If the string you wanted is in Physical page 1 it would come back as Free Memory because of this.
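To make that failure concrete, here is a minimal sketch of the old behaviour. It is only loosely modelled on the removed kernel loop in the diff above, and runs against the Process A mappings read off the diagram; `old_reverse_map` and `PROCESS_A` are names made up for this illustration, not part of the plugin.

```python
# Mapping tuples: (virt_offset, virt_size, phy_offset, phy_size, layer_name),
# read off the diagram for Process A (illustrative values only).
PROCESS_A = [
    (0xF0000, 0x2000, 0x0, 0x2000, "memory_layer"),
    (0xF2000, 0x2000, 0x3000, 0x2000, "memory_layer"),
]


def old_reverse_map(name, mappings):
    """Hypothetical re-creation of the pre-fix loop.

    The loop walks every page of the mapping, but the dictionary key and the
    stored virtual offset never use the loop variable, so only the first
    physical page of each mapping is ever recorded.
    """
    reverse_map = {}
    for virt_offset, _virt_size, phy_offset, phy_size, _layer in mappings:
        for _val in range(phy_offset, phy_offset + phy_size, 0x1000):
            cur_set = reverse_map.get(phy_offset >> 12, set())
            cur_set.add((name, virt_offset))
            reverse_map[phy_offset >> 12] = cur_set
    return reverse_map


old_map = old_reverse_map("Process A", PROCESS_A)

# A string at physical address 0x1abc lives in physical page 1, which belongs
# to Process A, but the old map has no entry for that page.
lookup = old_map.get(0x1ABC >> 12, "FREE MEMORY")
```

Only the first page of each mapping gets a key here, so anything in the second page of a two-page mapping falls through to the "FREE MEMORY" default.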

(Edit: to add the rev map)

For the rev map to work it would need to come out as below. Note that physical pages 2 and 5 have no entry (they are free memory), and 4 has the mappings for both processes.

revmap = {
0: {('Process A', 983040)}, 
1: {('Process A', 987136)}, 
3: {('Process A', 991232)}, 
4: {('Process A', 995328), ('Process B', 655360)}, 
6: {('Process B', 659456)}, 
7: {('Process B', 663552)}
}
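As a rough sketch of how a map like this could be built, the snippet below walks every page of every mapping and records the matching virtual page for it. It follows the same idea as the diff above but is not the plugin code: `build_reverse_map` and `EXAMPLE` are hypothetical names, and the tuples are the ones from the diagram (Process A's second mapping starts at physical 0x3000).

```python
PAGE_SIZE = 0x1000

# Mapping tuples: (virt_offset, virt_size, phy_offset, phy_size, layer_name).
# Illustrative values taken from the two-process diagram.
EXAMPLE = {
    "Process A": [
        (0xF0000, 0x2000, 0x0, 0x2000, "memory_layer"),
        (0xF2000, 0x2000, 0x3000, 0x2000, "memory_layer"),
    ],
    "Process B": [
        (0xA0000, 0x1000, 0x4000, 0x1000, "memory_layer"),
        (0xA1000, 0x2000, 0x6000, 0x2000, "memory_layer"),
    ],
}


def build_reverse_map(mappings_by_owner):
    """Return {physical_page_number: {(owner, page_aligned_virt_offset), ...}}."""
    reverse_map = {}
    for owner, mappings in mappings_by_owner.items():
        for virt_offset, _virt_size, phy_offset, phy_size, _layer in mappings:
            # Visit each page inside the mapping, not just the first one.
            for page_delta in range(0, phy_size, PAGE_SIZE):
                physical_page = (phy_offset + page_delta) >> 12
                # The same delta is applied on the virtual side so that the
                # 2nd, 3rd, ... pages point at their own virtual page.
                reverse_map.setdefault(physical_page, set()).add(
                    (owner, virt_offset + page_delta)
                )
    return reverse_map


revmap = build_reverse_map(EXAMPLE)
```

With these inputs the keys are physical pages 0, 1, 3, 4, 6 and 7, and page 4 holds one entry for Process A (0xf3000) and one for Process B (0xa0000).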

Next, when the results were displayed, the offset shown to the user was the start of the page rather than the exact location, because the plugin read the page-aligned address straight from the rev map. E.g. for a string in Process B at 0xa2123 the result would show 0xa2000. We know the offset into the page from the matching physical address 0x7123: if we mask the physical address to keep the lower 12 bits (0x123) we can add that to the virtual offset to get the exact location.
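The masking step can be sketched in a few lines. This assumes a rev map shaped like the one above (physical page number mapped to a set of owner and page-aligned virtual offset pairs); `exact_virtual_addresses` is a hypothetical helper for illustration, not a function in the plugin.

```python
PAGE_MASK = 0xFFF  # lower 12 bits: the offset inside a 4096-byte page


def exact_virtual_addresses(revmap, phys_addr):
    """Translate a physical address into 'owner:virtual address' strings."""
    offset_within_page = phys_addr & PAGE_MASK
    try:
        entries = revmap[phys_addr >> 12]
    except KeyError:
        return ["FREE MEMORY"]
    # Add the in-page offset back onto each page-aligned virtual offset.
    return [
        name + ":" + hex(virt_offset + offset_within_page)
        for name, virt_offset in sorted(entries)
    ]


# Only the entry needed for the Process B example is included here.
revmap = {7: {("Process B", 0xA2000)}}

hit = exact_virtual_addresses(revmap, 0x7123)   # ['Process B:0xa2123']
miss = exact_virtual_addresses(revmap, 0x2123)  # ['FREE MEMORY']
```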

This all means that the rev map gets quite large - every single page for every process + kernel that maps to a physical page will have an entry.

It might be worthwhile testing how quickly we could get results without building a revmap at all: just iterate through all the strings we're searching for and go through each mapping per process once to see if there is a hit. I suspect it's a trade-off depending on the number of strings you're looking for. A smaller number of strings might favour just looping through all the mappings to find results, while a large number of strings might favour the precomputed revmap.
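A rough sketch of that no-revmap alternative, under the assumption that each owner's mappings are already available as a list of tuples in the same order as the examples above. `locate_without_revmap` and `EXAMPLE` are hypothetical names, and nothing here has been benchmarked.

```python
# Mapping tuples: (virt_offset, virt_size, phy_offset, phy_size, layer_name).
# Illustrative values taken from the two-process diagram.
EXAMPLE = {
    "Process A": [
        (0xF0000, 0x2000, 0x0, 0x2000, "memory_layer"),
        (0xF2000, 0x2000, 0x3000, 0x2000, "memory_layer"),
    ],
    "Process B": [
        (0xA0000, 0x1000, 0x4000, 0x1000, "memory_layer"),
        (0xA1000, 0x2000, 0x6000, 0x2000, "memory_layer"),
    ],
}


def locate_without_revmap(phys_addr, mappings_by_owner):
    """Scan every mapping for one physical address; no precomputed map."""
    hits = []
    for owner, mappings in mappings_by_owner.items():
        for virt_offset, _virt_size, phy_offset, phy_size, _layer in mappings:
            # A hit means the address falls inside this physical range.
            if phy_offset <= phys_addr < phy_offset + phy_size:
                hits.append((owner, virt_offset + (phys_addr - phy_offset)))
    return hits


shared = locate_without_revmap(0x4010, EXAMPLE)
free = locate_without_revmap(0x2500, EXAMPLE)
```

The cost is roughly (number of strings) x (number of mappings) with no extra memory, whereas the revmap pays one pass over all pages up front and then answers each string with a dictionary lookup.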

kevthehermit commented 11 months ago

I have updated https://github.com/volatilityfoundation/volatility3/pull/1043 with a combination of both sets of changes.

The revmap calculations and comments from @eve-mem and the extra cols in the output yields

github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 200 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 60 days since being marked as stale.

ikelos commented 2 months ago

I believe @eve-mem and/or @kevthehermit are still working on improving this? There is a far more efficient version, but I'm not sure whether it solves your issue (and it hasn't yet been put forward as a PR, although you can find a branch with it in under feature/ever-string-speedup...

eve-mem commented 2 months ago

@ikelos the faster version should also fix this issue. Sorry it's taking me so long to get around to finishing it.

ikelos commented 2 months ago

No problem, we're all volunteers here, just trying to make sure it doesn't get buried under dust... ;P

eve-mem commented 1 month ago

@ikelos thanks for understanding. I don't want to forget it either.

@atcuno just a quick FYI for vol2 parity this needs to be fixed too. Currently strings will miss a lot of results.

I'm working on it, but I don't think I'll be done in 2 weeks. If someone else wants to fix it I'm okay with that.

This comment explains what's going on: https://github.com/volatilityfoundation/volatility3/issues/876#issuecomment-1841189587

There was a user in slack recently that was struggling with strings and these changes did help.

atcuno commented 1 month ago

Ok two things:

1) I put the parity-release label on. Just put that label on any that would be needed for it, like actual plugins or known bugs that need to be fixed.

2) The two weeks is going to be missed at this point, but Gus, myself, and 1-2 others at Volexity are putting in basically full time over the next two weeks to vol3 so I am using the label and organization to make sure our time is efficient.