ufrisk / MemProcFS

MemProcFS
GNU Affero General Public License v3.0

Question about pagefile / vads / Mem compression #103

Closed imerzan closed 2 years ago

imerzan commented 2 years ago

Hi Ulf, sorry to bother you. I noticed that when I get a VAD map for a process I am reversing, some entries are backed by the system page file. Reads from these locations often fail, which is really hard to work around.

I am sometimes dealing with high memory pressure, and the page file passed with the -pagefile parameter to Init() quickly goes out of date: it seems to be a one-time grab, and my operations often run for a long time.

I've tried disabling the system page file. That works, but it also leaves the system susceptible to crashes when it tries to allocate more memory than it can handle.

If I know that a certain VA range is backed by the page file, is there a way to use that knowledge to parse the page file directly from the live system? Can I do this using Vfs?

Thank you for all of your hard work and knowledge on these matters.

imerzan commented 2 years ago

I increased the verbosity and it looks like the entries that fail give me this:

One example:

MmWin_CompressedPage: FAIL: #11 BTreeSearch
  va= ffffb181712f7000 ep= 0000000000000000 pgk=20006bd5 ism=0000 vas=0000000000000000
  pte=00004bd500002084 oep=0000000000000000 rgk=00000000 pid=0004 vat=ffff9d8ef127e000
  pgr=0000000000000000 rgn=0000000000000000 rgo=00000000 cbc=0000 rga=0000000000000000

It does appear that reads of some paged-to-disk entries succeed.

imerzan commented 2 years ago

In my test instance, if I load symsrv.dll and dbghelp.dll during Init(), these compressed pages seem to be handled OK, and I no longer get the error from my previous comment.

About 99% of the errors are gone; I only see this occasionally:

[CORE]     VmmProc: Start periodic cache flushing
[PDB]      Initialization of debug symbol .pdb functionality completed
[PDB]      [ srv*C:\Users\redacted\Symbols*https://msdl.microsoft.com/download/symbols ]
[VMM]      MmWin_CompressedPage: FAIL: #35 ChunkArrayTooLarge
[VMM]        va= ffffb1816d250000 ep= 0000000000000000 pgk=20285558 ism=0000 vas=ffff9d8ee95af000
[VMM]        pte=0028555800002084 oep=0000000000000000 rgk=0019e67a pid=0004 vat=ffff9d8ef7e18000
[VMM]        pte=0000000000000000 oep=0000000000000000 rgk=00000000 pid=0000 vat=0000000000000000

I'm not sure whether it will be a major problem. So far I have let my test instance run for about 5 minutes and didn't see anything to worry about.

I did some further testing and it seems a lot better.

My final question: now that it handles these pages OK, will this keep working over a long program runtime, or could things go stale? I assume these locations are refreshed during the internal refresh?

imerzan commented 2 years ago

Following up on the above: I'm starting to notice problematic behavior again.

I'm now getting spammed hundreds of times with: MmWin_CompressedPage: FAIL: #35 ChunkArrayTooLarge

I disabled memory compression with: Disable-MMAgent -MemoryCompression

Now it seems to behave properly again. Is there an issue with memory compression on FPGA firmware 4.7? I was using the latest MemProcFS binaries.

ufrisk commented 2 years ago

Hi, apologies for the slow answer. I've been on vacation for a few days, and my mail program also started putting GitHub notifications in the spam folder, so I didn't see this post until now :(

Also huge thanks for the sponsorship!

> Is there a way to use my knowledge that when a certain VA range is backed by the page file, to parse the page file directly from the system? Can I do this using Vfs?

Also, if you're willing to inject kernel shellcode, I guess it should be possible to have that shellcode read the specific missing memory page, which should then force the OS to bring it back into live memory. I haven't implemented support for this, though.


I don't see how the FPGA firmware version could affect memory compression only. Read errors will affect it, though. Also, the memory manager is in very rapid flux, so chances are that things have changed by the time the comparably slow FPGA reads them.

I haven't looked much into this lately, but when I did my testing I used the VM snapshot method to get a perfect copy of the page file (i.e. no memory drift/smear, which degrades analysis of compressed memory very quickly).

Please let me know how it goes and if you find anything more. I'll be faster answering this time.

imerzan commented 2 years ago

Hi Ulf, no problem on the response time, I fully understand :) Thanks for getting back to me.

I was able to get a VAD map of one of the processes, check which entries had fPageFile set to true, and then read memory from those ranges. Almost all of it works, and the reads are up to date (I checked this). So MemProcFS does seem to be reading memory that was paged out to disk.
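The filtering step described above (keep only page-file-backed VAD entries, then read their pages) can be sketched in C# as follows. This is a minimal illustration, not vmmsharp code: VadRange and PageFileVads are hypothetical names, VadRange only mimics the fPageFile flag, and the sketch assumes End is the inclusive last byte of the range and 4 KiB pages.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a VAD map entry; NOT the real vmmsharp type.
// End is assumed to be the address of the last byte (inclusive).
public readonly record struct VadRange(ulong Start, ulong End, bool IsPageFile);

public static class PageFileVads
{
    public const ulong PageSize = 0x1000; // assumes 4 KiB pages

    // Yields the base address of every page inside VAD ranges flagged as
    // page-file backed, so each page can be read (and retried) individually.
    public static IEnumerable<ulong> PageFileBackedPages(IEnumerable<VadRange> vads)
    {
        foreach (var vad in vads.Where(v => v.IsPageFile && v.End >= v.Start))
        {
            ulong first = vad.Start & ~(PageSize - 1);
            ulong count = (vad.End - first) / PageSize + 1;
            // Counting pages instead of comparing addresses avoids wrap-around
            // for ranges that end at the very top of the address space.
            for (ulong i = 0; i < count; i++)
                yield return first + i * PageSize;
        }
    }
}
```

Working page by page keeps failures local: a page that fails (for example, a compressed page that cannot be decoded) does not invalidate the rest of the range.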

However, it did run into problems when that memory was compressed, based on these errors:

MmWin_CompressedPage: FAIL: #11 BTreeSearch
MmWin_CompressedPage: FAIL: #35 ChunkArrayTooLarge

This was largely fixed by running Disable-MMAgent -MemoryCompression on my target system and using Microsoft debug symbols. All the errors went away on my end, and everything is fine now :)

However, a colleague of mine was still having problems. I recommended disabling compression (which we verified), but he kept getting this error during Init(), with the same FPGA device and 4.7 firmware:

MmWin_MemCompress_InitializeVirtualStorePageFileNumber_Old: WARN! did not find virtual store number - fallback to default

I noticed we were using a slightly older version of the MemProcFS codebase, so I updated the following binaries to your latest 4.9 release:

  1. vmm.dll
  2. leechcore.dll
  3. symsrv.dll
  4. dbghelp.dll
  5. vmmsharp.cs (we're using a C# project)

After updating the binaries, his MemCompress error went away and his setup is now working too. So we have largely fixed everything, and things are more stable with the page file enabled: we won't crash and burn during analysis when a lot of things are running.


Lastly, I noticed you updated the Scatter Read API (and implemented it in vmmsharp). I was able to test it and have a bit of feedback. I followed your Init / Prepare / Execute / Read / Close flow (which is similar to a wrapper I had made myself), and I can confirm that it works; I didn't really encounter any errors.

However, even with FLAG_NO_CACHE, I noticed that memory acquisition was a bit slower and more out of date than with the old Scatter Read API, which was somewhat problematic (but not outright unusable). I acquire a handle on each scatter read call, and it's used across 2-3 threads (each thread has its own handle). I didn't see any thread-safety problems in a separate test environment, so I'm not sure why it was acting up.

I went back to my old scatter read wrapper that uses vmm.ReadScatter, and by comparison it ran much faster and more efficiently. My wrapper does the following:

  1. Accepts an array of ScatterReadEntries.
  2. Loops over the array and checks the address and size of each read. I page-align each entry, check whether it spans two or more pages, and add those pages to a HashSet<ulong> (so each page is included only once; sometimes there is quite a bit of overlap).
  3. After collecting all the pages I need to read, I call .ToArray() on the HashSet and pass the pages to the scatter read.
  4. I loop over my array of ScatterReadEntries again and look up the matching page(s) in the returned array of MEM_SCATTER structures. I then copy out what I need from the page(s) and build the result buffer to pass back to the caller. I actually changed MEM_SCATTER.pb to Memory<byte>, since we are using .NET 6 and slicing works really well here.
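The four steps above can be sketched roughly like this. It is a minimal reconstruction, not the wrapper from this thread: ScatterReadEntry is a hypothetical record, and readPages is a hypothetical delegate standing in for the native scatter call (vmm.ReadScatter and MEM_SCATTER are deliberately not used, so no vmmsharp signature is assumed). It assumes 4 KiB pages and that each returned page is PageSize bytes long.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

#nullable enable

// Hypothetical request type: read Size bytes starting at Address.
public readonly record struct ScatterReadEntry(ulong Address, int Size);

public static class ScatterHelper
{
    public const ulong PageSize = 0x1000; // assumes 4 KiB pages

    // Rejects empty requests and ranges that would wrap past the top of the address space.
    private static bool IsValid(ScatterReadEntry e) =>
        e.Size > 0 && e.Address <= ulong.MaxValue - (ulong)(e.Size - 1);

    // readPages (hypothetical) takes page-aligned addresses and returns PageSize bytes
    // for each page it could read. An entry touching a missing page, or an invalid
    // entry, comes back as null.
    public static byte[]?[] Read(
        IReadOnlyList<ScatterReadEntry> entries,
        Func<ulong[], IReadOnlyDictionary<ulong, byte[]>> readPages)
    {
        // Steps 1-2: page-align every entry and collect each touched page once.
        var uniquePages = new HashSet<ulong>();
        foreach (var e in entries.Where(IsValid))
        {
            ulong first = e.Address & ~(PageSize - 1);
            ulong last = (e.Address + (ulong)e.Size - 1) & ~(PageSize - 1);
            for (ulong k = 0, n = (last - first) / PageSize; k <= n; k++)
                uniquePages.Add(first + k * PageSize);
        }

        // Step 3: a single batched call for all unique pages.
        var pages = readPages(uniquePages.ToArray());

        // Step 4: rebuild each entry's buffer from the page(s) it spans.
        var results = new byte[]?[entries.Count];
        for (int i = 0; i < entries.Count; i++)
        {
            var e = entries[i];
            if (!IsValid(e)) continue; // result stays null
            var buffer = new byte[e.Size];
            int written = 0;
            while (written < e.Size)
            {
                ulong va = e.Address + (ulong)written;
                ulong pageBase = va & ~(PageSize - 1);
                int offset = (int)(va - pageBase);
                int count = Math.Min(e.Size - written, (int)PageSize - offset);
                if (!pages.TryGetValue(pageBase, out var page)) break; // page read failed
                page.AsSpan(offset, count).CopyTo(buffer.AsSpan(written));
                written += count;
            }
            results[i] = written == e.Size ? buffer : null;
        }
        return results;
    }
}
```

The design point is that the native layer is crossed once per batch instead of once per entry. Returning Memory<byte> slices instead of copied arrays, as described in step 4, would avoid the per-entry copy when a read fits inside a single page.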

This works very well and very efficiently. So, at the very least, please don't ever deprecate vmm.ReadScatter()! 😄

ufrisk commented 2 years ago

It's really nice to see you managed to resolve the errors by and large. Huge thanks for the update 👍

About the new scatter API: it works really well in C/C++, where it's basically just a wrapper API around the VMMDLL_ReadScatter function. In C#, I'm not surprised it's a bit slower than doing it all in C#. P/Invoke adds some overhead, and the new vmmsharp scatter API makes many more native API calls than a wrapper implemented in C# around VMMDLL_ReadScatter.

I have no plans to ever remove that function. No worries here.

However, I'm in the midst of a large rewrite. Until now, you've only been able to analyze one system at a time per process. The new release will add multi-system analysis, which also means I've changed the C/C++ API functions to take a "VMM_HANDLE" in all calls.

In C#, how would you prefer I go about it? Should I just add the VMM_HANDLE parameter to every function, or should you instantiate a new Vmm object (which manages the handle internally) and call the methods on that object instead? I.e. should I keep it C-ish or make it object-oriented?

imerzan commented 2 years ago

That is really neat. I'm guessing that would apply more to the VM/driver route?

I think either way is honestly fine. It's already pretty C-ish as is, and I currently wrap everything the way I need it. Having a handle parameter on every method isn't something I'm thrilled about, but it isn't the end of the world either.

Will there be multiple Init() calls needed for each system? Or is there a single Init() call for all systems?

In my opinion, an API similar to this would probably be the cleanest for C#:

_vmmInstance = new vmm(); // Handle is internal/private
_vmmInstance.Initialize(args); // Initialize this instance only? Could even be wrapped in the constructor, I suppose?
_vmmInstance.PidGetFromName("process", out _pid);
_vmmInstance.MemRead(_pid, addr, size);
_vmmInstance.Close(); // Close down only this instance.

If we go the above route, it would also be really nice to have vmm implement the IDisposable interface. Dispose() could wrap vmm.Close() (which should probably still be callable manually) and clean up unmanaged resources internally. Calling code could then use a using statement or call Dispose() directly.

{
    using var mem = new vmm(); // unmanaged resources/handles are cleaned up when mem goes out of scope below, useful!
    mem.Dispose(); // Or clean up manually (Dispose must then tolerate the second call at scope exit)
}
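A minimal sketch of the suggested pattern, with hypothetical names throughout: VmmSession, openNative and closeNative are placeholders (not vmmsharp APIs) for whatever native initialize/close exports the wrapper would call. It shows the standard IDisposable shape: Close() forwards to Dispose(), Dispose() is idempotent so a manual call followed by the end of a using scope does not close the handle twice, and a finalizer acts as a safety net if Dispose is never called.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper owning one native handle for one analyzed system.
// openNative/closeNative stand in for the real native init/close exports.
public sealed class VmmSession : IDisposable
{
    private readonly Action<IntPtr> _closeNative;
    private IntPtr _handle;

    public VmmSession(Func<string[], IntPtr> openNative, Action<IntPtr> closeNative, params string[] args)
    {
        _closeNative = closeNative;
        _handle = openNative(args);
        if (_handle == IntPtr.Zero)
            throw new InvalidOperationException("Native initialization failed.");
    }

    // Instance methods would pass this handle, so several sessions can coexist.
    public IntPtr Handle =>
        _handle != IntPtr.Zero ? _handle : throw new ObjectDisposedException(nameof(VmmSession));

    public void Close() => Dispose();

    public void Dispose()
    {
        ReleaseHandle();
        GC.SuppressFinalize(this);
    }

    // Safety net if the caller never disposes the session.
    ~VmmSession() => ReleaseHandle();

    private void ReleaseHandle()
    {
        // Swap the handle out atomically so a second Dispose is a harmless no-op.
        IntPtr h = Interlocked.Exchange(ref _handle, IntPtr.Zero);
        if (h != IntPtr.Zero) _closeNative(h);
    }
}
```

With this shape, `using var session = new VmmSession(open, close, "-device", "fpga");` releases the handle at scope exit. For production code, deriving from System.Runtime.InteropServices.SafeHandle provides the same release guarantees with less hand-written code.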

If this is going to be overly complicated, you can just keep it C-like. It's really not hard to wrap your code in its current form, and your API is very clean and easy to follow 😄

It may also be good to leave it C-like, since not everyone wants an object-oriented approach; some may prefer to code procedurally (although I'd find that odd in C#, of all languages). On top of that, moving to an object-oriented approach would require other vmmsharp users to rework their code more extensively, which is another argument for leaving it C-like.

Just some random thoughts, I won't lose sleep whichever way you do it :)

ufrisk commented 2 years ago

Thanks, I'll go with the constructor approach and keeping the handle internally.

It will be something like the below, as you suggested:

_vmmInstance = new vmm("-device", "fpga");

I'll add Dispose as well, per your suggestion. Other than that, it will stay pretty much unchanged.

And yes, you'd need to create a new vmm object for each system you want to analyze. It won't be possible to analyze two instances using the same FPGA (there's not much reason to do that anyway).

imerzan commented 2 years ago

Very cool! Looking forward to the new features :) Thank you!