ufrisk / pcileech-fpga

FPGA modules used together with the PCILeech Direct Memory Access (DMA) Attack Software

What victim settings can prevent a PCILeech DMA attack from succeeding? Victim(Dell R815 Server) FPGA(Netv2) #77

Closed: cgmAether closed this issue 3 years ago

cgmAether commented 3 years ago

Hi Ulf,

I would first like to thank you for the amazing repository and all the hard work you've done.

I have a Netv2 flashed with the v4.7 prebuilt binary that I can ping and send commands to.

My victim machine is a Dell R815 server that can dual boot into Ubuntu 16.04 LTS with Linux kernel 4.15.0 or Windows 10 (ntfs.sys file version 10.0.18362.145, digitally signed May 21, 2019). The hard drives for this machine are in a RAID 10 configuration. Previously in the BIOS under Processor Settings, 'Virtualization Technology', 'DMA Virtualization', and 'Execute Disable' were all enabled. I have now disabled all of these.

I have tried to inject the respective kernel module into the server while it was booted into Ubuntu and also while it was booted into Windows, but the server crashes just before PCILeech starts checking that the kernel module was injected successfully. Even when just doing a raw memory dump, the server crashes very often. The only way I can get it to dump without crashing is to start the memory dump as the server is booting.

I know you have previously stated that these BIOS Virtualization settings can be used as a defense against PCILeech. Does a RAID configuration count as Virtualization in this context and break PCILeech? Does having a Dual OS system where the OS system partitions are not all at the bottom of memory break PCILeech?

I have a second Dell R815 Server that only has Windows installed on it, but I do not have the password to log in to this machine. I would be unable to change any settings beyond the BIOS ones.

If you could direct me to some possible troubleshooting routes or BIOS settings that I missed I would greatly appreciate it. Thanks!

ufrisk commented 3 years ago

DMA is working.

The most likely reason for the crash is that PCILeech tries to read physical memory at an address to which some device is memory mapped, and that device or its firmware is sensitive to memory reads and crashes or freezes the server. This is unfortunately a somewhat common problem.

The solution is to avoid reading those ranges; PCILeech lets you provide a "memory map" as a file, or auto-detect it.

Some things you may try out are:

Please let me know if there is any progress with regard to this.

cgmAether commented 3 years ago

My host machine that communicates with the FPGA is also running Windows 10, so I'm good there. Running this command results in the server crashing, but it does seem like the memmap command finishes executing (screenshot). From my understanding, this command probes the memory addresses and prints out the memory map (it should also output that information to a text file, but it did not do that), and then attempts to inject the kmd using the memory map that it found. Please let me know if I am misunderstanding any part of this.

How can I save the memmap output to a file? I assume I am misordering my arguments here.

Can I get the memmap without immediately injecting the kmd after? My thought process is that this might be more stable done in two parts, but again I may be misunderstanding what the memmap or kmdload commands actually do.

Thank you for the quick response!

cgmAether commented 3 years ago

Here is the Ubuntu injection (screenshot). It also seems to crash before the inject is complete.

On the Ubuntu side I've come very close a few times. Getting to 98% on the scan before the server crashes. I'm not sure how much is up to chance in this regard.

I have also gotten the Bad PCIE TLP error 'this should not happen' a few times, but it seems to happen primarily right when the server crashes.

ufrisk commented 3 years ago

This looks like PCILeech is crashing or having issues; if it fails it should output something like FAILED before quitting. Here I just see you exit to the command prompt.

I haven't really been testing PCILeech on large-memory systems, so it may be related to that; I have a 256GB system on hand (also AMD) that I could try my luck on over the weekend when I have time. I don't have access to a 320GB system though... AMD is usually very sensitive when it comes to reading in the wrong places, so that is probably why the server is crashing.

When the inject completes for Windows it should look something like this: (screenshot)

My guess is that there are two things at play here: 1st, the server crashes because of reads in the wrong places; 2nd, PCILeech crashes or becomes unstable because of your server crashing, or because of some other error condition on large-memory systems.


You could also try out my MemProcFS system to see if you're able to "mount" the memory of the target system: https://github.com/ufrisk/MemProcFS
1) install the Dokany virtual file system (DokanSetup_redist.exe) from: https://github.com/dokan-dev/dokany/releases/tag/v1.4.0.1000
2) unzip and run: memprocfs.exe -device rawudp://ip=192.168.0.222 -v -vv

If you're able to mount your Windows server you'll find the memory map at M:\sysinfo\memory\physmemmap.txt; if not, it may still help me better understand where it fails.

But judging from your settings, a very crude memory map may be:

0000         1000 -        9ffff
0001       100000 -     bfffffff
0002    100000000 -    201effffff

pcileech.exe kmdload -kmd win10_x64_3 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt (most stable)
pcileech.exe kmdload -kmd win10_x64_2 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt (next best try)
pcileech.exe kmdload -kmd win10_x64_1 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt
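If it helps to script this, below is a minimal Python sketch (purely illustrative, not part of PCILeech) that writes the crude map above to memmap.txt and runs the three attempts in the suggested order. The device address is taken from the commands above; the assumption that pcileech.exe returns a zero exit code on success is mine.

# Illustrative helper: write the crude memory map and try the three kmd
# signatures in order (most stable first). Assumes pcileech.exe is on PATH
# and the NeTV2 answers at 192.168.0.222.
import subprocess

MEMMAP = (
    "0000         1000 -        9ffff\n"
    "0001       100000 -     bfffffff\n"
    "0002    100000000 -    201effffff\n"
)

with open("memmap.txt", "w") as f:
    f.write(MEMMAP)

for kmd in ("win10_x64_3", "win10_x64_2", "win10_x64_1"):
    cmd = ["pcileech.exe", "kmdload", "-kmd", kmd, "-v", "-vv",
           "-device", "rawudp://ip=192.168.0.222", "-memmap", "memmap.txt"]
    print("running:", " ".join(cmd))
    # Assumption: a zero exit code means the kernel module was injected.
    if subprocess.run(cmd).returncode == 0:
        break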

If it's still not working I really have to do some testing on my large-memory system, but that will unfortunately have to wait until the weekend.

cgmAether commented 3 years ago

So I have the Dokany virtual file system and MemProcFS installed. Running the given command produces a very similar result: PCILeech crashes and exits back to the terminal mid-execution as the server crashes.


Running the three listed commands with the crude memory map, these are my results in the same order.

pcileech.exe kmdload -kmd win10_x64_3 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt (result screenshot)

pcileech.exe kmdload -kmd win10_x64_2 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt (result screenshot)

pcileech.exe kmdload -kmd win10_x64_1 -v -vv -device rawudp://ip=192.168.0.222 -memmap memmap.txt (result screenshot)

For this last one the server does not crash when running the command.

Is there any way to access the physical memory map from within Windows itself? I assume this is something that is not possible in user mode. Similarly, is there any way to access the physical memory map from within Ubuntu? This machine has 4 600GB hard drives in a RAID 10 config, so it's 1.2 TB of storage total. Would providing the partition mapping of the drives help in any way? If you need me to provide any machine specs or anything else that would help, please let me know.

Thank you for the quick responses again. I appreciate the help.

ufrisk commented 3 years ago

Yes, you may access memory directly from Windows with MemProcFS: you can either start it as administrator together with WinPMEM, which will load a kernel driver; or run DumpIt to dump memory to a file (DumpIt will also load a kernel driver for this); or even run DumpIt in live mode.

If dumping memory, you may mount the memory dump with MemProcFS and check out the memory map afterwards. I parse it from the registry value HKLM\HARDWARE\RESOURCEMAP\System Resources\Physical Memory\.Translated (you may also be able to look at this in the registry editor), but this is a binary value that needs to be parsed separately. If you can send me the binary sequence I may be able to help.
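If it is easier to grab that value programmatically than via the registry editor, a minimal Python sketch (run on the target Windows system; value name .Translated as mentioned above) could be:

# Read the raw binary resource-map value so it can be shared for offline
# parsing. The value is a binary blob (a resource list) that still needs
# to be parsed separately, as noted above.
import winreg

KEY_PATH = r"HARDWARE\RESOURCEMAP\System Resources\Physical Memory"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
    data, _value_type = winreg.QueryValueEx(key, ".Translated")

print(data.hex())   # hex string of the binary sequence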


Disk configuration does not matter for PCILeech.


Seeing this is an Opteron server (which I have not tested), it would be interesting if you could try to run the commands below and post the results here or link to the files. It will help me understand the Opteron system a bit better. I don't think algo=1 will help with your issues, but it's worth a try.

pcileech.exe pagedisplay -min 0x1000 -v -vv -vvv -device rawudp://ip=192.168.0.222
pcileech.exe pagedisplay -min 0x1000 -v -vv -vvv -device rawudp://ip=192.168.0.222,algo=1
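A small Python sketch (illustrative only) that runs both variants and saves their output into text files for attaching to the issue; the output file names are just examples.

# Run the two pagedisplay variants and capture their output for sharing.
# Assumes pcileech.exe is on PATH and the NeTV2 answers at 192.168.0.222.
import subprocess

BASE = ["pcileech.exe", "pagedisplay", "-min", "0x1000",
        "-v", "-vv", "-vvv", "-device"]

for device, outfile in (
    ("rawudp://ip=192.168.0.222", "pagedisplay.txt"),
    ("rawudp://ip=192.168.0.222,algo=1", "pagedisplayalgo1.txt"),
):
    with open(outfile, "w") as f:
        subprocess.run(BASE + [device], stdout=f, stderr=subprocess.STDOUT)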


Otherwise I'll also have to look into why PCILeech is crashing when your server crashes. As mentioned I'll do some testing on an EPYC server with larger memory size as well this weekend; but it may differ from the Opteron...

cgmAether commented 3 years ago

So here are the results of the page display commands you listed at the bottom. Neither of these commands crashed the machine.

pagedisplay.txt pagedisplayalgo1.txt

Here is me running DumpIt with MemProcFS on the server (screenshot).

And here are the files found at that location, M:\registry\HKLM\HARDWARE\RESOURCEMAP\System Resources\Physical Memory: Physical_Memory.zip

Let me know if there is anything else that could be useful for you that I can get through MemProcFS and/or DumpIt.

ufrisk commented 3 years ago

Great, it works. You'll find the memory map in M:\sysinfo\memory\physmemmap.txt:

0000         1000 -        9dfff
0001       100000 -       101fff
0002       103000 -     df678fff
0003    100000000 -   201effffff

Save/copy this to a file and then run pcileech/memprocfs with the -memmap physmemmap.txt option and things should hopefully work better. Or is it still crashing? I will look into the NeTV2 and larger memory sizes on AMD this weekend if needed.

cgmAether commented 3 years ago

Hi Ulf,

Sorry I was out of the office Thursday and Friday for the holidays.

In my previous comment, I was running MemProcFS from within the Operating System running on the Server. I was not using PCILeech or the NetV2.

If I run PCILeech with the physical memory map listed in your previous comment, the server still crashes and PCILeech breaks to the command line. I double checked the physical memory map today by again running MemProcFS from within the Server OS and looked at M:\sysinfo\memory\physmemmap.txt. I can confirm that the memory map you listed in your previous comment and the memory map at that file path are the same.

ufrisk commented 3 years ago

Thanks for the feedback; I fixed an issue related to the memmap functionality two days ago. If you haven't downloaded the new PCILeech/MemProcFS binaries yet, can you please retry with them?

cgmAether commented 3 years ago

So, I have reflashed the Netv2 with the latest binary, downloaded the latest PCILeech and MemProcFS binaries, and installed the newest Dokan binary.

I tried the following two commands and both resulted in the server crashing and PCILeech breaking to the command line.

pcileech.exe kmdload -kmd win10_x64_3 -v -vv -device rawudp://ip=192.168.0.222 -memmap physmemmap.txt

memprocfs.exe -device rawudp://ip=192.168.0.222 -v -vv

I get the same result when trying to use MemProcFS with the memory map:

memprocfs.exe -device rawudp://ip=192.168.0.222 -v -vv -memmap physmemmap.txt

ufrisk commented 3 years ago

Thanks for running these tests. It seems like I'll have to look into this myself to see if I can replicate the issues. It's going to be a bit tricky for me since I have the dev environment on the server I have to test this on. I'll try to see if I can make something out of this though. I'll keep you updated once I know something more. Once again thanks for reporting this.

cgmAether commented 3 years ago

Thanks for the help, hopefully we can find the problem.

cgmAether commented 3 years ago

I have removed some of the RAM from the server. It is now a 64 GB Machine. Would the physical memory map remain the same? I am still experiencing crashes so it may just be an issue related to it being an AMD machine.

ufrisk commented 3 years ago

I tried this on my SuperMicro AMD EPYC 7302p w/ 256GB RAM running Windows Server 2019. It works perfectly without any issues with -memmap auto. The command I run is: MemProcFS.exe -device rawudp://ip=192.168.1.157 -memmap auto with the most recent MemProcFS release coupled with the most recent NeTV2 pcileech-fpga firmware. PCILeech also works.

It may very well be some issue with older non-Zen based systems (Bulldozer?), or something related to that particular server motherboard/firmware. It's really hard/impossible for me to look into this without being able to replicate the issue unfortunately. I don't think I'll be able to help with this.

cgmAether commented 3 years ago

Damn. I'm still having issues with the newest version of MemProcFS. I'm going to try updating the BIOS and all firmware on the server. Maybe that will help.

The second R815 Server we have is running Windows Server 2012 R2. I reset the password on it today and attempted to get the PhysicalMemory Mapping with DumpIt and MemProcFS, but it would not work on that server. I was getting a "too many memory segments in crash dump file" error.

We purchased a third server yesterday, a Dell PowerEdge R720 that has Intel processors. It should arrive next Monday and I can do some testing on that.

I hope that it is a processor issue and not something else related to the Dell PowerEdge series specifically. I appreciate your efforts. I will update you if I get any results or breakthroughs.

ufrisk commented 3 years ago

I'm very much looking forward to the details :) I'm guessing it's most likely something related to the AMD systems, possibly also related to my code. My guess is that the Intel system will be fine, but it will be interesting for me to know as well.

cgmAether commented 3 years ago

Just for history: I tried moving the NeTV2 around all the different PCIe slots, with the board in the PCIe riser, and with a PCIe extension cable in the riser; just about every configuration. I also updated the BIOS and all firmware I could find, but no luck.

cgmAether commented 3 years ago

So the R720 server arrived today. It did not come with hard drives so here's what I've done.

  1. Using HDD's with Windows Server 2012 R2.
    • I can use MemProcFS on the Server through the Netv2 FPGA successfully.
    • When trying to use PCILeech to inject a kernel
      • Only pcileech kmdload -kmd win10_x64 injects the kernel at an address. The other kmds: win10_x64_1, win10_x64_2, & win10_x64_3 all return pcileech errors in the command prompt.
      • Attempting to mount the address crashes the server with a nmi_hardware_failure
  2. Using HDD's with Dual Boot Ubuntu & Windows 10
    • I can use MemProcFS on the Server through the Netv2 FPGA successfully when booted into Windows.
    • When trying to use PCILeech to inject a kernel on Windows boot
      • Both pcileech kmdload -kmd win10_x64_3 & pcileech kmdload -kmd win10_x64_2 successfully inject the kernel at an address.
      • Attempting to mount the address blue screens the server.
    • When trying to use PCILeech to inject a kernel on Ubuntu boot
      • Using pcileech kmdload -kmd LINUX_x64_48 it returns 'Code inserted into the kernel', 'Execution received', and 'Failed. Failed to retrieve physical memory map.' The server does not crash in this case.

There did not appear to be any difference from using the physical memory map and auto on the Windows injection attempts. There also did not seem to be any difference between the regular kmdload vs kmdload -min 0x100000000.

Questions

How can I get the Ubuntu physical memory map? Any other suggestions on things to try?

Some Background

We are using electromagnetic side channels to identify hardware trojans. Relating to PCILeech, we are trying to use EM side channels to look at PCIe traffic through the PCIe port, bridge, cable, etc. We just want to see if we can tell the difference between a non-malicious FPGA using PCIe and PCILeech attacking a system. What part of PCILeech generates the most PCIe traffic? I assume that copying files over from the PCILeechFileSystem, once mounted, would result in the most traffic, but please let me know if there is anything else that creates a lot of traffic, since that is what we are mainly after.

ufrisk commented 3 years ago

It's strange that the kernel implant works, but then crashes the server when the pcileech file system is mounted. I don't know why this is. Have you been able to run the file system at all?

I'll assume you use the pcileech mount functionality and not MemProcFS memory analysis mount when you say mount.

About 2012R2; it may be that some of the kernel modules aren't working against it; I haven't really tested against Windows Server; but it should in theory not be any different from Windows 8.1. I can test against it tomorrow just to have a quick look.

About Ubuntu; I would need exact version to be able to look into this.

Reading memory will generate a lot of traffic in general. Just loop-read a 16MB segment I guess, for example in a loop:

pcileech.exe -min 0x100000000 -max 0x101000000 dump -out none
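As a rough illustration of such a loop (not from the repository; the -device argument is carried over from the earlier commands in this thread, and is my assumption about what the full invocation would look like):

# Repeatedly re-read the same 16MB window to keep PCIe read traffic flowing.
# Stop with Ctrl+C. Assumes pcileech.exe is on PATH and the NeTV2 answers
# at 192.168.0.222, as in the earlier commands.
import subprocess

CMD = ["pcileech.exe", "-min", "0x100000000", "-max", "0x101000000",
       "dump", "-out", "none", "-device", "rawudp://ip=192.168.0.222"]

while True:
    subprocess.run(CMD)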

If wanting to generate lots of traffic the NeTV2 is really the wrong device though; since it's only connected over a 100Mbit ethernet with a tiny tiny bandwidth compared to PCIe. One of the USB devices would be more suitable if wanting to generate lots of PCIe traffic.

cgmAether commented 3 years ago

Hi Ulf,

I think I'm going to close this issue, as the info you provided in your last post about PCIe traffic gave us enough for our use case. We have been doing basic memory dumps and recording the signals emanated by the PCIe ribbon cable.

For any others who stumble upon this in the future: the Dell PowerEdge R815 servers seem to have issues with PCILeech and the NeTV2 specifically; I'm not sure about other FPGAs. The Dell PowerEdge R720 works better and has some of the PCILeech functionality, but not full capability.

Thanks for the help Ulf!

ufrisk commented 3 years ago

Thanks for letting me know that you got it to work on another server. I'm aware that there are issues with some hardware, but as I mentioned it's very hard for me to test things like this fully, and I also unfortunately lack the time since this is a hobby project only for me.


Also, if you should find PCILeech / MemProcFS useful please consider sponsoring the project here on GitHub. I see people purchasing hardware for hundreds of dollars (of which I receive absolutely zero dollars) just to be able to run my free open source software. Sponsorships go for as little as $2 and GitHub is matching it: a $2 sponsorship from you is a $4 sponsorship for me. Thank You 💖

BG7YWL commented 1 year ago

Hi Ulf,

(quoting cgmAether's closing comment above)

May I ask what operations you performed on the BIOS of the Dell R720 server? I encountered the error in the picture below (screenshot not included). It seems that the motherboard PCIe slot does not recognize the DMA board.