ufrisk / pcileech

Direct Memory Access (DMA) Attack Software
GNU Affero General Public License v3.0

How to contribute? #33

Closed bitflip2 closed 6 years ago

bitflip2 commented 6 years ago

Hello, I just noticed your tweet regarding a possible feature to map per-process memory to files mounted via DMA and was wondering how to contribute to this repository. Maybe a roadmap or more documentation about to-dos would be helpful for possible contributors (like me :smiley:).

ufrisk commented 6 years ago

Contributions are really welcome. If you have an idea about what you wish to contribute and want to check with me first whether it's a good idea, please send me a Twitter DM or email.

After that Fork+Pull Request is probably the way to go.

More documentation is needed, as you mention. Starting some wiki pages with various how-tos would probably be a good idea. Up until now I've been a bit too busy to do that, though.

As for a roadmap, it's not likely to happen. This is all very event-driven, depending on what the operating system vendors come up with. Also, whenever I've thought of a feature I've usually just added it.

ikelos commented 6 years ago

Firstly, sorry for potentially hijacking this thread, but it seems like a good place to discuss contributing rather than spawning lots of "me too" issues.

I'm keen to see the FPGA version working with a Linux controlling system. My ability to actually write C code is unfortunately pretty poor, but I've got access to a PCIeScreamer and a Gentoo host, so testing different things out should be pretty easy, if that helps.

I'm also very keen on the potential for Python bindings, which again I can help test (and once they're in place I can quickly get support for it into volatility directly). :) Please let me know if any of that would be useful for the project.

ufrisk commented 6 years ago

You're welcome :) I agree this is a good place to discuss rather than spawn new threads.

The problem with Linux is the FTDI USB driver or libusb or the combination. I have it up and running in PCILeech already, but the performance is so abysmal that I've not considered releasing it.

I've received a separate kernel level driver for this from another person helping, which is performing really nicely, but it's not finished and is still super buggy. I hope he'll be able to complete it and we'll have Linux support.


About Python integration, and possibly with volatility, this can be achieved in a couple of ways, I guess:
1) Duplicate all code from scratch in Python or code a completely new tool (a lot of work, and I won't be the one doing it).
2) Restructure the code into an .exe and a .dll/.so library. The .dll/.so may be used by Python.
3) Expose files in the file system that Python (and other tools) will be able to consume. This is already in place with the mount command on Windows: the target system's live RAM is exposed as a file. I plan to add a /proc/-style file system with process virtual memory in the not too distant future.

Options (2) and (3) require some C knowledge. But you can start using the live RAM file right now. It's probably a bit slow since it's currently optimized for the USB3380. I'll make some updates to this in the coming versions.


Another very pressing need is to stabilize the PCIeScreamer support. Preferably by stabilizing the current PCILeech bitstream. Alternatively by implementing support for the PCIeScreamer native bitstream with its udp2tlp network bridge. Looking into the PCIeScreamer bitstream requires some knowledge about FPGAs and Verilog.

ikelos commented 6 years ago

Thanks for getting back to me so quickly!

Hmmmm, interesting. Is the code that uses libftd3xx available in a branch somewhere? I don't know how I'd go about measuring its performance, but I'm interested to know why it's so poor, so playing around with it might be a good start?

I'm surprised a kernel-level driver is needed to talk to a USB device efficiently. I wonder what it would be doing that's different from normal USB interactions? Presumably the userland/kernel-land context switch can't be chewing up that much time? I'm looking forward to the Linux support landing; as I say, if you need a hand testing it I'd be happy to help. :)


Ah, exposing it as a file does seem like a free way to get volatility/external tool support. Quite an elegant solution! :) I wrote the FireWire support currently in volatility 2, and just having a flat file representation would make life a lot easier (although I'd be interested to know how Linux would respond if parts of the stream couldn't be read when it's represented as a file). I think restructuring the code into a library would provide great flexibility (even just a minimal read/write bytes/pages API) and allow people to write additional tools on top of it, but since the source is there I guess that's open for anyone to do already. :)


FPGA programming is even further outside my skill set than C coding! :P I think I'll just leave that one to the experts! ;)

ufrisk commented 6 years ago

Unfortunately I don't have that code in a branch, it's just floating around at home. But the API calls are exactly the same as on Windows so the alterations are really minimal. Performance is very bad though <1MB/s.

About the kernel driver: I just think that's the way he went about doing it. If it works, which I have no doubt it will if he finishes it, it will be totally awesome for my part. Problem solved 👍

The file parts are unfortunately only working on Windows right now. I'll look into this on Linux as well when/if we get the Linux USB connectivity for the FPGAs working, or if someone is willing to code some C... On Windows, PCILeech exposes a file with the size of the RAM. When a program such as volatility reads it, PCILeech requests the data on the fly from the target system. If it's unable to read the data, it serves null bytes to the program requesting it. You can also write to this file. An example of me running volatility against a Mac victim is found here: https://youtu.be/WR7hDKbGiX8?t=10m40s
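Since the RAM is exposed as a flat file whose offset equals the physical address, a consumer only needs ordinary positioned reads. Below is a minimal C sketch of that, assuming the image is reachable as a regular file (at this point the live file is only exposed on Windows, so on Linux this would apply to a saved dump or a future mount). The helper names `open_ram_file`, `read_phys` and `is_all_zero` are hypothetical. Because unreadable regions are served as null bytes, an all-zero page cannot be told apart from genuinely zeroed memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a flat RAM image read-only. */
static int open_ram_file(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        perror(path);
    return fd;
}

/* Read `len` bytes starting at physical address `pa` (file offset ==
 * physical address). Retries short reads and stops early at end of file.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_phys(int fd, uint64_t pa, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (unsigned char *)buf + done, len - done,
                          (off_t)(pa + done));
        if (n < 0)
            return -1;
        if (n == 0)
            break; /* past the end of the image */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* An all-zero page may be real zeros or a region that could not be read. */
static int is_all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

A tool such as volatility effectively does the same thing through its own file layer; nothing in the reader needs to know that the bytes are fetched over DMA on demand.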

The .so exposing some functionality is definitely something I'll look into as well, but I'll probably do that after I've completed what I'm working on right now.

ikelos commented 6 years ago

Hmmmm, so I've had a stab at changing the calls around (with some help from some C-knowledgeable friends). I decided to make the calls directly using the FT_* functions and link the library in, rather than dynamically loading the methods from the library. It all compiles and seems to work (although BOOL and WCHAR are both defined by ftd3xx.h and also oscompatibility.h, so I had to do some type reassignment, which might have messed some things up; I also had to define HMODULE as something, but since I no longer dynamically call the library methods I assume it won't matter).

Copying the device list function from the screamer test program worked and returned one device. The code gets up to DeviceFPGA_GetDeviceID_FpgaVersion and seemingly writes the correct data (using FT_WritePipe), since the status is FT_OK, but then fails on the following FT_ReadPipe with an FT_TIMEOUT error, which doesn't really help me debug it further. I now don't know if it's an FTDI library issue or something with the code that's flashed onto it (given that it works on Windows, I suspect it's the Linux build I'm trying).

I also have a question that might be stupid, but I figure I'll ask anyway, just in case. The code seems to connect in FT245 mode rather than FT600 mode, but I'd thought the PCIeScreamer devices could handle FT600 mode. Is that simply for compatibility with other FPGAs that might end up getting used, does FT600 mode not give you anything extra, or is there some other reason for doing it?

Also, if this isn't the best location to have such discussions, is there a better place I can go to ask questions as I try and tinker with this to get it going?

ufrisk commented 6 years ago

Awesome that you got started!

I really do have to create some wiki help pages documenting things like the communication protocol between PCILeech and the FPGA...

I don't mind if the code is messy if you get it working at a good performance. That part is always easy to clean up afterwards :) Please keep in mind that I got it to work earlier on, just with horrible performance, so I cannot promise this will work for you. I won't merge the code if it only gives 1-2 MB/s transfer speed :( If you get to around half of (or on par with) what I'm getting on Windows, it would be totally awesome :)

About questions and such, just catch me on Twitter, or do you prefer that I create a private repo here on Github? That might be easier?


Yes, FT245 is used; it's the same protocol for all three of my FPGA bitstreams. The LambdaConcept bitstream also uses FT245. For the use scenarios we have, FT600 won't offer any benefits over FT245 from what I can see. It would also complicate things a lot on the FPGA side...


The protocol when transmitting TO the FPGA is 64-bit based: the first 32 bits (DWORD) are the actual data and the next 32 bits (DWORD) are control, as follows:

DD DD DD DD NN ZZ ZT MM

where:
DD DD DD DD = 32 bits of data
MM = magic = 0x77 (always)
T = TYPE of data
ZZZ = not used (zeros) when doing TLP, CMD or LOOP access; additional data when doing CFG access
NN = command id when transmitting CMD-type data; additional data when doing CFG access

Transmitting sample loopback data of 33 33 33 33 is: 33 33 33 33 00 00 02 77
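As a sketch of this TX framing, the C function below builds one 8-byte word in wire order and reproduces the loopback example. The function and constant names are hypothetical; the TYPE value 2 for loopback is read off the example rather than from a full type table, the ZZZ bits are left at zero (so CFG access is not covered), and writing the data DWORD least-significant byte first is an assumption.

```c
#include <stdint.h>
#include <string.h>

#define FPGA_TX_MAGIC         0x77 /* MM: always 0x77 */
#define FPGA_TX_TYPE_LOOPBACK 0x2  /* T value seen in the loopback example */

/* Encode one TX word as the 8 bytes DD DD DD DD NN ZZ ZT MM.
 * nn:   command id for CMD transfers, 0 otherwise.
 * type: 4-bit TYPE, placed in the low nibble of the 7th byte. */
static void fpga_encode_tx(uint32_t data, uint8_t nn, uint8_t type,
                           uint8_t out[8])
{
    out[0] = (uint8_t)(data);        /* data, least-significant byte first */
    out[1] = (uint8_t)(data >> 8);
    out[2] = (uint8_t)(data >> 16);
    out[3] = (uint8_t)(data >> 24);
    memset(out + 4, 0, 4);           /* ZZZ and unused bits stay zero */
    out[4] = nn;                     /* NN */
    out[6] = (uint8_t)(type & 0x0F); /* Z T: only T is set */
    out[7] = FPGA_TX_MAGIC;          /* MM */
}
```

Several such words can be concatenated into one buffer before it is handed to FT_WritePipe.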


Receiving from FPGA is either DUMMY "keepalive" DWORDs of 0x55556666 or payload data in 256-bit chunks divided into 1 DWORD (32-bit control) and 7 DWORDs of data.

S D D D D D D D

where S = STATUS DWORD and D = DATA DWORD,
S = S1S2 S3S4 S5S6 S7e, and Sx = 4-bit status corresponding to data DWORD x.


I don't really think you need to dig into the actual protocol, though. Just play around with receiving some data (the FTDI should always send at least 5 DWORDs of dummy data, 0x55556666, in each read no matter what). Otherwise, try transmitting loopback data and receiving it back.
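The receive side can be sketched the same way: skip the 0x55556666 keepalive DWORDs and split each 256-bit chunk into one status DWORD and seven data DWORDs. This is a minimal C sketch with hypothetical names. It assumes that the least-significant nibble of S belongs to the first data DWORD and that the top nibble is the `e` marker (0xE); it hands each 4-bit status back to the caller without interpreting it.

```c
#include <stddef.h>
#include <stdint.h>

#define FPGA_RX_DUMMY 0x55556666u /* keepalive filler, carries no payload */

/* Parse `count` received DWORDs. Each payload chunk is 8 DWORDs:
 * one status DWORD followed by 7 data DWORDs. Data DWORDs and their
 * 4-bit statuses are written to data_out/status_out.
 * Returns the number of data DWORDs, or -1 on a malformed/truncated
 * chunk or if the output arrays (max_out entries) are too small. */
static long fpga_parse_rx(const uint32_t *buf, size_t count,
                          uint32_t *data_out, uint8_t *status_out,
                          size_t max_out)
{
    size_t n = 0, i = 0;
    while (i < count) {
        uint32_t status = buf[i];
        if (status == FPGA_RX_DUMMY) { /* skip filler one DWORD at a time */
            i++;
            continue;
        }
        if ((status >> 28) != 0xE || count - i < 8) /* assumed marker nibble */
            return -1;
        for (size_t j = 0; j < 7; j++) {
            if (n == max_out)
                return -1;
            data_out[n] = buf[i + 1 + j];
            status_out[n] = (uint8_t)((status >> (4 * j)) & 0xF);
            n++;
        }
        i += 8;
    }
    return (long)n;
}
```

A loopback test then amounts to sending a few loopback words and checking that the same data DWORDs come back out of this parser.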

If you really need I guess I could add a CMD ID to access the LEDs on the PCIeScreamer board if you need hardware visual feedback, but that would require me to recompile the bitstream. Please let me know if you need this.

Also let me know if I should create a new private repo, or if you prefer to do it over Twitter or just fork it yourself?

ufrisk commented 6 years ago

I made an attempt at the protocol specification (since more ppl have been asking about it) at: https://github.com/ufrisk/pcileech-fpga/wiki/Protocol-Specification-(dev)

I don't think it will be needed, though.

ikelos commented 6 years ago

Thanks, that was handy and let me use the loopback messages to test a bit more what was going on. I found out that the loopback worked, but trying any of the CFG/CMD commands failed with a timeout. I've since worked past this problem, and forked the code, so we can discuss it more there:

https://github.com/ikelos/pcileech/issues/1

Nicholas-Johnson-opensource commented 5 years ago

I am interested in this project but am not sure if my skills are good enough yet.

The firmware of the Intel X550-T2 suggests that the X550 NIC is based on the i386 architecture. If its firmware could be modified, then we would have a dirt-cheap device with 64-bit BARs, possibly capable of scraping 20Gb/s DMA over TCP/IP. The Intel X710-T4 would be double that. What does it take to port a device to PCILeech? What criteria are there to determine whether a piece of hardware is feasible for use?

Cheers!

ufrisk commented 5 years ago

If it's possible it would be so sweet.

I'm currently doing a major refactoring of PCILeech which I hope to release next week together with a bunch of new features.

One major thing is that I will move all memory acquisition into the LeechCore library which abstracts memory acquisition away from analysis and exploitation.

The way to go would be to create a LeechCore device similar to the existing implementations found in device_*.c. It's basically a matter of implementing the functions found in the LEECHCORE_CONTEXT struct.

I would gladly accept a pull request for such an awesome update. If you decide to give it a shot please let me know so I can purchase one myself :)

Also please let me know if you have any questions with regards to the implementation. Also I'd be super interested to know if you're able to pull off a PoC so we know if this really is possible :)
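To illustrate the general shape of such a device backend, here is a C sketch of a function-pointer table plus a toy RAM-backed device standing in for hardware. Every name in it (`DEVICE_OPS`, `pfnRead`, `ramdev_open` and so on) is hypothetical and is not the real `LEECHCORE_CONTEXT` layout; the existing device_*.c files are the authority on which functions LeechCore actually expects.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device interface: NOT the real LEECHCORE_CONTEXT members. */
typedef struct DEVICE_OPS {
    void *handle;   /* device-private state */
    uint64_t paMax; /* highest physical address + 1 */
    int (*pfnRead)(void *h, uint64_t pa, void *pb, uint32_t cb);
    int (*pfnWrite)(void *h, uint64_t pa, const void *pb, uint32_t cb);
    void (*pfnClose)(void *h);
} DEVICE_OPS;

/* Toy device: a heap buffer pretending to be target memory. */
typedef struct RAMDEV {
    uint8_t *mem;
    uint64_t size;
} RAMDEV;

static int ramdev_in_range(const RAMDEV *d, uint64_t pa, uint32_t cb)
{
    return pa <= d->size && cb <= d->size - pa;
}

static int ramdev_read(void *h, uint64_t pa, void *pb, uint32_t cb)
{
    RAMDEV *d = h;
    if (!ramdev_in_range(d, pa, cb))
        return 0;
    memcpy(pb, d->mem + pa, cb);
    return 1;
}

static int ramdev_write(void *h, uint64_t pa, const void *pb, uint32_t cb)
{
    RAMDEV *d = h;
    if (!ramdev_in_range(d, pa, cb))
        return 0;
    memcpy(d->mem + pa, pb, cb);
    return 1;
}

static void ramdev_close(void *h)
{
    RAMDEV *d = h;
    free(d->mem);
    free(d);
}

/* "Open" the toy device and fill in the function table; 1 on success. */
static int ramdev_open(uint64_t size, DEVICE_OPS *ops)
{
    RAMDEV *d = calloc(1, sizeof *d);
    if (d == NULL)
        return 0;
    d->mem = calloc(1, (size_t)size);
    if (d->mem == NULL) {
        free(d);
        return 0;
    }
    d->size = size;
    ops->handle = d;
    ops->paMax = size;
    ops->pfnRead = ramdev_read;
    ops->pfnWrite = ramdev_write;
    ops->pfnClose = ramdev_close;
    return 1;
}
```

The point of the indirection is that analysis code only ever calls through the table, so a NIC-firmware or FPGA backend could replace the toy device without touching the callers.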

Nicholas-Johnson-opensource commented 5 years ago

You did not answer how you choose devices. Why was the USB3380 suitable and all of the others not, for example? I don't know that the X550-T2 is actually suitable; it is just an uneducated idea that passed through my mind after I had been thinking about PCILeech and noticed that there's finally hope for PCI, because I own a couple of devices with solely 64-bit BARs. So it might be a pipe dream until I know how PCILeech developers rule out unfeasible devices as candidates.

Also, I should check: does the presence of 64-bit BARs mean the device is capable of full 64-bit DMA? Or does it also depend on the DMA mask? It probably does. I have a JHL7540 here and its USB controller has only a 32-bit BAR, but dma_mask_bits from PCI sysfs is 64. So how does that work?

If it helps, the Intel X550-T2 has 64-bit prefetchable BARs and dma_mask_bits=64. So I am guessing the X550-T2 is fine in this regard. But for other devices with contradictory BARs / dma_mask_bits, which one overrides the other?

Again, this is possibly outside of my skill level, but if you can connect me with somebody who knows a lot about firmware, then maybe. For the USB3380, did you get your hands on datasheets? I would not go out and buy one unless I actually demonstrate promising results with mine, because I do not think the chances are high at my current skill level, unless you genuinely think that it is feasible and you are willing to attempt to make it work. You can download the firmware update for it from the Intel website. Run binwalk on an image, try hexdump in raw mode, and see if you agree that the instructions are i386.

A Thunderbolt 3 add-in card would also be a fantastic target, but reverse engineering its firmware will be very tricky. If it also runs i386 code, it might be even better, because it is capable of routing 32Gb/s of PCI packets in a programmable manner, and even DisplayPort. So it should have enough performance to keep up.

You should have a look at drivers/net/thunderbolt.c and see if it makes more sense to you than it does to me. Supposedly Thunderbolt Networking does not pose a DMA threat so does not need user authorisation. But it is using DMA rings so I wonder if modifying the thunderbolt_net.ko driver on the target machine could be used. This would only work for Linux targets but perhaps if you use a standard PCILeech to load a dodgy driver on a system with Thunderbolt, you could then swap the Thunderbolt to ExpressCard enclosure to a plain Thunderbolt cable to the attacking system (also with Thunderbolt) and attack using the Thunderbolt Networking feature at much higher speeds.

WAIT. It just occurred to me that none of this firmware hacking is needed whatsoever. Well at least for a Linux target. If you modify any PCIe NIC driver, then you can dump straight through the NIC. So you just need PCILeech to replace the driver of the host's NIC then you should be capable of 1Gb/s on a common Gigabit card, regardless of its bit mask (driver fetching memory for us). I guess the reason why we want firmware-based solutions is to be OS agnostic.

How come the kernel module for flashing the USB3380 only has like 100 bytes to flash? Does that mean it only overwrites parts of the firmware? And how does a PCILeech implementation happen in so few bytes?

To wrap up, if I am going to get this to work, you will need to connect me with experienced individuals in the firmware / reverse engineering area. I am certain you know more than I do, because I only just graduated with my MPE at the end of 2018.

Is there a better way to contact you and / or the main contributors than Twitter? I tried to find an email but failed.

Cheers!

ufrisk commented 5 years ago

Hi,

Having 64-bit BARs doesn't necessarily mean that a device is able to do 64-bit DMA. As an example, the original DMA attack hardware I used, the USB3380, does support 64-bit BARs but is only able to do 32-bit DMA due to hardware limitations. Also, the USB3380 was selected because Joe Fitz had already demonstrated it was capable of DMA, so I knew it was usable from the start.
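The two properties live in different places, which a small C sketch makes concrete. The BAR bit layout below follows the PCI specification (bit 0 selects I/O vs memory; for memory BARs, bits 2:1 = 10b mark a 64-bit BAR spanning two consecutive BAR registers, and bit 3 marks it prefetchable) and only says where the host may map the device's registers. A DMA mask such as Linux's `dma_mask_bits` instead records which bus addresses the driver has declared the device able to reach as a bus master. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* BAR decoding (low bits of the BAR register value from config space). */
static bool bar_is_64bit_mem(uint32_t bar)
{
    /* bit 0 == 0: memory BAR; bits 2:1 == 10b: 64-bit address */
    return (bar & 0x1u) == 0 && ((bar >> 1) & 0x3u) == 0x2u;
}

static bool bar_is_prefetchable(uint32_t bar)
{
    return (bar & 0x1u) == 0 && (bar & 0x8u) != 0;
}

/* Can a device with a `bits`-bit DMA mask reach [addr, addr + len)? */
static bool dma_range_ok(unsigned bits, uint64_t addr, uint64_t len)
{
    uint64_t mask = bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    if (len == 0)
        return true;
    return addr <= mask && len - 1 <= mask - addr; /* overflow-safe */
}
```

For a device like the USB3380 as described above, `bar_is_64bit_mem` can be true while `dma_range_ok(32, ...)` is false for any address at or above 4 GiB.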

About modifying drivers: if you're able to modify drivers, you already have control of the target PC and are already able to dump its memory. It's all cool if you wish to roll your own kernel driver for reading memory over the network, and even expose it to PCILeech, but none of that has to do with DMA. Still, it could be useful.

Modification of the firmware, if possible, would be much more useful and versatile.

Unfortunately I'm way too busy myself to be able to look into this, at least before you have a PoC for it. But if you manage to pull it off it would be super awesome, and I would happily accept such a patch to the LeechCore project. It's just that I won't be able to help very much. Things like firmware reversing just take more time than I have right now.

You may reach me at: pcileech@frizk.net