Use full in-memory technique in powershell

RoganDawes commented 8 years ago

Rather than using a C# stub that gets written to disk, explore using the full in-memory technique used to execute msf stages.

Superficially, this looks like it will be slightly slower overall than the current approach, due to there simply being more to type, but it could be faster to get the window off screen if that function can be invoked immediately, rather than only when the entire C# stub has been added as a type.

As an additional bonus, this leaves less of a forensic footprint, so generally a win.

mame82 commented 7 years ago

When the Pi Zero W was released, I had the idea to extend my P4wnP1 project to be an airgap bridge using a HID device. While developing the code to do this (target talks through HID to P4wnP1, P4wnP1 relays to a socket or filestream, which in turn could be reached via WiFi or Bluetooth), I stumbled across your great project (which was a bit of a surprise; it seems to be hard to build something new).

At this point I had already implemented a PowerShell client, which handles HID communication. My initial code relied heavily on inline C# / .NET. The advantage was that USB device enumeration of composite devices could be done based on the serial number or device manufacturer. I ported my whole code to PURE IN MEMORY POWERSHELL - no more inline C#. While doing this I saw that your method of creating the device file string via WMI was much shorter than mine (using setupapi.dll and hid.dll).

I ultimately borrowed this idea from your code, although it doesn't allow for querying the real device serial number of composite devices (a minor drawback for me, as I change the USB PID/VID as needed).

Anyway, I borrowed from you, thus I want to give something back. In contrast to your approach, I'm creating the kernel32.dll imports (for CreateFile) with Reflection in pure PowerShell, which could be useful for this issue.

My final code isn't released, but here's a test file containing all relevant parts.

Additionally I'm using a very similar HID Report descriptor (64 byte array input, 64 Byte array output). So our descriptors should basically look the same, but they don't. There seems to be something wrong with your "Logical Maximums". So here is mine for reference, maybe this could help to improve something on your side.

Btw. while developing this I found another interesting use case, besides relaying the HID stream. I'm using the PowerShell session on the Windows target to orchestrate a bash session on P4wnP1. This could for instance be used to inject PowerShell code (like Mimikatz) on demand, via pure HID, on an otherwise locked down machine. The remote attacker airgap bridge comes nearly for free while implementing this.

RoganDawes commented 7 years ago

Thanks for the comments, much appreciated. I will look at your code for reflecting CreateFile, and see how it can be incorporated. At the moment, as expected, it adds quite a lot of "overhead", while still leaving me with the C# methods to move the window and hide the window. I will have to wrap my head around what can be shared in this.

w.r.t. your comment:

# - both endpoints run an endless loop of sending and receiving. Because the IO operations are slow, there
#   is no need to deploy a sleep in these loops - the respective CPUs have enough idle time.
#   Sending 64 KByte of raw data takes about 20 seconds (while parsing it, printing it out and echoing it back
#   on the other endpoint). Thus data transfer rate is about 3.2 KByte/s (synchronous, no further operation on raw data).
# - The transfer rate isn't high, but should be enough for text (console data)

One thing I have discovered is that Powershell console output is horrifyingly slow. The performance/throughput is not limited by the OS, but rather by Powershell itself. That said, it is possible to achieve a full 64KBytes/second using "cat testfile > /dev/hidg1" on Linux, and a "while/read" loop in powershell. Adding more processing into the powershell script drops the throughput, but I have yet to figure out the real culprit here!

mame82 commented 7 years ago

@RoganDawes

Thx for your feedback.

> At the moment, as expected, it adds quite a lot of "overhead", while still leaving me with the C# methods to move the window and hide the window.

I'm planning to use a slightly different approach. Meanwhile I've seen your talks on "Universal Serial Abuse", where you mentioned that base64 adds too much overhead. I've done tests compressing my script with a GZipStream and encoding the result with base64. Reassembling takes only a few lines of code and the compression ratio is about 25 percent (of course comments and unneeded linefeeds are stripped from the initial script). So when my PowerShell client code is finished, I'll split the initializing part into a stage1 payload (creating the filestream for the HID device and reading in the stage2 PowerShell code). Stage1 will be compressed, base64 encoded to cope with the code size, and printed out by the HID keyboard (or loaded from USB mass storage, but this increases the footprint). Here's example output of the compression script on an unoptimized PS1 file:

PS D:\del\powershell> D:\del\powershell\converter2.ps1
Raw script length 19168
Script length with comments stripped 8331
Script length zipped 2879
Zipped script length base64 3840
Script length decompresses 8331

So the 3840 Bytes compressed base64 data contain the Reflection code for CreateFile and the whole unoptimized client code.
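The pipeline described above (strip comments, gzip, base64, then reverse it on the target) can be sketched as follows. This is a Python illustration only, not the converter2.ps1 script from the comment; the helper names are made up for the example and the comment filter is deliberately naive (it ignores PowerShell block comments and `#` inside strings). The last helper shows why 2879 compressed bytes become exactly 3840 base64 characters: base64 emits 4 characters per 3 input bytes, rounded up.

```python
import base64
import gzip
import math


def strip_comments(script: str) -> str:
    """Drop blank lines and whole-line '#' comments (a deliberately naive filter)."""
    kept = []
    for line in script.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            kept.append(stripped)
    return "\n".join(kept)


def pack(script: str) -> str:
    """Gzip the script, then base64-encode it so it can be typed out as plain text."""
    compressed = gzip.compress(script.encode("utf-8"))
    return base64.b64encode(compressed).decode("ascii")


def unpack(blob: str) -> str:
    """Reverse of pack(): base64-decode, then gunzip."""
    return gzip.decompress(base64.b64decode(blob)).decode("utf-8")


def base64_length(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input."""
    return 4 * math.ceil(n_bytes / 3)
```

With the figures from the output above, `base64_length(2879)` gives 3840, which matches the reported "Zipped script length base64".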

The idea, of course, came from the dark side: reference

> One thing I have discovered is that Powershell console output is horrifyingly slow. The performance/throughput is not limited by the OS, but rather by Powershell itself. That said, it is possible to achieve a full 64KBytes/second using "cat testfile > /dev/hidg1" on Linux, and a "while/read" loop in powershell. Adding more processing into the powershell script drops the throughput, but I have yet to figure out the real culprit here!

That sounds interesting. Have you done tests with larger input/output reports to increase throughput?

Best regards, MaMe82

RoganDawes commented 7 years ago

So Raw HID appears to be limited to 64 byte reports. You can send more data, but it basically ends up "straddling" multiple reports. As far as I can tell, the maximum "report rate" is 1000 per second - this is controlled by the victim, as they have to poll for a report (this is just how USB works in general). As indicated, though, I have been successful in sending 1000 reports per second (Linux to Linux - 64000bytes/sec), and ~700/s under Windows (~48kbytes/s).
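The throughput ceiling follows directly from the report size and the polling rate, and can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this comment; the function names are illustrative. Note that 700 reports per second works out to 44,800 bytes/s, so the quoted "~48kbytes/s" corresponds to roughly 750 reports per second.

```python
REPORT_SIZE = 64  # bytes per raw HID report, as stated above


def throughput(reports_per_second: int, report_size: int = REPORT_SIZE) -> int:
    """Bytes per second when every polled report carries a full payload."""
    return reports_per_second * report_size


def reports_needed(bytes_per_second: int, report_size: int = REPORT_SIZE) -> int:
    """Whole reports per second required to reach a given byte rate."""
    return -(-bytes_per_second // report_size)  # ceiling division


linux_rate = throughput(1000)    # the Linux-to-Linux case
windows_rate = throughput(700)   # the Windows case at ~700 reports/s
```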

I didn't really experiment with compressed payloads. In the fairly small stage0 payload as currently exists, it didn't really improve things enough to be worthwhile. You are welcome to check with powershell/read_exec_long.ps1, if you like ;-)

mame82 commented 7 years ago

Hi @RoganDawes, thanks for your quick reply and the useful hints on transfer rate.

Unfortunately I'm not able to achieve transfer rates > 4kbytes/s. I had no PS console output calls involved during measurement, but there could be several other reasons involved:

P4wnP1 "server" side is implemented in python, which could slow down things
I'm not using a microcontroller with LUFA, but a Pi; maybe the OS gadget driver stack or the UDC driver implementation slows things down (have you done your speed tests on a Pi Zero, too?)
While measuring speed, a protocol layer for fragmentation / defragmentation of in/out reports was already implemented. This layer isn't running in a decoupled thread (no input / output queue to interface with the report reading/writing layer). This of course slows things down further, as I have to concatenate byte arrays (memory allocation for the underlying arrays should definitely be slow, garbage collection seems to run async)
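To illustrate the last point, here is a minimal Python sketch of a fragmentation / reassembly layer that appends into one growable buffer instead of building a new byte array for every report. The two-byte header (payload length plus a "last fragment" flag) is a hypothetical framing invented for this example; it is not the P4wnP1 wire format.

```python
from typing import List, Optional

REPORT_SIZE = 64
HEADER_SIZE = 2  # hypothetical header: [payload length, "last fragment" flag]
MAX_PAYLOAD = REPORT_SIZE - HEADER_SIZE


def fragment(data: bytes) -> List[bytes]:
    """Split data into fixed-size reports, zero-padding each to REPORT_SIZE."""
    reports = []
    offsets = range(0, len(data), MAX_PAYLOAD) if data else [0]
    for offset in offsets:
        chunk = data[offset:offset + MAX_PAYLOAD]
        last = 1 if offset + MAX_PAYLOAD >= len(data) else 0
        header = bytes([len(chunk), last])
        reports.append((header + chunk).ljust(REPORT_SIZE, b"\x00"))
    return reports


class Reassembler:
    """Collects fragments in a single bytearray rather than re-concatenating bytes."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, report: bytes) -> Optional[bytes]:
        """Add one report; return the whole message once the last fragment arrives."""
        length, last = report[0], report[1]
        self._buffer += report[HEADER_SIZE:HEADER_SIZE + length]
        if not last:
            return None
        message = bytes(self._buffer)
        self._buffer.clear()
        return message
```

Whether allocation is really the bottleneck in the PowerShell client would still need to be measured; the sketch only shows the buffer-reuse pattern.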

To sum up: I have to dig deeper into this. This seems to be badly needed, as your transfer rate is about 20 times higher than mine (insane).

Anyway, right now I'm working on the higher-level stack and managed to get a nice bash shell piped into PowerShell. As my current use case is flipped compared to yours (using the target host to get a bash on P4wnP1, in order to inject more sophisticated on-demand PowerShell payloads into the target), I'm not working on hiding the PowerShell window right now.

I've implemented the GZip encoding of my (now separated, pure in-memory) stage 1, which looks like this after compression (about 2KB). I think adding SetWindowPos and ShowWindowAsync via reflection would consume an additional 200 to 300 bytes in the final stage1. My raw stage1 is by far not as thin as yours, but the final script size is nearly the same. No temporary file is written to disk, as no C# code is compiled at runtime.

I'll report back when an "airgap bridge" is implemented, which will of course need some magic to hide the PS window while stage1 gets typed out. I like the approach you've taken to achieve this.

RoganDawes commented 7 years ago

To be clear, I am also only getting about 4kbytes/sec using my implementation. I suspect this is because of shoddy powershell code (or my Java-side code, possibly), or because of the nature of the protocol in use, being a "question/answer" style of conversation, rather than full-duplex (i.e. send as much as you have before waiting for a response). As indicated, I was able to get a full 64000 bytes per second using the following approach:

On the Pi Zero:

cat 1MB_file > /dev/hidg1

On the Windows victim (roughly):

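# $f is assumed to be the already-opened FileStream for the HID device
$d = New-Object IO.MemoryStream
$count = 0
# 65 bytes: one leading report ID byte plus the 64-byte report payload
$b = New-Object Byte[](65)

while ($count -lt (1024*1024)) {
  $r = $f.Read($b, 0, 65)
  # skip the report ID byte and keep the 64 payload bytes
  $d.Write($b, 1, 64)
  $count = $count + 64
}
([Text.Encoding]::ASCII).GetString($d.ToArray())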

Insert timing code appropriately.
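One way to add the timing on the sending side is sketched below in Python, as a rough stand-in for the `cat` command on the Pi. It is an illustration under assumptions: `send_timed` is a made-up helper, and `device` is any binary file-like object (on the Pi it would be the opened `/dev/hidg1`). Sender-side timing is only meaningful if writes block until the host has polled each report.

```python
import time

REPORT_SIZE = 64


def send_timed(data: bytes, device) -> float:
    """Write data to device in 64-byte reports; return throughput in bytes/second."""
    start = time.perf_counter()
    for offset in range(0, len(data), REPORT_SIZE):
        # pad the final chunk so every write is exactly one report
        report = data[offset:offset + REPORT_SIZE].ljust(REPORT_SIZE, b"\x00")
        device.write(report)
    elapsed = time.perf_counter() - start
    return len(data) / elapsed if elapsed > 0 else float("inf")


# Possible use on the Pi (not executed here):
#   with open("/dev/hidg1", "wb", buffering=0) as dev:
#       print(send_timed(open("1MB_file", "rb").read(), dev))
```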

mame82 commented 7 years ago

> a "question/answer" style of conversation

While reading this, a new idea came up. I should test splitting the HID device into two: one using an input descriptor, the other using an output descriptor. Desired result: no Q&A-like IO operations; blocking read calls are fine. But this would involve reimplementing large parts of my communication stack. I'm also already aware of an issue in 'usb_f_hid.ko', which crashes with a kernel panic if data is written to /dev/hidg* before the target is able to read from it (unresponsive interrupt if the target hasn't installed the USB HID class driver yet). Until now I solved this with a blocking read before data is written the first time; this will have to be solved differently. So @RoganDawes, this conversation is like brainstorming, and again I want to say thanks for your feedback. Maybe I'll be able to bring up a test case for full duplex next week.

RoganDawes commented 7 years ago

The easiest way to deal with this is to make sure that the Windows side does the first write, always.

This lets you be sure that the remote end is ready to interact with you.

mame82 commented 7 years ago

Yes, I guess this behavior could be kept with two devices (I'm doing this in the single HID file proto... Windows starts writing, P4wnP1 starts with a blocking read). The more complex adjustment would be to add transport control, like you've done. An output report sent from Windows would not necessarily result in an input report sent from P4wnP1. This should be sort of asynchronous to benefit from splitting up the HID devices, or there wouldn't be a transfer rate increase, as I doubt the underlying IRQ-based USB stack is able to handle things faster (I haven't looked into the underlying EP configuration, but it seems to be host-based IRQ polling).

RoganDawes commented 7 years ago

If you look at my Powershell second stage, I use async calls (BeginRead, et al) to avoid blocking (and to be able to deal with reading from the sockets/shell as well as reading from the HID device).

The only problem I have with this is spinning while I wait for any of the BeginRead calls to actually complete, which ends up using 100% CPU :-( Unfortunately, I have been unable to find a good method to implement callbacks when the BeginRead completes.

mame82 commented 7 years ago

I'm aware of your asynchronous file IO stage. I used a different approach. My stage2 is running 3 threads (on both sides... Python server and PowerShell client): a thread for higher-layer console input and a thread for higher-layer console output, which both have a conditional sleep to reduce CPU usage (no noticeable impact on console responsiveness). They interface with an underlying thread handling HID read/write, using two queues to decouple the upper-layer threads. The HID communication thread switches between read and write to the HID device continuously; packets are flowing the whole time (empty payload if there is no transfer data). So the lower thread assures transfer at maximum speed and explicitly uses blocking reads, as every packet sent gets a packet in answer. This thread doesn't need an idle sleep, as blocking IO leaves enough room for the CPU to idle.
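The lower HID thread described here can be sketched in Python roughly as follows. This is not the actual stage2 code: the one-byte length prefix, the `hid_worker` name and the `EchoDevice` stand-in (which simply returns whatever was written) are assumptions made so the example is self-contained. The upper-layer threads would put data on `outbound` and take data from `inbound`.

```python
import queue
import threading

REPORT_SIZE = 64


class EchoDevice:
    """Hypothetical stand-in for the HID device file: read() returns what was written."""

    def __init__(self) -> None:
        self._reports = queue.Queue()

    def write(self, report: bytes) -> None:
        self._reports.put(report)

    def read(self, size: int) -> bytes:
        return self._reports.get()[:size]  # blocks until a report is available


def hid_worker(device, outbound: queue.Queue, inbound: queue.Queue,
               stop: threading.Event) -> None:
    """Alternate one write and one blocking read until asked to stop."""
    while not stop.is_set():
        try:
            payload = outbound.get_nowait()  # at most 63 bytes in this framing
        except queue.Empty:
            payload = b""  # keep packets flowing with an empty payload
        frame = bytes([len(payload)]) + payload
        device.write(frame.ljust(REPORT_SIZE, b"\x00"))
        report = device.read(REPORT_SIZE)  # on real hardware the thread idles here
        length = report[0]
        if length:
            inbound.put(report[1:1 + length])
```

With a real device the blocking read paces the loop; the echo stand-in answers immediately, so it spins and is only suitable for exercising the logic.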

This is much text, but anyway, it is the old idea of yesterday ;-)

New idea for the PowerShell side

Input thread reading from HID input report in an endless loop, no sleep, blocking read
Output thread writing to HID if data is available, blocking if no data is available

Both threads should leave enough idle time for the CPU, due to blocking IO, but read/write on demand

Interfacing with upper layer should be doable with a single named pipe.

That's the rough idea; we could discuss it once I've had the time to implement something to talk about. Of course I'll share results with you if I get this working.
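The two-thread idea above can be sketched in Python as follows. It is an illustration only: the names are made up, in-process queues are used for the upper-layer interface where the comment proposes a named pipe, and `None` serves as a shutdown sentinel. `device_in` and `device_out` are assumed to be buffered binary file objects, so `read(REPORT_SIZE)` blocks until a full report or end-of-file arrives.

```python
import queue

REPORT_SIZE = 64


def reader(device_in, inbound: queue.Queue) -> None:
    """Input thread: endless loop of blocking reads, no sleep."""
    while True:
        report = device_in.read(REPORT_SIZE)
        if len(report) < REPORT_SIZE:  # end of file: the other side closed
            inbound.put(None)
            return
        inbound.put(report)


def writer(device_out, outbound: queue.Queue) -> None:
    """Output thread: blocks on the queue until there is something to send."""
    while True:
        report = outbound.get()  # blocks while no data is available
        if report is None:  # sentinel: shut down
            device_out.close()
            return
        device_out.write(report.ljust(REPORT_SIZE, b"\x00"))
        device_out.flush()
```

Both loops sleep inside a blocking call (`read` or `get`), so neither needs an explicit idle sleep.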

mame82 commented 7 years ago

My stage 2 is here. Ignore the heading comments, they need to be reworked, but the threading and IO handling should be self-explanatory.

RoganDawes commented 7 years ago

I was thinking of something similar, multiple threads doing blocking calls on whatever source (Accept, or Read calls on sockets or devices). My puzzlement is with regards to communicating between the threads, as I do not wish to use a named pipe (anything that requires OS/FS changes, essentially).

I guess I should look at the .Net classes to see if there is a synchronized double ended queue, which could do the trick!

mame82 commented 7 years ago

I use two ways for thread data exchange

runspaces with "thread shared objects" pushed into sessionstate of child threads
a synchronized hash table to communicate dynamically generated objects back to the parent process

See my stage2... I hope the code is readable and not overcomplicated, but thread data exchange is a mess in PowerShell. The server side is much easier thanks to Python, but as said... all of this is the old idea of yesterday.

mame82 commented 7 years ago

Opened new issue, as this discussion is going off topic

sensepost / USaBUSe

Use full in-memory technique in powershell #7