Open bool101 opened 9 years ago
This can be done easily by setting the destination port in a UDP send to the port of the server. If we just throw it at our firewall, we will be able to sniff it off the wire. The only gotcha is that we will need gatekeeper to send the packets in order, or we will end up ass-backwards.
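A minimal sketch of that UDP tap, in Python rather than gatekeeper's own code. Each chunk is prefixed with a sequence number so the capture side can restore ordering even if datagrams are reordered; the loopback addresses and the framing here are assumptions for illustration, not the real design.

```python
import socket
import struct

def forward(sock, dest, seq, payload):
    """Duplicate one chunk of service traffic to the capture box.

    A 4-byte big-endian sequence number is prepended so the sniffer
    can reassemble the stream in order even if UDP reorders datagrams.
    """
    sock.sendto(struct.pack("!I", seq) + payload, dest)
    return seq + 1

# Loopback demo: this listener stands in for the firewall/capture box.
capture = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
capture.bind(("127.0.0.1", 0))          # kernel picks a free port
dest = capture.getsockname()

tap = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq = 0
for chunk in (b"helo foo@bar.com\n", b"quit\n"):
    seq = forward(tap, dest, seq, chunk)

# The capture side sorts by sequence number before reassembling.
datagrams = [capture.recvfrom(65535)[0] for _ in range(2)]
datagrams.sort(key=lambda d: struct.unpack("!I", d[:4])[0])
stream = b"".join(d[4:] for d in datagrams)
```

The explicit sequence number is what keeps us from ending up backwards if the tap's datagrams don't land in send order.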
I think Zach brings up some good points in his comment https://github.com/samuraictf/gatekeeper/commit/c0b9dacd73a146137de3e21b250d0f2441f37416#commitcomment-11515283 , and I think this is worth discussing, even for my own clarity. I'm confused about the role of Log() and about the logsocket vs. capture. How do we want to do capture? Right now it seems like the idea is that we'll dump the contents of the ring buffers to a capture server once an event is triggered (e.g., a matching pcre on something like an exploit throw attempt) so that we can inspect the application-layer traffic. Or maybe just send everything. And how does remote logging fit into this? Is the capture server also getting the messages from Log(), or do you see it as being separate?
It doesn't matter where the data gets turned into PCAPs, as long as it gets turned into PCAPs and preserves the original source and destination tuples. I realize we don't have access to the raw IP and TCP headers, but we can forge them well enough.
We don't want to invent a brand-new data format that's hard to work with and requires new parsing tools. PCAP is bog-standard, and there are nice libraries for everything.
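As a sketch of "forging them well enough": a PCAP file is just a 24-byte global header plus per-packet records, so hand-built IPv4/TCP headers around the captured payload are enough to preserve the original tuples. This is illustrative Python, not gatekeeper's implementation; the addresses, ports, and payload are made up, and the TCP checksum is left at zero since the packet is forged anyway.

```python
import os
import socket
import struct
import tempfile
import time

def ip_checksum(hdr):
    """Standard ones'-complement checksum over the IPv4 header."""
    if len(hdr) % 2:
        hdr += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(hdr) // 2), hdr))
    s = (s >> 16) + (s & 0xFFFF)
    s += s >> 16
    return ~s & 0xFFFF

def forge_packet(src, dst, sport, dport, payload, seq=0):
    """Wrap an application payload in forged IPv4+TCP headers,
    preserving the original source/destination tuple."""
    # TCP: data offset 5 words, flags PSH|ACK, checksum left 0 (forged).
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, 0,
                      5 << 4, 0x18, 65535, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0,
                     64, 6, 0, socket.inet_aton(src), socket.inet_aton(dst))
    ip = ip[:10] + struct.pack("!H", ip_checksum(ip)) + ip[12:]
    return ip + tcp

def write_pcap(path, packets):
    """packets: iterable of (unix_ts, raw_ip_packet)."""
    with open(path, "wb") as f:
        # Global header: magic, v2.4, tz 0, sigfigs 0, snaplen, LINKTYPE_RAW (101).
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 101))
        for ts, pkt in packets:
            f.write(struct.pack("<IIII", int(ts), 0, len(pkt), len(pkt)))
            f.write(pkt)

pkt = forge_packet("10.5.5.8", "10.5.5.1", 31337, 143, b"helo foo@bar.com\n")
path = os.path.join(tempfile.mkdtemp(), "forged.pcap")
write_pcap(path, [(time.time(), pkt)])
```

tcpdump/Wireshark will open the result with the standard tooling; the zero TCP checksum just shows up as unverified, which is fine for analysis.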
My original expectation was that the network-traffic-filtering-and-regex portion of Gatekeeper does not run on the game box. It's going to be too slow. We can just forward all traffic off-box for inspection, and then feed it to the application. The regex does not need to know anything at all about inotify or any of the other defense mechanisms, so this should work fine.
The only thing that needs to be on-box is the inotify bit.
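The off-box inspection step can then be as simple as a regex pass over the forwarded application data, with no knowledge of the on-box defenses. A sketch, where the signatures are invented examples rather than gatekeeper's actual rule set:

```python
import re

# Hypothetical signatures; the real rules would come from the team.
SIGNATURES = [
    re.compile(rb"/bin/sh"),      # shell strings in a payload
    re.compile(rb"\x90{16,}"),    # long NOP sled
]

def inspect(payload):
    """Return the first matching signature's pattern, or None if clean.

    Runs off the game box, so it knows nothing about inotify or any
    other on-box defense mechanism.
    """
    for sig in SIGNATURES:
        if sig.search(payload):
            return sig.pattern
    return None

# Clean traffic passes through; a match triggers capture/alerting.
verdict = inspect(b"\x90" * 32 + b"exploit body")
```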
There are a few points going on here; I'll break them out and give my thoughts:
1) Logging server: The logging server is intended to help diagnose problems, show successful defenses, and give us a sense of how active our services are. It's like the error log of a web server: when things are breaking and we are losing SLA, this is where we want to look. We have no other logs for our services; it's only the things we create. Having things in gatekeeper fail open instead of failing with goto cleanup will help us not lose SLA here too, but we still want the logs. I know we may collect a lot of data doing this, so we may find that we need to dial it back during the game or write some scripts to help sort this data, but I think it's useful to know things like how many currently active connections there are on our box, how many have connected in the past minute, and how many of those connections hit an alarm call or triggered another defense. Real-time situational awareness is paramount to adapting quickly to active attacks.
2) Capture server / data format / meta-data:
The purpose of the capture server is to give us a 15-minute lead on the pcaps we are already getting, and to give us coverage on teams redirecting through other services on our box. For example, if we are exploited on wdub and the shellcode running there redirects to atmail on 127.0.0.1, the attacking team can skip the span port on the switch and hide their atmail exploit from our pcaps. The 15-minute lead time on pcaps means 1 or 2 rounds of faster exploit replay. The redirection capture may prove crucial if we're expecting to see more advanced shellcode this year.
As far as formatting goes, it doesn't matter to me what format we use for this. I was planning to log CONNECT
3) Network filtering off of the game box.
I think this is a bad idea. It adds complexity that makes our services block on other network communications. Other teams have abandoned similar setups (Routards @ DC 21) due to latency and the associated SLA hits; however, they may have been doing more than pcre on the back end. I agree that we do need to test the performance impact of gatekeeper redirection and pcre matching. I don't think we are going to have anything slower than the ODROIDs this year, so we can performance-test on those. A review of last year's pcaps will hopefully tell us how fast we need to be.
Hopefully that helps to clarify some of what I was thinking when putting together the initial design of gatekeeper.
Simple, human-readable without additional tools
This won't work for bi-directional communication unless you have two separate logs. It's also impossible to find out exactly which packets arrived at exactly which millisecond.
Here is an example of what might be in one file. It shows both sides of the conversation:

```
CONNECTED: 10.5.5.8
SENT:
0000000: 696d 6170 2073 6572 7665 7220 7634 2072  imap server v4 r
0000010: 6561 6479 2033 3832 3334 6466 6139 3063  eady 38234dfa90c
0000020: 320a                                     2.
RECV:
0000000: 6865 6c6f 2066 6f6f 4062 6172 2e63 6f6d  helo foo@bar.com
0000010: 0a                                       .
...
DISCONNECT
```
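A format like that is cheap to emit. A possible sketch of the xxd-style dump in Python (the offset width and spacing are my guesses at the intended layout, not a spec):

```python
def hexdump(data):
    """Render bytes in an xxd-style layout: 7-digit hex offset,
    hex pairs grouped two bytes at a time, then an ASCII column
    with non-printable bytes shown as '.'."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        pairs = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # 16 bytes -> 8 groups of 4 hex chars + 7 spaces = 39 columns.
        lines.append("%07x: %-39s  %s" % (off, pairs, text))
    return "\n".join(lines)

banner = hexdump(b"imap server v4 ready 38234dfa90c2\n")
```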
You are correct that we won't be able to know which packet arrived with which timing, but the ordering will be correct. We could add a timestamp, but do we need that information with a 15-minute lead time, or can it wait until we get the pcaps? I could see it helping if we were automating things, but knowing when the connection started from the timestamp in the capture filename will be enough to match with inotify.
Getting complex here is going to burn us, IMHO.
We should focus on gatekeeper stopping attacks, and on notifying us of an attack by simply telling us a key read happened at a given time.
If we can flag a key read and dump a hexdump like the one above, it is very easy to implement replay.
This is ~completely solved with my pcap implementation. See my branch.
As data comes in, send it off to our capture server. We will lose some context, but we will have a realtime view of data coming into our services; log to a file for each new connection.
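One way to sketch the "log to a file per connection" part: name each capture file with the connection-start timestamp and peer IP, so it can be matched against inotify alerts later. The directory layout and naming scheme here are assumptions for illustration.

```python
import os
import tempfile
import time

def open_capture(capture_dir, peer_ip):
    """Open an append-mode log for one connection. The timestamp in the
    filename is what lets us line the capture up with inotify events."""
    os.makedirs(capture_dir, exist_ok=True)
    name = "%d_%s.log" % (int(time.time()), peer_ip)
    f = open(os.path.join(capture_dir, name), "a", buffering=1)  # line-buffered
    f.write("CONNECTED: %s\n" % peer_ip)
    return f

# Demo with a temp dir standing in for the real capture directory.
cap_dir = tempfile.mkdtemp()
log = open_capture(cap_dir, "10.5.5.8")
log.write("SENT: 16 bytes\n")
log.write("DISCONNECT\n")
log.close()
```

Line buffering keeps the file current as data arrives, which is what gives us the realtime view even before the connection closes.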