Open bool101 opened 9 years ago
This can be done easily by setting the destination port in a UDP send to the port of the server. If we just throw it at our firewall, we will be able to sniff it off the wire. The only gotcha is that we will need gatekeeper to send the packets in order, or we will end up ass-backwards.
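A minimal sketch of that UDP tap, in Python rather than gatekeeper's own code. Each chunk is prefixed with a sequence number so the capture side can restore ordering even if datagrams are reordered; the loopback addresses and the framing here are assumptions for illustration, not the real design.

```python
import socket
import struct

def forward(sock, dest, seq, payload):
    """Duplicate one chunk of service traffic to the capture box.

    A 4-byte big-endian sequence number is prepended so the sniffer
    can reassemble the stream in order even if UDP reorders datagrams.
    """
    sock.sendto(struct.pack("!I", seq) + payload, dest)
    return seq + 1

# Loopback demo: this listener stands in for the firewall/capture box.
capture = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
capture.bind(("127.0.0.1", 0))          # kernel picks a free port
dest = capture.getsockname()

tap = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
seq = 0
for chunk in (b"helo foo@bar.com\n", b"quit\n"):
    seq = forward(tap, dest, seq, chunk)

# The capture side sorts by sequence number before reassembling.
datagrams = [capture.recvfrom(65535)[0] for _ in range(2)]
datagrams.sort(key=lambda d: struct.unpack("!I", d[:4])[0])
stream = b"".join(d[4:] for d in datagrams)
```

The explicit sequence number is what keeps us from ending up backwards if the tap's datagrams don't land in send order.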
I think Zach brings up some good points in his comment https://github.com/samuraictf/gatekeeper/commit/c0b9dacd73a146137de3e21b250d0f2441f37416#commitcomment-11515283 , and I think this is worth discussing, even for my own clarity. I'm confused about the role of Log() and about the logsocket vs. capture. How do we want to do capture? Right now it seems like the idea is that we'll dump the contents of the ring buffers to a capture server once an event is triggered (e.g., a matching pcre on something like an exploit throw attempt) so that we can inspect the application-layer traffic. Or maybe just send everything. And how does remote logging fit into this? Is the capture server also getting the messages from Log(), or do you see it as being separate?
It doesn't matter where the data gets turned into PCAPs, as long as it gets turned into PCAPs and preserves the original source and destination tuples. I realize we don't have access to the raw IP and TCP headers, but we can forge them well enough.
We don't want to invent a brand-new data format that's hard to work with and requires new parsing tools. PCAP is bog-standard, and there are nice libraries for everything.
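As a sketch of "forging them well enough": a PCAP file is just a 24-byte global header plus per-packet records, so hand-built IPv4/TCP headers around the captured payload are enough to preserve the original tuples. This is illustrative Python, not gatekeeper's implementation; the addresses, ports, and payload are made up, and the TCP checksum is left at zero since the packet is forged anyway.

```python
import os
import socket
import struct
import tempfile
import time

def ip_checksum(hdr):
    """Standard ones'-complement checksum over the IPv4 header."""
    if len(hdr) % 2:
        hdr += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(hdr) // 2), hdr))
    s = (s >> 16) + (s & 0xFFFF)
    s += s >> 16
    return ~s & 0xFFFF

def forge_packet(src, dst, sport, dport, payload, seq=0):
    """Wrap an application payload in forged IPv4+TCP headers,
    preserving the original source/destination tuple."""
    # TCP: data offset 5 words, flags PSH|ACK, checksum left 0 (forged).
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, 0,
                      5 << 4, 0x18, 65535, 0, 0) + payload
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp), 0, 0,
                     64, 6, 0, socket.inet_aton(src), socket.inet_aton(dst))
    ip = ip[:10] + struct.pack("!H", ip_checksum(ip)) + ip[12:]
    return ip + tcp

def write_pcap(path, packets):
    """packets: iterable of (unix_ts, raw_ip_packet)."""
    with open(path, "wb") as f:
        # Global header: magic, v2.4, tz 0, sigfigs 0, snaplen, LINKTYPE_RAW (101).
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 101))
        for ts, pkt in packets:
            f.write(struct.pack("<IIII", int(ts), 0, len(pkt), len(pkt)))
            f.write(pkt)

pkt = forge_packet("10.5.5.8", "10.5.5.1", 31337, 143, b"helo foo@bar.com\n")
path = os.path.join(tempfile.mkdtemp(), "forged.pcap")
write_pcap(path, [(time.time(), pkt)])
```

tcpdump/Wireshark will open the result with the standard tooling; the zero TCP checksum just shows up as unverified, which is fine for analysis.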
My original expectation was that the network-traffic-filtering-and-regex portion of Gatekeeper does not run on the game box. It's going to be too slow. We can just forward all traffic off-box for inspection, and then feed it to the application. The regex does not need to know anything at all about inotify or any of the other defense mechanisms, so this should work fine.
The only thing that needs to be on-box is the inotify bit.
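The off-box inspection step can then be as simple as a regex pass over the forwarded application data, with no knowledge of the on-box defenses. A sketch, where the signatures are invented examples rather than gatekeeper's actual rule set:

```python
import re

# Hypothetical signatures; the real rules would come from the team.
SIGNATURES = [
    re.compile(rb"/bin/sh"),      # shell strings in a payload
    re.compile(rb"\x90{16,}"),    # long NOP sled
]

def inspect(payload):
    """Return the first matching signature's pattern, or None if clean.

    Runs off the game box, so it knows nothing about inotify or any
    other on-box defense mechanism.
    """
    for sig in SIGNATURES:
        if sig.search(payload):
            return sig.pattern
    return None

# Clean traffic passes through; a match triggers capture/alerting.
verdict = inspect(b"\x90" * 32 + b"exploit body")
```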
There are a few points going on here; I'll break them out and give my thoughts:
1) Logging server: The logging server is intended to help diagnose problems, show successful defenses, and give us a sense of how active our services are. It's like the error log of a web server: when things are breaking and we are losing SLA, this is where we want to look. We have no other logs for our services; it's only the things we create. Having things in gatekeeper fail open instead of failing with goto cleanup will help us not lose SLA here too, but we still want the logs. I know we may collect a lot of data doing this, so we may find that we need to dial it back during the game or write some scripts to help sort this data, but I think it's useful to know things like how many currently active connections there are on our box, how many have connected in the past minute, and how many of those connections hit an alarm call or triggered another defense. Real-time situational awareness is paramount to adapting quickly to active attacks.
2) Capture server / data format / meta-data:
The purpose of the capture server is to give us a 15-minute lead on the pcaps we are already getting, and to give us coverage on teams redirecting through other services on our box. For example, if we are exploited on wdub and the shellcode running there redirects to atmail on 127.0.0.1, the attacking team can skip the span port on the switch and hide their atmail exploit from our pcaps. The 15-minute lead time on pcaps means 1 or 2 rounds of faster exploit replay. The redirection capture may prove crucial if we're expecting to see more advanced shellcode this year.
As far as formatting goes, it doesn't matter to me what format we use for this. I was planning to log CONNECT
3) Network filtering off of the game box.
I think this is a bad idea. It adds complexity that makes our services block on other network communications. Other teams have abandoned similar setups (Routards @ DC 21) due to latency and the associated SLA hits; however, they may have been doing more than pcre on the back end. I agree that we do need to test the performance impact of gatekeeper redirection and pcre matching. I don't think we are going to have anything slower than the ODROIDs this year, so we can performance-test on those. A review of last year's pcaps will hopefully tell us how fast we need to be.
Hopefully that helps to clarify some of what I was thinking when putting together the initial design of gatekeeper.
Simple, human-readable without additional tools
This won't work for bi-directional communication unless you have two separate logs. It's also impossible to find out exactly which packets arrived at exactly which millisecond.
Here is an example of what might be in one file. It shows both sides of the conversation:

```
CONNECTED: 10.5.5.8
SENT:
0000000: 696d 6170 2073 6572 7665 7220 7634 2072  imap server v4 r
0000010: 6561 6479 2033 3832 3334 6466 6139 3063  eady 38234dfa90c
0000020: 320a                                     2.
RECV:
0000000: 6865 6c6f 2066 6f6f 4062 6172 2e63 6f6d  helo foo@bar.com
0000010: 0a                                       .
...
DISCONNECT
```
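A format like that is cheap to emit. A possible sketch of the xxd-style dump in Python (the offset width and spacing are my guesses at the intended layout, not a spec):

```python
def hexdump(data):
    """Render bytes in an xxd-style layout: 7-digit hex offset,
    hex pairs grouped two bytes at a time, then an ASCII column
    with non-printable bytes shown as '.'."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        pairs = " ".join(chunk[i:i + 2].hex() for i in range(0, len(chunk), 2))
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # 16 bytes -> 8 groups of 4 hex chars + 7 spaces = 39 columns.
        lines.append("%07x: %-39s  %s" % (off, pairs, text))
    return "\n".join(lines)

banner = hexdump(b"imap server v4 ready 38234dfa90c2\n")
```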
You are correct that we won't be able to know which packet arrived with which timing, but the ordering will be correct. We could add a timestamp, but do we need that information with a 15-minute lead time, or can it wait until we get the pcaps? I could see it helping if we were automating things, but knowing when the connection started from the timestamp in the capture filename will be enough to match with inotify.
Getting complex here is going to burn us, IMHO.
We should focus on gatekeeper stopping attacks, and on notifying us of an attack by simply telling us a key read happened at a given time.
If we can flag a key read and dump a hexdump like the one above, it is very easy to implement replay.
This is ~completely solved with my pcap implementation. See my branch.
As data comes in, send it off to our capture server. We will lose some context, but we will have a realtime view of data coming into our services; log to a file for each new connection.
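One way to sketch the "log to a file per connection" part: name each capture file with the connection-start timestamp and peer IP, so it can be matched against inotify alerts later. The directory layout and naming scheme here are assumptions for illustration.

```python
import os
import tempfile
import time

def open_capture(capture_dir, peer_ip):
    """Open an append-mode log for one connection. The timestamp in the
    filename is what lets us line the capture up with inotify events."""
    os.makedirs(capture_dir, exist_ok=True)
    name = "%d_%s.log" % (int(time.time()), peer_ip)
    f = open(os.path.join(capture_dir, name), "a", buffering=1)  # line-buffered
    f.write("CONNECTED: %s\n" % peer_ip)
    return f

# Demo with a temp dir standing in for the real capture directory.
cap_dir = tempfile.mkdtemp()
log = open_capture(cap_dir, "10.5.5.8")
log.write("SENT: 16 bytes\n")
log.write("DISCONNECT\n")
log.close()
```

Line buffering keeps the file current as data arrives, which is what gives us the realtime view even before the connection closes.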