rapid7 / godap

The Data Analysis Pipeline
MIT License
17 stars 10 forks source link

GODAP: The Data Analysis Pipeline

(a port of the ruby-based DAP: https://github.com/rapid7/dap)

Build Status

DAP was created to transform text-based data on the command-line, specializing in transforms that are annoying or difficult to do with existing tools.

DAP reads data using an input plugin, transforms it through a series of filters, and prints it out again using an output plugin. Every record is treated as a document (aka: hash/dict) and filters are used to reduce, expand, and transform these documents as they pass through. Think of DAP as a mashup between sed, awk, grep, csvtool, and jq, with map/reduce capabilities.

DAP was written to process terabyte-sized public scan datasets, such as those provided by https://opendata.rapid7.com/. This go version of dap supports parallel processing of data. Results are forwarded to stdout and consistency of ordering is not guaranteed (and are highly likely to be out of order when compared to the input data stream).

Installation

Install Go version 1.12 or higher.

go get github.com/rapid7/godap

godap supports pcap and geoip, which provide an input and filters, respectively. To enable support for these, you must pass a libpcap or libgeoip tag to your go get command. You must also have those libraries installed on your system (libpcap-dev or libgeoip).

For example:

go get -tags="libpcap libgeoip" github.com/rapid7/godap

Will compile in support for both the pcap input and the geoip filters (geo_ip and geo_ip_org)

Usage

Quick Setup for GeoIP Lookups

Note: The documentation below assumes you've properly setup $GOPATH and $PATH (usually $GOPATH/bin:$PATH) per the official golang documentation.

$ go get github.com/rapid7/godap
$ sudo bash
# mkdir -p /var/lib/geoip && cd /var/lib/geoip && wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCity.dat.gz && gunzip GeoLiteCity.dat.gz && mv GeoLiteCity.dat geoip.dat
$  echo 8.8.8.8 | godap lines + geo_ip line + json
{"line":"8.8.8.8","line.country_code":"US","line.country_code3":"USA","line.country_name":"United States","line.latitude":"38.0","line.longitude":"-97.0"}

Where dap gets fun is doing transforms, like just grabbing the country code:

$  echo 8.8.8.8 | godap lines + geo_ip line + select line.country_code3 + lines
USA

Inputs, filters and outputs

The general syntax when calling godap is godap <input> + (<filter +> <filter +> <filter +> ...) + <output>, where input, filter and output correspond to one of the supported features below. Filters are optional, though an input and output are required. Each feature component is separated by +. Component options are specified immediately after the component declaration. For example, streaming from a wifi adapter and spitting out json documents would resemble: godap pcap iface=en0 rfmon=true + json. Component options with spaces or other complexities can be specified using shell-like quoting. For example, for a bpf pcap filter on the pcap component: godap pcap iface=en0 'filter="tcp port 80"' + json.

Inputs

Outputs