rapid7 / godap

The Data Analysis Pipeline
MIT License

EXPERIMENTAL: Badger support #1

Closed dabdine-r7 closed 5 years ago

dabdine-r7 commented 5 years ago

What does this do?

Add support for the badger key/value database by dgraph-io.

Features added

  1. Support for a new badger input and output. This allows reading from, and writing to, badger.
  2. Support for a new processor: join. This filter joins one or more fields of a document, given as a comma-separated list in source, into a destination field dest, using the supplied separator sep (defaults to a comma).
  3. Use the jsoniter Go JSON package for performance. This library touts much faster JSON read performance than the encoding/json package that ships with Go. It also supports one-time configuration of a JSON parser that uses UseNumber. With the Go standard library, this option is only configurable on a Decoder, which must be instantiated per read source. Using jsoniter lets us reduce memory and CPU usage per line of data read.

TODO

  1. Update README.md with documentation on the new processors
  2. Add unit tests, where possible.
  3. Disable badger logging (or, at the very least, add a quiet mode)

Example usage

# stream write json data from scans.io into a badger database
$ curl -s -L https://scans.io/data/silas/mi/http/201411_80_http.json.gz | pigz -dc | head -n2000 |  ./godap json + transform data=base64decode + join source=host,port dest=value + not_empty data + badger dir=./badger-godap-test key_field=value value_field=data 
badger 2019/04/05 10:37:23 INFO: All 0 tables opened in 0s
running gc..
badger 2019/04/05 10:37:24 DEBUG: Storing value log head: {Fid:0 Len:42 Offset:5747890}
badger 2019/04/05 10:37:24 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2019/04/05 10:37:24 INFO: Running for level: 0
badger 2019/04/05 10:37:24 DEBUG: LOG Compact. Added 1823 keys. Skipped 0 keys. Iteration took: 884.552µs
badger 2019/04/05 10:37:24 DEBUG: Discard stats: map[]
badger 2019/04/05 10:37:24 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 2.612492ms
badger 2019/04/05 10:37:24 INFO: Compaction for level: 0 DONE
badger 2019/04/05 10:37:24 INFO: Force compaction on level 0 done

# read from the database, using a prefix search
$ ./godap badger dir=./badger-godap-test prefix=99. + json
badger 2019/04/05 10:37:29 INFO: All 1 tables opened in 1ms
badger 2019/04/05 10:37:29 DEBUG: Value Log Discard stats: map[]
badger 2019/04/05 10:37:29 INFO: Replaying file id: 0 at offset: 5747932
badger 2019/04/05 10:37:29 INFO: Replay took: 44.397µs
{"key":"99.186.47.158","value":"HTTP/1.1 401 Unauthorized\r\nWWW-Authenticate: Basic realm=\"Netopia-3000\"\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nServer: Allegro-Software-RomPager/4.03\r\nConnection: close\r\n\r\n19e\r\n\u003c!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"\u003e\u003chtml\u003e\n\u003chead\u003e\n\u003ctitle\u003eLogin Failure\u003c/title\u003e\n\u003cMETA http-equiv=Content-Type content=\"text/html; charset=windows-1252\"\u003e\u003c/head\u003e\n\u003cbody\u003ch1\u003eLogin Failure\u003c/h1\u003e\nYou have entered an invalid username/password. The router is protected.  You need Admin privileges to access this page.\u0026nbsp;Please try again.\u003cp\u003e\nReturn to \u003cA HREF=\"\"\u003elast page\u003c/A\u003e\u003cp\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n\r\n0\r\n\r\n"}
{"key":"99.37.143.49","value":"HTTP/1.1 302 \r\nServer: \r\nDate: Wed, 05 Nov 2014 15:10:38 GMT\r\nCache-Control: no-cache,no-store,must-revalidate,post-check=0,pre-check=0\r\nLocation: https://99.37.143.49:4343/\r\nContent-Type: text/html; charset=utf-8\r\nContent-Type: text/html; charset=utf-8\r\nConnection: close\r\n\r\n\u003cHTML\u003e\n\u003cHEAD\u003e\u003cTITLE\u003e302 \u003c/TITLE\u003e\u003c/HEAD\u003e\n\u003cBODY BGCOLOR=\"#cc9999\" TEXT=\"#000000\" LINK=\"#2020ff\" VLINK=\"#4040cc\"\u003e\n\u003cH4\u003e302 \u003c/H4\u003e\n\n\u003cADDRESS\u003e\u003cA HREF=\"http://www.arubanetworks.com\"\u003e\u003c/A\u003e\u003c/ADDRESS\u003e\n\u003c/BODY\u003e\n\u003c/HTML\u003e\n"}
{"key":"99.198.121.253","value":"HTTP/1.1 200 OK\r\nDate: Wed, 05 Nov 2014 15:10:38 GMT\r\nServer: Apache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/1.0.0-fips mod_bwlimited/1.4\r\nVary: Accept-Encoding,User-Agent\r\nContent-Length: 328\r\nContent-Type: text/html;charset=ISO-8859-1\r\n\r\n\u003c!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\"\u003e\n\u003chtml\u003e\n \u003chead\u003e\n  \u003ctitle\u003eIndex of /\u003c/title\u003e\n \u003c/head\u003e\n \u003cbody\u003e\n\u003ch1\u003eIndex of /\u003c/h1\u003e\n\u003cul\u003e\u003cli\u003e\u003ca href=\"cgi-bin/\"\u003e cgi-bin/\u003c/a\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\u003caddress\u003eApache/2.2.22 (Unix) mod_ssl/2.2.22 OpenSSL/1.0.0-fips mod_bwlimited/1.4 Server at 99.198.121.253 Port 80\u003c/address\u003e\n\u003c/body\u003e\u003c/html\u003e\n"}
{"key":"99.25.178.102","value":"HTTP/1.1 401 Unauthorized\r\nWWW-Authenticate: Basic realm=\"Netopia-3000\"\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nServer: Allegro-Software-RomPager/4.03\r\nConnection: close\r\n\r\n19e\r\n\u003c!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\"\u003e\u003chtml\u003e\n\u003chead\u003e\n\u003ctitle\u003eLogin Failure\u003c/title\u003e\n\u003cMETA http-equiv=Content-Type content=\"text/html; charset=windows-1252\"\u003e\u003c/head\u003e\n\u003cbody\u003ch1\u003eLogin Failure\u003c/h1\u003e\nYou have entered an invalid username/password. The router is protected.  You need Admin privileges to access this page.\u0026nbsp;Please try again.\u003cp\u003e\nReturn to \u003cA HREF=\"\"\u003elast page\u003c/A\u003e\u003cp\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n\r\n0\r\n\r\n"}
{"key":"99.4.90.46","value":"HTTP/1.1 401 Unauthorized\r\nWWW-Authenticate: Basic realm=\"Netopia-3000\"\r\nContent-Type: text/html\r\nTransfer-Encoding: chunked\r\nServer: Allegro-Software-RomPager/4.03\r\nConnection: close\r\n\r\n0a8\r\n\u003chtml\u003e\n\u003chead\u003e\n\u003ctitle\u003eLogin Failed\u003c/title\u003e\n\u003c/head\u003e\n\u003cbody\u003e\n\u003ccenter\u003e\u003ch1\u003eAccess Denied\u003c/h1\u003e\n\u003ch2\u003eLogin error: Invalid User name/Password\u003c/h2\u003e\n\u003c/center\u003e\n\u003cp\u003e\n\n\u003c/body\u003e\n\u003c/html\u003e\n\r\n0\r\n\r\n"}
badger 2019/04/05 10:37:29 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}

# read all records in the database (showing only the count to keep the output short)
$ ./godap badger dir=./badger-godap-test + json | wc -l
badger 2019/04/05 10:38:47 INFO: All 1 tables opened in 1ms
badger 2019/04/05 10:38:47 DEBUG: Value Log Discard stats: map[]
badger 2019/04/05 10:38:47 INFO: Replaying file id: 0 at offset: 5747932
badger 2019/04/05 10:38:47 INFO: Replay took: 42.756µs
badger 2019/04/05 10:38:47 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
    1821
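
The join stage used in the write example (join source=host,port dest=value) can be sketched roughly as follows. This is an illustration of the behavior described under "Features added", not the PR's implementation: joinFields is a hypothetical name, and skipping missing fields and formatting non-string values with fmt.Sprint are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// joinFields is a hypothetical sketch of a join filter (not godap's code).
// source is a comma-separated list of field names, dest is the field that
// receives the joined value, and sep is the separator (comma if empty).
func joinFields(doc map[string]interface{}, source, dest, sep string) {
	if sep == "" {
		sep = "," // default separator, per the PR description
	}
	var parts []string
	for _, name := range strings.Split(source, ",") {
		v, ok := doc[strings.TrimSpace(name)]
		if !ok {
			continue // assumption: fields missing from the document are skipped
		}
		parts = append(parts, fmt.Sprint(v)) // assumption: non-strings are stringified
	}
	doc[dest] = strings.Join(parts, sep)
}

func main() {
	doc := map[string]interface{}{"host": "192.0.2.10", "port": 80}
	joinFields(doc, "host,port", "value", "")
	fmt.Println(doc["value"]) // 192.0.2.10,80
}
```
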
dabdine-r7 commented 5 years ago

I'm retiring this PR for now; we can reopen it if we want it later. We weren't seeing what we needed from badger to warrant supporting this feature.

I've ported all the other features (the remove and join filters, and signal handling) into master, except for the use of jsoniter.