mozilla-services / heka

DEPRECATED: Data collection and processing made easy.
http://hekad.readthedocs.org/
Other
3.39k stars 530 forks source link

SandboxDecoder / nginx_access.lua doesn't work with SSL log #1974

Open wwalker opened 8 years ago

wwalker commented 8 years ago

Using the SandboxDecoder with nginx_access.lua, our logs don't parse. nothing comes out. I can parse the same data with the PayloadRegexDecoder and it works.

Using 0.10.0 from the RPM on my workstation and from the .deb on my server. Both fail the same way. silently don't write anything to the output from nginx, but do write things from heka.{memstats,statmetrics,all-report}

This is my config:

  [hekad]
  maxprocs = 8

  [Dashboard]
  type = "DashboardOutput"
  address = ":4352"
  ticker_interval = 15

  [TestWebserver]
  type = "LogstreamerInput"
  log_directory = "/var/log/nginx"
  file_match = 'access\.log'
  decoder = "CombinedLogDecoder"

  [CombinedLogDecoder]
  type = "SandboxDecoder"
  filename = "lua_decoders/nginx_access.lua"

  [CombinedLogDecoder.config]
  type = "combined"
  user_agent_transform = true
  # combined log format
  #log_format = '$remote_addr - $remote_user [$time_local $time_zone] $server_port $ssl_protocol/$ssl_cipher $upstream_addr "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" -- [$request_time]'
  log_format = '$remote_addr - $remote_user [$time_local] $server_port $ssl_protocol/$ssl_cipher $upstream_addr "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" -- [$request_time]'

  [ESJsonEncoder]
  index = "%{Type}-%{%Y.%m.%d}"
  es_index_from_timestamp = true
  type_name = "%{Type}"
      [ESJsonEncoder.field_mappings]
      Timestamp = "@timestamp"
      Severity = "level"

  [ElasticSearchOutput]
  server = "http://10.20.30.50:9200"
  flush_interval = 5000
  flush_count = 10
  encoder = "ESJsonEncoder"
  message_matcher = "TRUE"

This is some log data: 10.20.30.40 - - [01/Aug/2016:06:31:59 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "POST /api/v2/samples HTTP/1.1" 200 446 "-" "Ruby" -- [0.078] 10.20.30.40 - - [01/Aug/2016:06:31:59 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "POST /api/v2/samples/state HTTP/1.1" 200 4161 "-" "Ruby" -- [0.054] 10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/2130a46fddf4fde50b7ff0d64f6d9706/video.webm?api_key=REdactED HTTP/1.1" 200 256537 "-" "Ruby" -- [8.951] 10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/4a62f6ba29c285a0afc7430922dfac07/video.webm?api_key=REdactED HTTP/1.1" 200 39615 "-" "Ruby" -- [4.038] 10.20.30.40 - - [01/Aug/2016:06:32:03 +0000] 8443 TLSv1/DHE-RSA-AES256-SHA 10.20.30.40:8443 "GET /api/v2/samples/2130a46fddf4fde50b7ff0d64f6d9706/processes.json?api_key=REdactED HTTP/1.1" 200 583852 "-" "Ruby" -- [0.390]

wwalker commented 8 years ago

I've narrowed the problem down to this: $ssl_protocol/$ssl_cipher

that is straight from the nginx.conf as it should be, but build_grammar doesn't work right.
If I change it to $ssl_protocol it works, including the data that should be in the ssl_cipher in $ssl_protocol.

robison commented 8 years ago

$ssl_protocol/$ssl_cipher aren't defined fields per the common_log_format[0] module (which is used here by nginx_access.lua) to decode the field. It simply sees a single field without space or quote delimiting, and interprets it as a single field. Here's an example; you may have some success by space-delimiting the two fields you want:

http://lpeg.trink.com/share/11180597858328222060

[0]: https://github.com/mozilla-services/lua_sandbox_extensions/blob/master/lpeg/modules/lpeg/common_log_format.lua

Also, as an aside, you may have more success asking these sorts of questions on the mailing list or IRC channel, rather than opening an issue. Please lemme know if this doesn't at least put you on the right track. Thanks!

wwalker commented 8 years ago

I asked on both the mailing list and IRC. I only reported it here when I was sure it was a bug. The documentation says to paste the log_format directly from the nginx.conf. I did that and it doesn't work; therefore it is a bug that should either be fixed, or being a documented known bug. I had already worked around it both with PayloadRegexDecoder and with modifying the log_format in SandboxDecoder/nginx_access.lua (losing a bit of data granularity).

trink commented 8 years ago

As @robison pointed out that grammar has not had support added for the ssl module so it falls back to a basic space delimited or quoted string parsing. There are two options: alter your log_format configuration (as suggested above) or extend the grammar to include the ssl module variables.

Note: it is possible to construct a log_format string with ambiguous output (i.e. not parse able) even with 'known' variables.