logsearch / logsearch-boshrelease

A BOSH-scalable Elasticsearch+Logstash+Kibana release
http://www.logsearch.io
Apache License 2.0

Reconsider how we do logstash filters #58

Closed dpb587 closed 10 years ago

dpb587 commented 10 years ago

I think our current approach to logstash filter configuration is too fragile. I think logsearch-boshrelease should manage the architecture and define the conventions, but have minimal impact on its usage. While the set of filters in logsearch-filters-common is convenient to have pre-installed, it becomes very inconvenient for end deployments because it significantly affects how users' events are parsed, and it affects dashboards that we don't manage or provide an upgrade path for.

For example, we updated json in common, but we also had a json filter internally with our own timezone configuration and another patch. If we deployed release 15 because we wanted the upgraded dependencies, we would be breaking how all our events are parsed because the common json filter was configured with a higher precedence than our internal one. Even knowing that difference, we're still hard pressed to get an integration test suite going and to be confident in the upgrade.

I propose...

  1. We no longer embed logsearch-filters-common in our log_parser. We can set a published logsearch-filters-common artifact as the default logstash_parser.filters in the spec, but we don't forcefully embed the filters.
  2. We refactor the logstash config-generating script out of log_parser_ctl and into a separate script which accepts arbitrary arguments of filters to use (e.g. *.tar, *.tar.gz, *.zip, file://*) and outputs to STDOUT the usable logstash filters { ... } section.
  3. We add an "integration test" type command to logsearch-filters-common which reuses that refactored script, allowing end users to write integration tests against a known set of filters.
  4. We require users to include the logsearch-filters-common artifact alongside their own artifacts in their deployment manifest, if they want to continue using it.

Which means, most notably, that users can upgrade the architecture and the already user-customizable parser side of things independently and safely.
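As a rough illustration of item 2, here is a minimal Python sketch of such a generator. It is not the script from log_parser_ctl; the function names (`read_filters`, `render_filters`, `main`) are hypothetical, it only handles directories and tar archives (not zip or file:// sources), and it assumes filenames alone decide ordering. Whether the output still needs wrapping in a filters block depends on what the individual .conf files contain, which this thread does not specify.

```python
import sys
import tarfile
from pathlib import Path


def read_filters(source: str) -> list[tuple[str, str]]:
    """Return (filename, contents) pairs for every *.conf found in `source`.

    `source` may be a directory or a tar archive (plain or gzipped).
    """
    path = Path(source)
    if path.is_dir():
        return [(p.name, p.read_text()) for p in path.rglob("*.conf")]
    found = []
    with tarfile.open(path) as archive:  # default mode auto-detects compression
        for member in archive.getmembers():
            if member.isfile() and member.name.endswith(".conf"):
                data = archive.extractfile(member).read().decode("utf-8")
                found.append((Path(member.name).name, data))
    return found


def render_filters(sources: list[str]) -> str:
    """Concatenate the filters from all sources, ordered by filename."""
    filters = []
    for source in sources:
        filters.extend(read_filters(source))
    # Assumption: the filename (e.g. its numeric prefix) decides precedence.
    filters.sort(key=lambda item: item[0])
    return "\n".join(f"# {name}\n{body.rstrip()}\n" for name, body in filters)


def main(argv: list[str]) -> None:
    """Write the combined filters to STDOUT; call as main(sys.argv[1:])."""
    sys.stdout.write(render_filters(argv))
```

Because the output goes to STDOUT, the same function could feed both the deployed parser and a local test harness.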

dpb587 commented 10 years ago

@mrdavidlaing, @wdneto thoughts?

mrdavidlaing commented 10 years ago

logsearch-filters-common currently contains the following:

Transport:

  1. 10-syslog_standard.conf
  2. 20-nxlog_standard.conf

Data:

  3. 30-json.conf
  4. 75-elasticsearch_request.conf
  5. 75-nginx_combined.conf

I think we should remove all but the Transport filters from logsearch-filters-common and then externalise it as suggested.

dpb587 commented 10 years ago

I find it difficult to draw the line between transport and message format. For example, nxlog could use the syslog protocol to ship syslog messages (i.e. double encoded), but after nxlog, there's not a syslog parser.

What if we changed things up and made this more of a repository of easy-to-use snippets that can be included by other repos? So, get rid of the numbered prefixes here, then in our internal filter configuration, if we want syslog to be at 10, we symlink 10-syslog_standard.conf to ../vendor/logsearch-filters-common/src/syslog_standard.conf.

dpb587 commented 10 years ago

We'd still use the repo to define a default set of filters, but it'll simply be a default directory which prioritizes and symlinks to the files like normal user repos will want to do.
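A minimal Python sketch of that symlink convention follows. The helper name `link_filters` and the `plan` mapping are hypothetical, and it assumes a POSIX filesystem where symlinks are available.

```python
import os
from pathlib import Path


def link_filters(src_dir: Path, plan: dict[str, Path]) -> list[str]:
    """Create prioritized symlinks in src_dir pointing at vendored snippets.

    `plan` maps a link name such as "10-syslog_standard.conf" to the snippet
    file it should point to. Returns the link names in sorted (priority) order.
    """
    src_dir.mkdir(parents=True, exist_ok=True)
    for link_name, target in plan.items():
        link = src_dir / link_name
        if link.is_symlink() or link.exists():
            link.unlink()  # replace an existing link so the helper can be re-run
        # Relative targets keep the repo relocatable (../vendor/...).
        link.symlink_to(os.path.relpath(target, start=src_dir))
    return sorted(plan)
```

For example, calling `link_filters(Path("src"), {"10-syslog_standard.conf": Path("vendor/logsearch-filters-common/src/syslog_standard.conf")})` would create `src/10-syslog_standard.conf` pointing at `../vendor/logsearch-filters-common/src/syslog_standard.conf`.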

mrdavidlaing commented 10 years ago

What do you think about enabling the selection / ordering of filters to be specified in the deployment manifest?

Eg:

  logstash_parser:
    filters:
    - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"
      selected:
      - "{syslog,nxlog}_standard.conf"
      - elasticsearch_request.conf
      - nginx_combined.conf
    - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-internal/logsearch-filters-internal-1.8-ff324.tgz"
      selected:
      - "*"

In this way you can "see" what filters are going to be run, and in what order; and you have quite a bit of flexibility to include and order the filters you are using.

We could also default to:

  logstash_parser:
    filters:
    - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"
      selected:
      - "*"

Which would replicate our current behaviour.
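To make the intended semantics of `selected` concrete, here is a small Python sketch of how such patterns might be resolved into an ordered filter list. This is an assumption about how the manifest would be interpreted, not existing release behaviour; `expand_braces` and `select_filters` are hypothetical names, and brace alternatives are expanded by hand because Python's `fnmatch` does not support them.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern: str) -> list[str]:
    """Expand shell-style {a,b} alternatives into plain glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if not match:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def select_filters(available: list[str], selected: list[str]) -> list[str]:
    """Return the available filter names matched by `selected`, in manifest order."""
    ordered = []
    for pattern in selected:
        for glob in expand_braces(pattern):
            for name in sorted(available):
                if fnmatchcase(name, glob) and name not in ordered:
                    ordered.append(name)
    return ordered
```

With this reading, the order of the `selected` entries determines the order in which filters run, and `"*"` falls back to plain filename order.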

dpb587 commented 10 years ago

I'm not a huge fan: this seems quite complex, it doesn't allow for prioritizing across multiple sources, and the configuration can't be reused for integration tests.

I'd rather us (and others) have a directory and/or repo which maintains the full set of filters and their tests. It seems like a better "separation of concerns" approach.

mrdavidlaing commented 10 years ago

I think what you're proposing then is this:

  1. If you do nothing, then your filters default to:

     logstash_parser:
         filters: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"

  2. If you want to customise your filters, then you need to create your own filter repo, publish a TGZ somewhere, and change your deploy config to:

     logstash_parser:
         filters: "https://server.com/my-custom-filters-0.1-ff65a.tgz"

You should include logsearch-filters-common AND any other public filters you want to use in vendor/, but then select which filters you want from those AND their order via symlinking (at build time), eg:

my-filters/src/10-syslog.conf -> ../vendor/logsearch-filters-common/src/10-syslog.conf
my-filters/src/11-my-syslog-ex.conf
my-filters/src/50-cf-loggregator.conf -> ../vendor/logstash-filters-cf/src/75-cf-loggregator.conf

Further, the published TGZ should just contain a single logstash.filters.conf which includes all of your filters concatenated together.

This makes the log_parser logstash.conf construction phase very simple, basically:

cat inputs_outputs.conf filters_pre.conf logstash.filters.conf filters_post.conf > logstash.conf
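The two build steps described here (collapsing a filter repo's src/ directory into one logstash.filters.conf, then assembling logstash.conf) can be sketched in Python as follows. The function names are hypothetical, and ordering by filename within src/ is an assumption based on the numbered prefixes used above.

```python
from pathlib import Path

# Fixed order taken from the `cat` command above.
PARTS = [
    "inputs_outputs.conf",
    "filters_pre.conf",
    "logstash.filters.conf",
    "filters_post.conf",
]


def build_filters(src_dir: Path, out_file: Path) -> None:
    """Concatenate src_dir/*.conf in filename order into a single file.

    Symlinked entries are read through to the files they point at.
    """
    chunks = [path.read_text() for path in sorted(src_dir.glob("*.conf"))]
    out_file.write_text("".join(chunks))


def assemble(config_dir: Path, out_file: Path) -> None:
    """Equivalent of the `cat ... > logstash.conf` step."""
    out_file.write_text("".join((config_dir / name).read_text() for name in PARTS))
```

Since `build_filters` needs nothing but the filter repo itself, its output can be checked by tests without a running logsearch deployment.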

It also allows you to write "integration" tests across the combination of filters you will be deploying, which can be run outside of an actual logsearch deployment.

If this is what you mean; then I'm for the idea.

mrdavidlaing commented 10 years ago

Thinking about it further, I actually think that in the default case we should still include logsearch-filters-common in the bosh-release rather than pulling it from an external url.

This makes it possible to continue to deploy logsearch-boshrelease into an environment without internet access.

dpb587 commented 10 years ago

This is what I mean and I'll create PRs across the three repos involved.

mrdavidlaing commented 10 years ago

:+1:

dpb587 commented 10 years ago

This has been implemented across the repos. Further refactoring/bug fixes should be a separate issue now.