dpb587 closed this 10 years ago.
@mrdavidlaing, @wdneto thoughts?
logsearch-filters-common currently contains the following:
Transport:
I think we should remove all but the Transport filters from logsearch-filters-common and externalise the rest as suggested.
I find it difficult to draw the line between transport and message format. For example, nxlog could use the syslog protocol to ship syslog messages (i.e. double-encoded), but after nxlog, there's no syslog parser.
What if we changed things up and made this more of a repository of easy-to-use snippets that other repos can include? So, get rid of the numbered prefixes here; then, in our internal filter configuration, if we want syslog to be at 10, we symlink 10-syslog_standard.conf to ../vendor/logsearch-filters-common/src/syslog_standard.conf.
We'd still use the repo to define a default set of filters, but it'll simply be a default directory which prioritizes and symlinks to the files, like normal user repos will want to do.
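A minimal sketch of that symlinking idea, assuming an internal repo with a src/ directory and logsearch-filters-common vendored under vendor/. The link_filter helper and the scratch-repo demo are hypothetical; only the 10-syslog_standard.conf -> ../vendor/logsearch-filters-common/src/syslog_standard.conf mapping comes from the thread.

```shell
#!/bin/sh
# Sketch: pick and prioritise un-prefixed snippets from a vendored copy of
# logsearch-filters-common by symlinking them under a numeric prefix.
# link_filter and the src/ + vendor/ layout are hypothetical.
set -eu

# link_filter REPO PRIORITY SNIPPET
# Creates REPO/src/PRIORITY-SNIPPET pointing at
# ../vendor/logsearch-filters-common/src/SNIPPET. The target is relative,
# so the link keeps working wherever the repo is checked out.
link_filter() {
  mkdir -p "$1/src"
  ln -sf "../vendor/logsearch-filters-common/src/$3" "$1/src/$2-$3"
}

# Demo in a scratch repo with two (empty) vendored snippets.
repo=$(mktemp -d)
vendor="$repo/vendor/logsearch-filters-common/src"
mkdir -p "$vendor"
: > "$vendor/syslog_standard.conf"
: > "$vendor/elasticsearch_request.conf"

link_filter "$repo" 10 syslog_standard.conf
link_filter "$repo" 20 elasticsearch_request.conf
```

The priority lives only in the link name, so the vendored repo can stay prefix-free while each consuming repo decides its own order.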
What do you think about allowing the selection and ordering of filters to be specified in the deployment manifest?
E.g.:
    logstash_parser:
      filters:
        - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"
          selected:
            - "{syslog,nxlog}_standard.conf"
            - elasticsearch_request.conf
            - nginx_combined.conf
        - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-internal/logsearch-filters-internal-1.8-ff324.tgz"
          selected:
            - "*"
This way you can "see" which filters are going to be run, and in what order, and you have quite a bit of flexibility to include and order the filters you are using.
We could also default to:
    logstash_parser:
      filters:
        - url: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"
          selected:
            - "*"
which would replicate our current behaviour.
I'm not a huge fan: this seems quite complex, it doesn't allow for prioritizing across multiple sources, and the configuration can't be reused for integration tests.
I'd rather us (and others) have a directory and/or repo which maintains the full set of filters and their tests. It seems like a better "separation of concerns" approach.
I think what you're proposing then is this:
    logstash_parser:
      filters: "https://ci-logsearch.s3.amazonaws.com/logsearch-filters-common/logsearch-filters-common-1.5-ef354.tgz"
2. If you want to customise your filters, you need to create your own filter repo, publish a TGZ somewhere, and change your deploy config to:
    logstash_parser:
      filters: "https://server.com/my-custom-filters-0.1-ff65a.tgz"
You should include logsearch-filters-common AND any other public filters you want to use in vendor/, but then select which filters you want from those, AND their order, via symlinking (at build time), e.g.:
    my-filters/src/10-syslog.conf -> ../vendor/logsearch-filters-common/src/10-syslog.conf
    my-filters/src/11-my-syslog-ex.conf
    my-filters/src/50-cf-loggregator.conf -> ../vendor/logstash-filters-cf/src/75-cf-loggregator.conf
Further, the published TGZ should just contain a single logstash.filters.conf which includes all of your filters cat-ed (concatenated) together.
This makes the log_parser logstash.conf construction phase very simple, basically:
    cat inputs_outputs.conf filters_pre.conf logstash.filters.conf filters_post.conf > logstash.conf
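The build and assembly steps described above can be sketched in shell, assuming the repo keeps its ordered (possibly symlinked) snippets in src/. The build_filters helper, the "# --- name" markers and the placeholder contents of inputs_outputs.conf, filters_pre.conf and filters_post.conf are hypothetical; the final cat line is the one from the thread. LC_ALL=C pins the glob order so the numeric prefixes sort byte-wise.

```shell
#!/bin/sh
# Sketch of the proposed build + assembly steps. build_filters and the
# directory layout are illustrative assumptions, not existing tooling.
set -eu
LC_ALL=C; export LC_ALL   # byte-wise glob order: 10-* < 11-* < 50-*

# build_filters SRC_DIR OUT_FILE
# Concatenates SRC_DIR/*.conf (following symlinks) into OUT_FILE in glob
# order, failing loudly instead of silently dropping an unreadable filter.
build_filters() {
  : > "$2"                                   # create/truncate the output
  for f in "$1"/*.conf; do
    if [ ! -e "$f" ]; then                   # dangling link or no matches
      echo "build_filters: cannot read $f" >&2
      return 1
    fi
    printf '# --- %s\n' "${f##*/}" >> "$2"   # mark where each file starts
    cat "$f" >> "$2"
  done
}

# Demo: one vendored snippet reached via symlink, one local snippet.
work=$(mktemp -d)
mkdir -p "$work/src" "$work/vendor/common"
printf '# syslog body\n' > "$work/vendor/common/10-syslog.conf"
ln -s ../vendor/common/10-syslog.conf "$work/src/10-syslog.conf"
printf '# my syslog extensions\n' > "$work/src/11-my-syslog-ex.conf"

build_filters "$work/src" "$work/logstash.filters.conf"

# Assembly step from the thread, with placeholder surrounding files.
for part in inputs_outputs filters_pre filters_post; do
  printf '# %s\n' "$part" > "$work/$part.conf"
done
( cd "$work" && cat inputs_outputs.conf filters_pre.conf logstash.filters.conf filters_post.conf > logstash.conf )
```

Failing on an unreadable entry rather than skipping it is a deliberate choice here: a symlink to a vendored filter that was renamed upstream would otherwise drop that filter from the deployment without any warning.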
It also allows you to write "integration" tests across the combination of filters you will be deploying, which can be run outside of an actual logsearch deployment.
If this is what you mean, then I'm for the idea.
Thinking about it further, I actually think that in the default case we should still include logsearch-filters-common in the bosh-release rather than pulling it from an external URL.
This makes it possible to continue to deploy logsearch-boshrelease into an environment without internet access.
This is what I mean and I'll create PRs across the three repos involved.
:+1:
This has been implemented across the repos. Further refactoring/bug fixes should be a separate issue now.
I think our current approach to logstash filter configuration is too fragile. I think logsearch-boshrelease should manage the architecture and define the conventions, but have minimal impact on its usage. While the set of filters in logsearch-filters-common is convenient to have pre-installed, it becomes very inconvenient for end users' deployments because it significantly affects how their events are parsed, and it affects dashboards that we don't already manage or provide an upgrade path for.
For example, we updated json in common, but we also had a json filter internally with our own timezone configuration and another patch. If we deployed release 15 because we wanted the upgraded dependencies, we would break how all our events are parsed, because the common json filter was configured with a higher precedence than our internal one. Even knowing about that difference, we're still hard pressed to get an integration test suite going and be confident in the upgrade.
I propose...
1. logsearch-filters-common in our log_parser. We can default a published logsearch-filters-common artifact as the default logstash_parser.filters in the spec, but we don't forcefully embed the filters.
2. log_parser_ctl and into a separate script which accepts arbitrary arguments of filters to use (e.g. *.tar, *.tar.gz, *.zip, file://*) and outputs the usable logstash filter { ... } section to STDOUT.
3. logsearch-filters-common which reuses that refactored script, allowing end users to write integration tests against a known set of filters.
4. logsearch-filters-common artifact alongside their own artifacts in their deployment manifest, if they want to continue using it.
This means, most notably, that users can upgrade the architecture and the already user-customizable parser side of things independently and safely.
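One possible shape for the refactored script proposed above (the one that accepts arbitrary filter arguments and outputs a filter section to STDOUT): a POSIX sh sketch that accepts *.tar, *.tar.gz/*.tgz, *.zip and file:// sources. The function names, the assumption that each source holds its filters as top-level *.conf files, and the ordering rule are all hypothetical, not the behaviour of log_parser_ctl; the .zip branch also needs unzip installed.

```shell
#!/bin/sh
# Hypothetical sketch of a stand-alone filter builder. emit_filters and
# stage_sources are illustrative names; each source is assumed to hold
# its filters as top-level *.conf files.
set -u
LC_ALL=C; export LC_ALL   # byte-wise glob order, independent of locale

# stage_sources DIR SOURCE...
# Unpacks each SOURCE into DIR/001, DIR/002, ... so argument order survives.
stage_sources() {
  dir=$1; shift
  i=0
  for src in "$@"; do
    i=$((i + 1))
    dest=$(printf '%s/%03d' "$dir" "$i")
    mkdir -p "$dest" || return 1
    case $src in
      file://*)
        path=${src#file://}
        if [ -d "$path" ]; then
          cp -R "$path/." "$dest/" || return 1   # a directory of snippets
        else
          cp "$path" "$dest/" || return 1        # a single .conf file
        fi ;;
      *.tar.gz|*.tgz) tar -xzf "$src" -C "$dest" || return 1 ;;
      *.tar)          tar -xf "$src" -C "$dest" || return 1 ;;
      *.zip)          unzip -q "$src" -d "$dest" || return 1 ;;
      *) echo "unsupported filter source: $src" >&2; return 1 ;;
    esac
  done
}

# emit_filters SOURCE...
# Writes one logstash filter { ... } section to STDOUT: sources in
# argument order, files within a source in name order.
emit_filters() {
  tmp=$(mktemp -d) || return 1
  status=0
  if stage_sources "$tmp" "$@"; then
    set -- "$tmp"/*/*.conf                   # one sorted list of all snippets
    if [ -e "$1" ]; then
      echo 'filter {'
      cat "$@" || status=1
      echo '}'
    else
      echo "emit_filters: no *.conf files in the given sources" >&2
      status=1
    fi
  else
    status=1
  fi
  rm -rf "$tmp"
  return "$status"
}

# Demo: one tarball (like a published TGZ) plus a local directory.
demo=$(mktemp -d)
mkdir -p "$demo/common" "$demo/mine"
printf '  # syslog rules\n' > "$demo/common/10-syslog.conf"
printf '  # json rules (common)\n' > "$demo/common/20-json.conf"
printf '  # my syslog extensions\n' > "$demo/mine/11-my-syslog-ex.conf"
tar -czf "$demo/common.tgz" -C "$demo/common" .
emit_filters "$demo/common.tgz" "file://$demo/mine" > "$demo/filters.conf" || exit 1
```

Because each source is staged into its own numbered directory, a filter from a later argument always runs after every filter from an earlier one, whatever its numeric prefix: in the demo, 11-my-syslog-ex.conf lands after 20-json.conf. Interleaving filters across sources would still need the symlink approach discussed earlier in the thread.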