logsearch / logsearch-boshrelease

A BOSH-scalable Elasticsearch+Logstash+Kibana release
http://www.logsearch.io
Apache License 2.0
57 stars 46 forks source link

Added support for excluding some of the default logstash filters #206

Closed stoyanr closed 4 years ago

stoyanr commented 8 years ago

We would like to be able to exclude some of the default logstash filters from the final logstash.conf, in order to be able to customize the parser rules completely. In particular, we had to modify logstash-filters-default heavily in order to meet the requirements of our stakeholders. We are also not sure yet if we need the 2 newly introduced default filters, if_it_looks_like_json, and timecop. So far we didn't have any issues with filters_pre and filters_post, but for the sake of completeness, I included them in the exclusion mechanism as well.

Having such a mechanism would also allow us to stay independent of future changes to the default filters introduced in this release when adopting newer versions of it, up until the point in time we actually need them.

The change introduces one new parser property logstash_parser.excluded_filters which is an array containing the names of default filters that should not be added to the final logstash.conf. This property is checked in parser_ctl before actually adding each of the filters.

cromega commented 8 years ago

@stoyanr: While I see the point of this change and I think your reasoning is completely valid I would like to hijack this PR and turn it into a discussion.

If nothing comes out of it I will merge it as a temporary solution to your problem.

A while ago we made a decision to turn users away from supplying their own set of filters via injecting a massive string into the deployment manifest. It gave us is a much cleaner release and deployment process at the expense of flexibility to configure the parsers.

Since there weren't many users of the BOSH release we added default filters that served our needs.

We are looking at various ideas to make the parsing rules easier to customise but we are not quite sure yet what route to take.

If you have some useful thoughts please share them with us, we are open to suggestions.

CC @mrdavidlaing, @hannayurkevich

hannayurkevich commented 8 years ago

CC @axelaris

hannayurkevich commented 8 years ago

Hi guys,

I think exclusion can be useful in some cases. But it is not flexible enough.

May be it makes sense to use some kind of filters chain. Lets allow user defining a chain of filters to be included into logstash.conf in the specified order. So user can choose what default filters to include and also add some custom filtering. It looks more flexible to me.

Btw, in the changes I noticed a typo:

<% if not p('logstash_parser.excluded_filters').include?'filters_pre'%> cat ${JOB_DIR}/config/filters_post.conf>> ${JOB_DIR}/config/logstash.conf <% end %>

So if you decide to merge the changes please correct it before merging.

stoyanr commented 8 years ago

Sorry about the typo, it's corrected now. Thanks for noticing!

axelaris commented 8 years ago

Hi guys, I do not want to say that I don't like this PR at all. But I have a strong feeling that we will have more mess than we have at the moment. I'd prefer to develop one holistic solution instead of set of promiscuous patches.

In my vision, we should follow these rules:

  1. Split a monolithic filter set to smaller logical parts (already done, but not enough)
  2. Do not keep any filters in packages (I do not want to build a new release each time when I changed a filter)
  3. In the manifest we should be able to define a set of filters to be included in the final config. I believe that including filters looks more natural and easier to understand than excluding them. When we include - we're building something, when we exclude - we're destroying it ;-)

As for implementation, I guess it should look as an array or hash, where we can have an Id for each element. Please feel free to share your thoughts.

stoyanr commented 8 years ago

Hi guys,

It's an interesting discussion. I thought how something similar could be achieved and I came up with 2 options. @axelaris, I am not addressing your points 1 or 2 above, as I believe these could be addressed later with a different PR, as far as we keep them in mind.

Option 1

Introduce 2 additional properties default_filters_pre and default_filters_post which contain lists of default filters that should be included before and after the custom filters, respectively. This is in fact close to my original PR, except that it's based on inclusion instead of exclusion, and it differentiates between pre and post filters. The spec would look something like:

properties:
  logstash_parser.default_filters_pre:
    default: [filters_pre.conf, logstash-filters-default.conf]
  logstash_parser.default_filters_post:
    default: [filters_post.conf, if_it_looks_like_json.conf, timecop.conf]
  logstash_parser.filters:
    default: ''

Our custom configuration would look something like:

properties:
  logstash_parser:
    default_filters_pre: [filters_pre.conf]
    default_filters_post: [filters_post.conf]
    filters:
      - |
        <%= File.read('templates/filters/filter-default.conf').gsub(/^/, '              ').strip %>

One advantage of this approach is that it's backward compatible with regards to existing configurations.

Option 2

In fact we already have the chain of filters that @hannayurkevich mentioned, it's the filters property. It can already contain either a list of filter files, or a block of text. We could further generalize this property so that it contains the full list of filters, both default and custom. We would need however to introduce a "type" for each filter so that we could distinguish between at least 3 different cases:

The spec would look something like:

properties:
  logstash_parser.filters:
    default:
      - filter: filters_pre.conf
        type: config
      - filter: /var/vcap/packages/logsearch-config/logstash-filters-default.conf
        type: path
      - filter: filters_post.conf
        type: config
      - filter: /var/vcap/packages/logsearch-config/if_it_looks_like_json.conf
        type: path
      - filter: /var/vcap/packages/logsearch-config/timecop.conf
        type: path

Our custom configuration would look something like:

properties:
  logstash_parser:
    filters:
      - filter: filters_pre.conf
        type: config
      - filter: |
          <%= File.read('templates/filters/filter-default.conf').gsub(/^/, '              ').strip %>
        type: text
      - filter: filters_post.conf
        type: config

This approach has the advantage of being much more flexible, and the disadvantage that it's not backward compatible with regards to existing configurations. If backward compatibility is considered important, it could be achieved by using a name different from filters for this property and making sure that the original filters property still works as before.

What do you think?

dpb587 commented 8 years ago

I like @axelaris's perspective of configuring via inclusion and @stoyanr's option 2 idea. An alternative to requiring a type property would be to treat the strings like curl/cli file arguments - when starting with @ it's a file reference, and when starting without an absolute forward slash they're a builtin logsearch config. Example...

properties:
  logstash_parser:
    filters:
      # builtin pre
      - '@filters_pre.conf'
      # manifest-based filter
      - <%= File.read('templates/filters/filter-default.conf').gsub(/^/, '        ').strip %>
      # co-located filters
      - '@/var/vcap/packages/acme-logsearch/production.conf'
      # if globs were desirable
      - '@/var/vcap/packages/*-logsearch/logstash-filters.conf'
      # builtin post
      - '@filters_post' # could probably assume .conf extension on builtins if desirable

Although, it looks like logstash_parser.filters is now a hash or something whose key value isn't used at all which seems weird. So whatever the example actually needs to look like - the single-value @ thing was all I was trying to demonstrate :)

stoyanr commented 8 years ago

Indeed, "the single value @ thing" proposed by @dpb587 saves some typing. logstash_parser.filters is then just an array of strings that are treated differently depending on whether they begin with @ or not.

Should I then correct the PR based on Option 2 with this correction, or are there other proposals?

One unrelated, but important question for us - when do you plan to release your next final release? We would like to get rid of our internal fork and after this PR is merged, we only need a final release that contains all our PRs in order to do it.

mrdavidlaing commented 8 years ago

+1 for configuring via inclusion and @stoyanr's option 2 idea.

I prefer an explicit type param; rather than magic characters; especially since I expect further config options will arise in the future - the logstash roadmap has some form of centralised config on it; so I can see a future with additional config types.

Release 200 has already hopelessly broken backward compatibility; so I think now is a good time to do the same in the interests of a cleaner solution.

I'm not convinced we need a config type - couldn't this also be a path - just to files that that are inside the job rather than the package?

In terms of the final release - hopefully mid March.

@stoyanr - could you amend your PR to implement Option 2.

@cromega - could you review and merge please once Option 2 is done.

stoyanr commented 8 years ago

I amended the PR to implement Option 2 as originally proposed. I preserved the config type for now. Yes, config can also be path, but using config those configuration templates could be added by their names only and ${JOB_DIR} could still be used inside parser_ctl, as before. Please let me know if it fits your expectations.

martinagalabova commented 8 years ago

Hi everyone,

I'm the Product Owner of @stoyanr's team and I'm interested in the status of this pull request. I find the discussion really interesting and the agreed solution quite good. What do you think? Would this change be part of the next final release? Please share your thoughts/comments.

Cheers, Martina

cromega commented 8 years ago

@stoyanr, @martinagalabova: Sorry for the delay, we are having lots of discussion and making plans about the future and direction of this repo.

We like this implementation. I would like to propose a minor change though. Rename config to builtin.

I could do it myself after merging it but I wanted you to be aware of it.

cromega commented 8 years ago

@stoyanr: As a second thought, it would be nice if you changed the yamls under templates/ accordingly. so by running the generate manifest script you get a deployment with a correctly configured logstash.conf. You can exclude the filters you think are unnecessary for a basic configuration.

mrdavidlaing commented 8 years ago

@stoyanr - thank you for this submission - just noticed that we don't seem to have a signed Contributors License Agreement on record for you.

Please could you print & sign the relevant CLA from https://github.com/logsearch/licensing#signing-the-contributors-license-agreement and forward to david.laing@labs.cityindex.com

Thanks!

D

stoyanr commented 8 years ago

@cromega, I adapted the PR as you requested.

@mrdavidlaing, I contacted the appropriate people at SAP regarding the CLA. The corporate CLA should be signed by them within a week or so.

mrdavidlaing commented 8 years ago

@stoyanr - thanks - sorry to have only picked up that we didn't have a CLA from you so late in the merge process :(

mrdavidlaing commented 8 years ago

@stoyanr - any update on that CLA?

stoyanr commented 8 years ago

@mrdavidlaing I think our legal department are working on it. They should contact you, if they haven't done so already.

mrdavidlaing commented 8 years ago

@stoyanr - Thanks for the update - although I haven't heard from your legal department yet. Please can you just double check with them that they are trying to communicate with me via david.laing@labs.cityindex.com (and not david.laing@cityindex.com; which no longer works)

mrdavidlaing commented 8 years ago

@stoyanr Any update on your CLA? I still don't seem to have received anything from your legal department.

Thanks!

stoyanr commented 8 years ago

@mrdavidlaing I learned today that our legal counsel Chris Metcalfe had already contacted you. Could you please follow-up with him on that?

mrdavidlaing commented 8 years ago

@stoyanr - I haven't had any contact with Chris Metcalfe - could you perhaps forward his email address to me at david.laing@lab.cityindex.com (or dlaing@pivotal.io) so I can follow up.

stoyanr commented 8 years ago

@mrdavidlaing I followed up via email, putting Chris in cc, and mentioned all your email addresses.

mrdavidlaing commented 8 years ago

@stoyanr, Just FYI; I'm in contact with Chris; who says there is a problem with SAP signing the Logsearch CLA because of its copyright assignment clause.

I'm working with him to see if there is a way around this.

voelzmo commented 8 years ago

@mrdavidlaing @stoyanr any updates on the topic in the meanwhile?

mrdavidlaing commented 8 years ago

@voelzmo, Sorry, no