breml commented 3 years ago

After out of band discussions with @magnusbaeck about the topics in #93 , #94 and #95 the idea of a Logstash Filter Verifier (LFV) version 2 was mentioned.

A version 2 of LFV could have the following properties additional to the existing functionality:

Daemon mode (#94)
Multi pipeline support (#93)
Logstash config test coverage (#95)
Drop support for the old test case file format
Updated CLI to support the new features
Drop support for Logstash < 6.0 (because of the lack of pipeline support)

Optional:

Configure logstash to log in JSON format for better observability of the process.
Adjust the logging output as necessary (https://www.elastic.co/guide/en/logstash/7.10/logging.html)
Depending on the implemented solution, maybe the multiline event issue (#39) could be resolved as well.

breml commented 3 years ago

@magnusbaeck

I went through all the version 2.0 related issues and tried to compile a TODO list for the remaining work. This is what I have found so far (this list might not be complete):

Daemon Mode (#94)

[x] Support codec by inheriting the codec set on the input (codec setting in test case files are ignored)
[x] Add config option export_metadata
[x] Add config option metadata_key
[x] Add Filter Mocks
[x] Add testcases.event to exactly specify the event that is used as input
[x] Add testcases.id field (no longer sure, if it is strictly needed)
[ ] Fake time (because the field @timestamp is treated special by Logstash and should be updated using the date filter plugin
[x] Support for pipeline yaml (https://github.com/magnusbaeck/logstash-filter-verifier/pull/103#discussion_r598336319) nested and flat keys.
[x] Add tool to generate ID attributes in Logstash config files (https://github.com/magnusbaeck/logstash-filter-verifier/pull/103#discussion_r602925576)

Multi Pipeline Support (#93)

[x] Add config option input to specify the input, where an event is coming from (e.g. multi pipeline setup with inputs in multiple pipelines)
[x] Add testcases.expected_outputs in order to specify, in which outputs an event is expected to be seen
[x] Generate pipelines.yml for simple cases when only a single pipeline is tested

Logstash Test Coverage (#95)

[ ] Implement this feature (currently completely missing)

Other

[x] Update README.md
[ ] Update documentation / diagrams
[ ] Add useful logging output
[x] Add tests / ensure test coverage

breml commented 3 years ago

I added a helper command to LFV to download and install Logstash. This allowed me to easily test the current state with different versions of Logstash.

So far, the current set of integration tests (as of #106), the following versions of Logstash did work (all test performed on linux-x86_64, I skipped the other patch level versions):

7.7.0
7.8.0
7.9.0
7.10.0
7.11.0
7.12.0

I will have a look at 7.6.x and earlier to figure out, what is going on with these versions.

breml commented 3 years ago

As found out already earlier, the event ordering in the pipeline has been reintroduced in Logstash 7.7.0 (see https://www.elastic.co/guide/en/logstash/current/logstash-7-7-0.html#_improving_the_java_pipeline_execution_engine).

According to the change log entry, the event ordering has been a feature of Logstash before 7.x and has been lost with the switch to the Java based pipeline. The above mentioned change reintroduced this feature. The current code of the LFV daemon mode is highly dependent on the correct ordering in the pipeline. Therefore I see little value in supporting Logstash versions older than 7.7.0.

@magnusbaeck what do you think?

matejzero commented 3 years ago

When 7.x came out, there was a problem with events not being in order. More info can be found in https://github.com/magnusbaeck/logstash-filter-verifier/issues/68 and @magnusbaeck made a fix in https://github.com/magnusbaeck/logstash-filter-verifier/commit/6166b6249256689806e64a1e078c1701307c9438 that is still used.

breml commented 3 years ago

@matejzero Thanks for the hint. I added the mentioned workaround and I successfully executed the integration tests of the current state with some more versions of Logstash:

7.0.0
7.1.0
7.2.0
7.3.2 (but not 7.3.0)
7.4.0
7.5.2 (but not 7.5.0)
7.6.0

Once again, I did not test all the patch releases. Also I did not test with releases older than 7.0.0.

Interestingly, the execution of the tests with version 7.2.0, 7.3.2 have been much faster than with the more recent releases. I measured a difference of up to 50% and even more (7.2.0: 20 seconds, 7.10.0: 43 seconds). Judging from the log output, the difference is mainly caused by a much faster pipeline reload in the older versions.

That being said, I had cases, where the tests failed with 7.10.0 with the new workaround. I did not observe any flaky tests with 7.10.0 before I introduced this workaround.

Tests performed with https://github.com/breml/logstash-filter-verifier/commit/e67bc8e3e8e7562848a752b5bdae20955bc1bf66

breml commented 3 years ago

I made some more progress on the Logstash daemon mode compatibility front. After some tweaking (https://github.com/breml/logstash-filter-verifier/commit/c21939c9ab3da421ad67a4e5254344bbbebd6444), I successfully executed the integration tests with the following Logstash releases:

6.8.15
6.7.2

Some more tests will follow.

breml commented 3 years ago

@magnusbaeck what is your opinion on the Logstash version we need to support with LFV daemon mode? The reason I ask is, that I wanted to test with 6.6.2 and I ran into an issue (https://github.com/elastic/logstash/issues/9316) with the version of Java, that is installed on my development machine. It look like 6.7.x is the oldest version of Logstash, that is compatible with Java > 8.

I have Java 11 on my machine:

$ java --version
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)

magnusbaeck commented 3 years ago

I think it's okay to drop support for <6.7. Those who currently run incompatible versions of 6.x (like myself; I just haven't bothered upgrading from 6.4) should have no problems upgrading to 6.7. 6.7 still supports Java 8 so we won't force any Java upgrades which might otherwise be a problem for some folks.

breml commented 3 years ago

@magnusbaeck I created some more PR and updated the TODO list in https://github.com/magnusbaeck/logstash-filter-verifier/issues/96#issuecomment-797559730.

breml commented 3 years ago

@magnusbaeck FYI: PR #111, #112, #113 and #114 are ready for review. I started to use a version with all the 4 PR merged with our production Logstash test suite. I removed the "DRAFT:" in the titles.

breml commented 3 years ago

@magnusbaeck I created a first draft for a tool (named mustache) to check, lint and format existing Logstash configurations. You may find the current state in the following PR: https://github.com/breml/logstash-config/pull/9

In #117 I added to possibility to automatically fix/add the missing Logstash plugin ID as well, so we have two different options here:

Provide this functionality directly with LFV
Link the tool mentioned above in the README.md of LFV

What do you think?

magnusbaeck commented 3 years ago

I suggest both :-) I think it'd be very helpful for users to automatically generate IDs as needed, both to ease migration but also to support cases where having IDs on every single filter would be a nuisance. Even if a tool can add them it'll still clutter the configuration when they're not needed for other purposes.

breml commented 3 years ago

I have released mustache, which provides the necessary functionality to add auto generated plugin ID to an existing configuration.

breml commented 3 years ago

I just created a PR #119 for an updated README.md to reflect the changes with Daemon mode and Version 2 in general. I hope this drives us closer to a first "beta" or "release candidate" release of Logstash Filter Verifier with all the new stuff.

I already "dog-food" the new version for our tests and so far, this has been a good experience. I assume, that a "beta" or "release candidate" release would broaden the user base in with this increase feedback in regards to bugs and user experience.

From my point of view, the following issues are resolved:

94
93

Let me know, what you think. What should we do about #96 and #97?

magnusbaeck commented 3 years ago

Feature-wise I think issues 93 and 94 can be closed. I've only done limited testing myself, but now that plugin ids are optional I can do some full-scale testing on the rather elaborate configuration we have at work. If everything works there I feel pretty confident. I've been travelling for a couple of week but will stay home until my vacation runs out at the end of next week so I should have more time than usual to work on this.

Issue 96 is this issue, which I guess we'll close when the final 2.0.0 tag is set? I do think we should get issue 97 (dropping legacy testcase file format) in before 2.0 since the next opportunity would be 3.0. I started working on this a long time ago but got lost in yak shaving like improving the tests for testcase.go. Hopefully those commits will apply reasonably cleanly to the current master branch.

matejzero commented 3 years ago

Just a FYI, I updated LFV to beta1 today for our production tests and it runs without issues. We do use standalone option though.

magnusbaeck commented 3 years ago

Thanks for testing @matejzero!

breml commented 3 years ago

I successfully switched completely to beta.1 daemon mode for our setup (~1850 lines of logstash config accross multiple pipelines, ~80 LFV test cases). Standalone mode does no longer work for my configuration due to the use of features only available in daemon mode (pipeline-to-pipeline communication, usage of filter mock and testing of multiple outputs for a single event).

During my last test, I found a bug in the processing of the Logstash configuration, which I will need to fix in https://github.com/breml/logstash-config The bug causes a config like this:

filter {
  mutate {
    add_field => {
      "field1" => "string with
multiple
lines"
    }
  }
}

to be altered to (changing the indentation of the multi-line string):

filter {
  mutate {
    add_field => {
      "field1" => "string with
      multiple
      lines"
    }
  }
}

nicolasreich commented 3 years ago

I've started experimenting with 2.0. It looks great, thank you for the amazing work! I do have a few questions/remarks:

Is there a way to test a pipeline that has a pipeline input or output in daemon mode? I am getting an error (e.g. logstash-filter-verifier: error: expect the Logstash config to have at least 1 input and 1 output, got 0 inputs and 1 outputs) when I try to.
About this error, it seems to appear whenever there's a file-related issue, for example if the location in config.path doesn't contain a logstash configuration, which is a bit confusing.
It seems using absolute paths in the pipelines.yml file doesn't work. I have worked around this by setting --pipeline-base /, which functionally is fine, but is somewhat hacky.
It seems -add-missing-id adds IDs even to filters that already have an ID. I have a rule with ID beats_input_plain_5044_tls; when I run the test with the flag, the test crashes, and I can see this in the Logstash logs: "Attempted to send event to '__lfv_input_pXo4QHj9_beats_input_plain_5044_tls' but that address was unavailable.

EDIT: added 1 point

breml commented 2 years ago

Hi @nicolasreich

I've started experimenting with 2.0. It looks great, thank you for the amazing work! I do have a few questions/remarks:

Thanks for your feedback and sorry for being slow with my reply.

Is there a way to test a pipeline that has a pipeline input or output in daemon mode? I am getting an error (e.g. logstash-filter-verifier: error: expect the Logstash config to have at least 1 input and 1 output, got 0 inputs and 1 outputs) when I try to.

This is currently not supported. In our case, we always the whole Logstash configuration with all the pipelines in place, so there is no configuration with pipelines, that have pipeline inputs or outputs, that are not linked to other pipelines. So what I mean by this is:

given a Logstash configuration with a pipeline A, that has a pipeline output to a pipeline B, I would always test the two pipelines (A and B) in combination and never pipeline A or B in isolation.

Therefore, LFV v2 currently always expects a Logstash configuration that has at least 1 non-pipeline input and 1 non-pipeline output.

Maybe if you describe your use case, we can figure out, if we need to extend LFV v2 to support testing of pipelines which have a pipeline input or output.

About this error, it seems to appear whenever there's a file-related issue, for example if the location in config.path doesn't contain a logstash configuration, which is a bit confusing.

Agree, the error handling and reporting can (and should) be improved. I suggest, you create separate issues with minimal configuration examples, that produce an error that is confusing as well as a suggestion, what you would expect as error message in this case.

It seems using absolute paths in the pipelines.yml file doesn't work. I have worked around this by setting --pipeline-base /, which functionally is fine, but is somewhat hacky.

This sounds like a fixable bug. I guess, if the paths in pipelines.yml are absolute (that is, if they start with /), we should use them as-is. (@magnusbaeck what do you think?)

It seems -add-missing-id adds IDs even to filters that already have an ID. I have a rule with ID beats_input_plain_5044_tls; when I run the test with the flag, the test crashes, and I can see this in the Logstash logs: "Attempted to send event to '__lfv_input_pXo4QHj9_beats_input_plain_5044_tls' but that address was unavailable.

I need to check this. If it is the case, this clearly is a bug.

Edit: fix formatting

breml commented 2 years ago

@nicolasreich Just to be sure, in your last item, you write, that your rule already has an ID (upper case). Does your config look like this:

input {
  beats {
    ID => beats_input_plain_5044_tls
    ...
  }
}

or do you have the ID attribute in lower case like this id (lower case)?

breml commented 2 years ago

In regards to the pipeline base path, the current logic works as follows:

default value for --pipeline-base is "" (empty string)
if the pipeline base path is not an absolute path (which the default is not), the base path is prefixed with the current working directory.
for the config paths found in the pipelines.yml, the base path is always used as prefix (even if the config paths are absolute).

After some thinking, I propose to change the logic as follows:

default value for --pipeline-base stays "" (empty string)
if the config paths found in the pipelines.yml are absolute paths, the pipeline base path setting is ignored.
else (if the config paths found in the pipelines.yml are relative paths)
- if the pipeline base path is not set (this is, if it is the empty string ""), the absolute path of the pipelines.yml file is used as the pipeline base path (the assumption is, that relative paths in the pipelines.yml are relative to the location of the file it self).
- if the pipeline base path is relative, it is converted to an absolute path by prefixing it with the current working directory.
- if the pipeline base path is absolute, it is used as is.
- the resulting absolute pipeline base path is prefixed to the relative config paths found in the pipelines.yml

Would this make sense?

magnusbaeck commented 2 years ago

I think it makes great sense and would match my expectations as a user. Thanks for thinking through all the cases.

nicolasreich commented 2 years ago

Hey @breml, thanks for the answers!

Therefore, LFV v2 currently always expects a Logstash configuration that has at least 1 non-pipeline input and 1 non-pipeline output.

Maybe if you describe your use case, we can figure out, if we need to extend LFV v2 to support testing of pipelines which have a pipeline input or output.

Thanks for the clarification. We have tried to make pipelines as modular as possible, hence we have pipelines that get combined differently; e.g assuming we have pipelines A, B, C, D, E maybe sometimes we do A->B->D; sometimes A->C->D; and sometimes A->E. Currently we test each of A, B, C, D, E individually with LFV 1.x; so we have extensive test suites for each pipeline, and we would like to add more general tests for the whole setup, that mostly test that the pipelines connections and routing work fine. The issue with having tests only for whole setups is that we will need to repeat our tests: in A->B->D, we will need to test that A, B, and D work exactly as expected; while in A->C->D, we will need to test that A, C, and D work exactly as expected.

Now, we can probably find a way to work around this if you think allowing this would be detrimental to LFV in some way, or if it's complex to implement; but it would certainly be useful for us.

Agree, the error handling and reporting can (and should) be improved. I suggest, you create separate issues with minimal configuration examples, that produce an error that is confusing as well as a suggestion, what you would expect as error message in this case

Will do!

@nicolasreich Just to be sure, in your last item, you write, that your rule already has an ID (upper case). Does your config look like this:

input { beats { ID => beats_input_plain_5044_tls ... } }

or do you have the ID attribute in lower case like this id (lower case)?

The config looks like this:

input {
  beats {
    port => 5044
    id => "beats_input_plain_5044_tls"
    ssl => "true"
    ...
  }
}

Sorry it wasn't clear.

Would this make sense?

Yes your proposal for pipeline-base makes perfect sense, and would indeed match my expectations. Thanks!

EDIT: about the id issue, I may have misinterpreted the output of the logstash logs, I'm not sure there's actually a bug with LFV; I'll let you know when I'm more sure, sorry about the confusion.

breml commented 2 years ago

Thanks for the clarification. We have tried to make pipelines as modular as possible, hence we have pipelines that get combined differently; e.g assuming we have pipelines A, B, C, D, E maybe sometimes we do A->B->D; sometimes A->C->D; and sometimes A->E. Currently we test each of A, B, C, D, E individually with LFV 1.x; so we have extensive test suites for each pipeline, and we would like to add more general tests for the whole setup, that mostly test that the pipelines connections and routing work fine. The issue with having tests only for whole setups is that we will need to repeat our tests: in A->B->D, we will need to test that A, B, and D work exactly as expected; while in A->C->D, we will need to test that A, C, and D work exactly as expected.

OK, I get your use case and I think that this is a valid (and maybe even encouraged) way to use and combine Logstash pipelines.

I have to think about possible implementations in LFV 2.0 make this work. One idea, that comes to my mind is to extend the filter mock feature such that it is not only able to replace filters with "mock" but also replace inputs and outputs. This would allow one to take any of your pipelines, replace the pipeline input with e.g. stdin input and the filter output with e.g. stdout output. This "altered" config should then work with the --logstash-config flag, which also removes the need for a pipelines.yml for each test scenario.

But this is only a first idea, maybe there are other solutions.

jgough commented 2 years ago

Thank you very much for the hard work on this, it is much appreciated! Is there scope for a Beta 2 release once https://github.com/magnusbaeck/logstash-filter-verifier/pull/130 gets merged since there have been a few bug fixes?

breml commented 2 years ago

I would like to see #124 be merged as well, before we release Beta 2.

magnusbaeck / logstash-filter-verifier

Logstash Filter Verifier version 2.0 #96

Daemon Mode (#94)

Multi Pipeline Support (#93)

Logstash Test Coverage (#95)

Other

94

93