Open breml opened 3 years ago
@magnusbaeck
I went through all the version 2.0 related issues and tried to compile a TODO list for the remaining work. This is what I have found so far (this list might not be complete):
export_metadata
metadata_key
testcases.event
to exactly specify the event that is used as inputtestcases.id
field (no longer sure, if it is strictly needed)@timestamp
is treated special by Logstash and should be updated using the date
filter plugininput
to specify the input, where an event is coming from (e.g. multi pipeline setup with inputs in multiple pipelines)testcases.expected_outputs
in order to specify, in which outputs an event is expected to be seenpipelines.yml
for simple cases when only a single pipeline is testedI added a helper command to LFV to download and install Logstash. This allowed me to easily test the current state with different versions of Logstash.
So far, the current set of integration tests (as of #106), the following versions of Logstash did work (all test performed on linux-x86_64, I skipped the other patch level versions):
I will have a look at 7.6.x and earlier to figure out, what is going on with these versions.
As found out already earlier, the event ordering in the pipeline has been reintroduced in Logstash 7.7.0 (see https://www.elastic.co/guide/en/logstash/current/logstash-7-7-0.html#_improving_the_java_pipeline_execution_engine).
According to the change log entry, the event ordering has been a feature of Logstash before 7.x and has been lost with the switch to the Java based pipeline. The above mentioned change reintroduced this feature. The current code of the LFV daemon mode is highly dependent on the correct ordering in the pipeline. Therefore I see little value in supporting Logstash versions older than 7.7.0.
@magnusbaeck what do you think?
When 7.x came out, there was a problem with events not being in order. More info can be found in https://github.com/magnusbaeck/logstash-filter-verifier/issues/68 and @magnusbaeck made a fix in https://github.com/magnusbaeck/logstash-filter-verifier/commit/6166b6249256689806e64a1e078c1701307c9438 that is still used.
@matejzero Thanks for the hint. I added the mentioned workaround and I successfully executed the integration tests of the current state with some more versions of Logstash:
Once again, I did not test all the patch releases. Also I did not test with releases older than 7.0.0.
Interestingly, the execution of the tests with version 7.2.0, 7.3.2 have been much faster than with the more recent releases. I measured a difference of up to 50% and even more (7.2.0: 20 seconds, 7.10.0: 43 seconds). Judging from the log output, the difference is mainly caused by a much faster pipeline reload in the older versions.
That being said, I had cases, where the tests failed with 7.10.0 with the new workaround. I did not observe any flaky tests with 7.10.0 before I introduced this workaround.
Tests performed with https://github.com/breml/logstash-filter-verifier/commit/e67bc8e3e8e7562848a752b5bdae20955bc1bf66
I made some more progress on the Logstash daemon mode compatibility front. After some tweaking (https://github.com/breml/logstash-filter-verifier/commit/c21939c9ab3da421ad67a4e5254344bbbebd6444), I successfully executed the integration tests with the following Logstash releases:
Some more tests will follow.
@magnusbaeck what is your opinion on the Logstash version we need to support with LFV daemon mode? The reason I ask is, that I wanted to test with 6.6.2 and I ran into an issue (https://github.com/elastic/logstash/issues/9316) with the version of Java, that is installed on my development machine. It look like 6.7.x is the oldest version of Logstash, that is compatible with Java > 8.
I have Java 11 on my machine:
$ java --version
openjdk 11.0.10 2021-01-19
OpenJDK Runtime Environment (build 11.0.10+9-Ubuntu-0ubuntu1.20.04)
OpenJDK 64-Bit Server VM (build 11.0.10+9-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
Update: Related issues: https://github.com/elastic/logstash/issues/9345 and https://github.com/elastic/logstash/pull/10279
I think it's okay to drop support for <6.7. Those who currently run incompatible versions of 6.x (like myself; I just haven't bothered upgrading from 6.4) should have no problems upgrading to 6.7. 6.7 still supports Java 8 so we won't force any Java upgrades which might otherwise be a problem for some folks.
@magnusbaeck I created some more PR and updated the TODO list in https://github.com/magnusbaeck/logstash-filter-verifier/issues/96#issuecomment-797559730.
@magnusbaeck FYI: PR #111, #112, #113 and #114 are ready for review. I started to use a version with all the 4 PR merged with our production Logstash test suite. I removed the "DRAFT:" in the titles.
@magnusbaeck I created a first draft for a tool (named mustache
) to check, lint and format existing Logstash configurations. You may find the current state in the following PR: https://github.com/breml/logstash-config/pull/9
In #117 I added to possibility to automatically fix/add the missing Logstash plugin ID as well, so we have two different options here:
What do you think?
I suggest both :-) I think it'd be very helpful for users to automatically generate IDs as needed, both to ease migration but also to support cases where having IDs on every single filter would be a nuisance. Even if a tool can add them it'll still clutter the configuration when they're not needed for other purposes.
I have released mustache
, which provides the necessary functionality to add auto generated plugin ID to an existing configuration.
I just created a PR #119 for an updated README.md to reflect the changes with Daemon mode and Version 2 in general. I hope this drives us closer to a first "beta" or "release candidate" release of Logstash Filter Verifier with all the new stuff.
I already "dog-food" the new version for our tests and so far, this has been a good experience. I assume, that a "beta" or "release candidate" release would broaden the user base in with this increase feedback in regards to bugs and user experience.
From my point of view, the following issues are resolved:
Let me know, what you think. What should we do about #96 and #97?
Feature-wise I think issues 93 and 94 can be closed. I've only done limited testing myself, but now that plugin ids are optional I can do some full-scale testing on the rather elaborate configuration we have at work. If everything works there I feel pretty confident. I've been travelling for a couple of week but will stay home until my vacation runs out at the end of next week so I should have more time than usual to work on this.
Issue 96 is this issue, which I guess we'll close when the final 2.0.0 tag is set? I do think we should get issue 97 (dropping legacy testcase file format) in before 2.0 since the next opportunity would be 3.0. I started working on this a long time ago but got lost in yak shaving like improving the tests for testcase.go. Hopefully those commits will apply reasonably cleanly to the current master branch.
Just a FYI, I updated LFV to beta1 today for our production tests and it runs without issues. We do use standalone option though.
Thanks for testing @matejzero!
I successfully switched completely to beta.1 daemon mode for our setup (~1850 lines of logstash config accross multiple pipelines, ~80 LFV test cases). Standalone mode does no longer work for my configuration due to the use of features only available in daemon mode (pipeline-to-pipeline communication, usage of filter mock and testing of multiple outputs for a single event).
During my last test, I found a bug in the processing of the Logstash configuration, which I will need to fix in https://github.com/breml/logstash-config The bug causes a config like this:
filter {
mutate {
add_field => {
"field1" => "string with
multiple
lines"
}
}
}
to be altered to (changing the indentation of the multi-line string):
filter {
mutate {
add_field => {
"field1" => "string with
multiple
lines"
}
}
}
I've started experimenting with 2.0. It looks great, thank you for the amazing work! I do have a few questions/remarks:
logstash-filter-verifier: error: expect the Logstash config to have at least 1 input and 1 output, got 0 inputs and 1 outputs
) when I try to.config.path
doesn't contain a logstash configuration, which is a bit confusing.pipelines.yml
file doesn't work. I have worked around this by setting --pipeline-base /
, which functionally is fine, but is somewhat hacky.-add-missing-id
adds IDs even to filters that already have an ID. I have a rule with ID beats_input_plain_5044_tls
; when I run the test with the flag, the test crashes, and I can see this in the Logstash logs: "Attempted to send event to '__lfv_input_pXo4QHj9_beats_input_plain_5044_tls' but that address was unavailable.
EDIT: added 1 point
Hi @nicolasreich
I've started experimenting with 2.0. It looks great, thank you for the amazing work! I do have a few questions/remarks:
Thanks for your feedback and sorry for being slow with my reply.
- Is there a way to test a pipeline that has a pipeline input or output in daemon mode? I am getting an error (e.g.
logstash-filter-verifier: error: expect the Logstash config to have at least 1 input and 1 output, got 0 inputs and 1 outputs
) when I try to.
This is currently not supported. In our case, we always the whole Logstash configuration with all the pipelines in place, so there is no configuration with pipelines, that have pipeline inputs or outputs, that are not linked to other pipelines. So what I mean by this is:
given a Logstash configuration with a pipeline A, that has a pipeline output to a pipeline B, I would always test the two pipelines (A and B) in combination and never pipeline A or B in isolation.
Therefore, LFV v2 currently always expects a Logstash configuration that has at least 1 non-pipeline input and 1 non-pipeline output.
Maybe if you describe your use case, we can figure out, if we need to extend LFV v2 to support testing of pipelines which have a pipeline input or output.
- About this error, it seems to appear whenever there's a file-related issue, for example if the location in
config.path
doesn't contain a logstash configuration, which is a bit confusing.
Agree, the error handling and reporting can (and should) be improved. I suggest, you create separate issues with minimal configuration examples, that produce an error that is confusing as well as a suggestion, what you would expect as error message in this case.
- It seems using absolute paths in the
pipelines.yml
file doesn't work. I have worked around this by setting--pipeline-base /
, which functionally is fine, but is somewhat hacky.
This sounds like a fixable bug. I guess, if the paths in pipelines.yml
are absolute (that is, if they start with /
), we should use them as-is. (@magnusbaeck what do you think?)
- It seems
-add-missing-id
adds IDs even to filters that already have an ID. I have a rule with IDbeats_input_plain_5044_tls
; when I run the test with the flag, the test crashes, and I can see this in the Logstash logs:"Attempted to send event to '__lfv_input_pXo4QHj9_beats_input_plain_5044_tls' but that address was unavailable.
I need to check this. If it is the case, this clearly is a bug.
Edit: fix formatting
@nicolasreich Just to be sure, in your last item, you write, that your rule already has an ID
(upper case). Does your config look like this:
input {
beats {
ID => beats_input_plain_5044_tls
...
}
}
or do you have the ID attribute in lower case like this id
(lower case)?
In regards to the pipeline base path, the current logic works as follows:
--pipeline-base
is ""
(empty string)pipelines.yml
, the base path is always used as prefix (even if the config paths are absolute).After some thinking, I propose to change the logic as follows:
--pipeline-base
stays ""
(empty string)pipelines.yml
are absolute paths, the pipeline base path setting is ignored.pipelines.yml
are relative paths)
""
), the absolute path of the pipelines.yml
file is used as the pipeline base path (the assumption is, that relative paths in the pipelines.yml
are relative to the location of the file it self).pipelines.yml
Would this make sense?
I think it makes great sense and would match my expectations as a user. Thanks for thinking through all the cases.
Hey @breml, thanks for the answers!
Therefore, LFV v2 currently always expects a Logstash configuration that has at least 1 non-pipeline input and 1 non-pipeline output.
Maybe if you describe your use case, we can figure out, if we need to extend LFV v2 to support testing of pipelines which have a pipeline input or output.
Thanks for the clarification. We have tried to make pipelines as modular as possible, hence we have pipelines that get combined differently; e.g assuming we have pipelines A, B, C, D, E maybe sometimes we do A->B->D; sometimes A->C->D; and sometimes A->E. Currently we test each of A, B, C, D, E individually with LFV 1.x; so we have extensive test suites for each pipeline, and we would like to add more general tests for the whole setup, that mostly test that the pipelines connections and routing work fine. The issue with having tests only for whole setups is that we will need to repeat our tests: in A->B->D, we will need to test that A, B, and D work exactly as expected; while in A->C->D, we will need to test that A, C, and D work exactly as expected.
Now, we can probably find a way to work around this if you think allowing this would be detrimental to LFV in some way, or if it's complex to implement; but it would certainly be useful for us.
Agree, the error handling and reporting can (and should) be improved. I suggest, you create separate issues with minimal configuration examples, that produce an error that is confusing as well as a suggestion, what you would expect as error message in this case
Will do!
@nicolasreich Just to be sure, in your last item, you write, that your rule already has an ID (upper case). Does your config look like this:
input { beats { ID => beats_input_plain_5044_tls ... } }
or do you have the ID attribute in lower case like this id (lower case)?
The config looks like this:
input {
beats {
port => 5044
id => "beats_input_plain_5044_tls"
ssl => "true"
...
}
}
Sorry it wasn't clear.
Would this make sense?
Yes your proposal for pipeline-base
makes perfect sense, and would indeed match my expectations. Thanks!
EDIT: about the id issue, I may have misinterpreted the output of the logstash logs, I'm not sure there's actually a bug with LFV; I'll let you know when I'm more sure, sorry about the confusion.
Thanks for the clarification. We have tried to make pipelines as modular as possible, hence we have pipelines that get combined differently; e.g assuming we have pipelines A, B, C, D, E maybe sometimes we do A->B->D; sometimes A->C->D; and sometimes A->E. Currently we test each of A, B, C, D, E individually with LFV 1.x; so we have extensive test suites for each pipeline, and we would like to add more general tests for the whole setup, that mostly test that the pipelines connections and routing work fine. The issue with having tests only for whole setups is that we will need to repeat our tests: in A->B->D, we will need to test that A, B, and D work exactly as expected; while in A->C->D, we will need to test that A, C, and D work exactly as expected.
OK, I get your use case and I think that this is a valid (and maybe even encouraged) way to use and combine Logstash pipelines.
I have to think about possible implementations in LFV 2.0 make this work. One idea, that comes to my mind is to extend the filter mock feature such that it is not only able to replace filters with "mock" but also replace inputs and outputs.
This would allow one to take any of your pipelines, replace the pipeline input with e.g. stdin input and the filter output with e.g. stdout output. This "altered" config should then work with the --logstash-config
flag, which also removes the need for a pipelines.yml
for each test scenario.
But this is only a first idea, maybe there are other solutions.
Thank you very much for the hard work on this, it is much appreciated! Is there scope for a Beta 2 release once https://github.com/magnusbaeck/logstash-filter-verifier/pull/130 gets merged since there have been a few bug fixes?
I would like to see #124 be merged as well, before we release Beta 2.
After out of band discussions with @magnusbaeck about the topics in #93 , #94 and #95 the idea of a Logstash Filter Verifier (LFV) version 2 was mentioned.
A version 2 of LFV could have the following properties additional to the existing functionality:
Optional: