magnusbaeck / logstash-filter-verifier

Apache License 2.0
192 stars 27 forks source link

Tests rearranging themselves #89

Open Mythirion opened 3 years ago

Mythirion commented 3 years ago

Hi,

We're encountering an issue where the LFV outputs sometimes rearrange themselves between runs, which causes our GitLab CI to fail. By LFV outputs I mean the events produced when we use a clone{} filter. On a rerun the events sometimes come back in the expected order and the tests pass.

An example using pseudo logic:

Test Input:             Foo
Expected Output:        Bar
Second Output:          Baz (Baz is the Bar output but transformed to be entity-centric)
Actual Output in LFV:   Baz
Second Output:          Bar

So the test fails, because instead of Bar being the first result returned, we get Baz, and vice versa.

Is there an end-user way of ordering these events that we should be following? Or is there a way that we can adjust the code to sort the test outputs in a specific way etc.? Let me know if I need to explain the issue differently :)

Cheers, Aaron

breml commented 3 years ago

The first thing that comes to my mind is pipeline.workers. Have you tried setting this value explicitly to 1?

breml commented 3 years ago

And pipeline.ordered as well. See: https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html
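For reference, the two settings mentioned above could be set in logstash.yml roughly like this (a sketch based on the linked settings documentation; note that ordered processing only applies when there is a single worker):

```yaml
# logstash.yml - force single-threaded, ordered event processing
pipeline.workers: 1
# "true" enforces ordering; the default "auto" only enables it
# when pipeline.workers is 1
pipeline.ordered: true
```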

Mythirion commented 3 years ago

> And pipeline.ordered as well. See: https://www.elastic.co/guide/en/logstash/current/logstash-settings-file.html

Will take a look! Thank you :)

Mythirion commented 3 years ago

Hi,

I've tried running LS 7.9 and LFV using --logstash-arg=--pipeline.workers --logstash-arg 1 --logstash-arg=--pipeline.ordered --logstash-arg true and I still get the same issue. Are there any issues with the above?

breml commented 3 years ago

I am not sure there is a solution for this problem if pipeline.workers and pipeline.ordered do not help. LFV processes the events in the order they are returned from Logstash.
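Since LFV itself compares events in arrival order, one conceivable workaround (not a feature of LFV, just a sketch of what a wrapper script around the test output could do) is to compare the expected and actual event sets order-insensitively, using a canonical serialization of each event as the sort key:

```python
import json

def canonical(event):
    # Stable key: the event serialized with sorted keys.
    return json.dumps(event, sort_keys=True)

def same_events(expected, actual):
    # Compare two lists of event dicts, ignoring their order.
    return sorted(map(canonical, expected)) == sorted(map(canonical, actual))

expected = [{"msg": "Bar"}, {"msg": "Baz"}]
actual = [{"msg": "Baz"}, {"msg": "Bar"}]  # reordered by Logstash
assert same_events(expected, actual)
```

This only helps if no test actually depends on event order, of course.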

Can you elaborate why you need the clone filter?

Mythirion commented 3 years ago

We need the clone filter to, for example:

Take one event; if it contains a certain event type, clone it and then run filters on the clone to transform it, ready for insertion into a different index. The original event is preserved and goes into the original index - both are equally useful documents.

Another example is to run some Elasticsearch filtering on a cloned event (as well as pruning fields etc.) and then insert it into an entity-centric index. This can be, for example, the last state of each user we've seen in our database, which we can then aggregate on.

We, of course, need to test that this cloning and processing works, and continues to work, when changes are made.
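A minimal sketch of the pattern described above (field names are hypothetical; with ecs_compatibility disabled, the clone filter sets each copy's type field to the clone name):

```
filter {
  if [event_type] == "user_update" {
    # Creates one copy per name in "clones"; the copy's "type"
    # field is set to "entity_centric" (ecs_compatibility disabled).
    clone {
      clones => ["entity_centric"]
    }
  }
  if [type] == "entity_centric" {
    # Transform only the clone for the entity-centric index,
    # e.g. prune fields; the original event is untouched.
    prune {
      whitelist_names => ["^user_id$", "^status$", "^type$"]
    }
  }
}
```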

jgough commented 2 years ago

I am still seeing this with the use of clone and split filters where entire test suites have to be rewritten on every config change. This is very difficult to maintain with a large test suite.

I've tried to reproduce a minimal test case that illustrates the problem but I've been unable to do so with a small config that I am able to share.

jgough commented 2 years ago

I believe this issue is a duplicate of #150