magnusbaeck / logstash-filter-verifier

Apache License 2.0
192 stars 27 forks source link

Ignore ordering of events when multiple events are expected #90

Closed jgough closed 2 years ago

jgough commented 3 years ago

If a test produces multiple events then the order in which they are produced is deterministic. This makes it possible to create tests that return multiple events.

However if the logstash config is changed though then the events can change order and then the test fails, even though the exact same events are being produced. Sometimes just trivial changes can flip the order of events from A,B to B,A. See https://github.com/magnusbaeck/logstash-filter-verifier/issues/89

Since the order of events is a side-effect of how Logstash processess the pipeline, event ordering should be ignored and the events should be compared.

This pull request sorts the output events alphabetically and therefore should mean that event ordering is ignored.

Given events are sorted alphabetically, it is technically possible for the code to not know which event it should be comparing with which. In this case it could compare Expected A with Output B and vice versa and the diff may be unexpected. That said, this is also currently an issue with the code where you do have to guess which order Logstash will output of the events when writing the test. It is fairly trivial for an end user to work out if this has happened, and I guess most users won't be taking advantage of multiple output events per pipeline. A smarter solution to that could be to find the closest match for each but then we're into heuristics and probably overkill for the fact of saying that the test failed.

breml commented 3 years ago

Is this change a (potentially) breaking change for all existing test suits, because the results need to be ordered correctly in the test case files?

jgough commented 3 years ago

Quite the opposite. This removes the need for tests to be ordered correctly. I have run the tests with make test and all pass successfully.

jgough commented 3 years ago

To clarify - it sorts the expected AND the outputted events so both should match regardless of the order of either.

jgough commented 2 years ago

@magnusbaeck Still seeing the issue this resolves all the time with certain pipelines. Do my changes fix the issues you raised? Do I need to try to get more evidence of the underlying issue?

jgough commented 2 years ago

Hmm, I am concerned this doesn't actually cover all cases as I have found another case where this is not working. Investigating further.

jgough commented 2 years ago

So this seems to work with one just one test, but doesn't currently seem to work with two tests like this:

testcases:
  - input:
    - |
      [This is the message]
   expected:
     - message: This is the message
  - input:
    - |
      [output1,output2]
   expected:
     - message: output2
     - message: output1

The 2nd one is failing as the two expected outputs are not matching as they are in the wrong order. Unsure exactly why this is but I am investigating.

jgough commented 2 years ago

Apologies for keeping adding comments, On second thoughts I think the code is working correctly but it seems I was encountering https://github.com/magnusbaeck/logstash-filter-verifier/pull/90#pullrequestreview-530603052 where the wrong events were being compared. It was quite confusing at first but I think short of doing a levenshtein distance between all pairs it will be almost impossible to figure out which events to compare with which.

I think I've spotted another issue as the events are compared before removeFields is called, so I need to bring this further forwards. I will update the code tomorrow with this change.

jgough commented 2 years ago

So on further thoughts I don't think this will work quite as I expected it to

From what I can tell, all of the expected outputs from all of the testcases in a file are mixed together, not just the ones in the output. So for these testcases:

testcases:
  - input:
    - |
      [This is the message]
   expected:
     - message: This is the message
  - input:
    - |
      [output1,output2]
   expected:
     - message: output2
     - message: output1

All three output messages will be sorted together which could cause major confusion. I was hoping to sort and compare on the individual testcase level (sorting 1 message for the first test, 2 messages for the second) but it seems that this is not the way it works.

Going to have to think about this a bit more. I probably need to demonstrate the underlying issue here in a reproduceable succinct manner.