Add sequencer feature

PadaKwaak commented 7 years ago

Hi,

I ran into the same issue as described in https://logstash.jira.com/browse/LOGSTASH-192: my applications log so many events with the exact same timestamp that, when I view those logs in Kibana, their order is completely lost.

I initially wrote a filter plugin, but since I'm using multiple pipeline workers for Logstash, I cannot guarantee the order by the time events reach the filter stage. That is why I ended up modifying the multiline codec plugin, which I'm already using to parse Java log files (with stack traces) written to syslog.
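To illustrate the idea (this is not the code from the pull request), here is a minimal Ruby sketch of a sequencer. The `Sequencer` class, the `next_value` method, and the `sequence` field name are all hypothetical, and events are modelled as plain hashes rather than real Logstash event objects. The point is that the number is assigned while lines are still being read in order, before any parallel stage can reorder them.

```ruby
# Hypothetical sketch; not the implementation from this pull request.
class Sequencer
  def initialize(start = 0)
    @next = start
    @lock = Mutex.new
  end

  # Returns the next sequence number. The mutex keeps the counter
  # consistent even if several threads share one sequencer.
  def next_value
    @lock.synchronize do
      value = @next
      @next += 1
      value
    end
  end
end

# Events are plain hashes here (an assumption for illustration only).
sequencer = Sequencer.new
lines = ["first", "second", "third"]
events = lines.map do |line|
  { "message" => line, "sequence" => sequencer.next_value }
end
```

Because the counter is attached where the lines are decoded, the `sequence` value reflects read order even if later stages process the events concurrently.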

The changes included in this pull request are working for my use case, where I'm using the multiline codec for a file input plugin.

I'm not all that familiar with Ruby or the Logstash plugins, so I'm not sure whether there is another way to add event sequencing when using the file input plugin and the multiline codec plugin with multiple pipeline workers for Logstash.

Now that I have this, I'm just waiting on Kibana to be able to sort on two columns simultaneously (@timestamp and this sequence field).
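As a small illustration of what a two-column sort buys, the Ruby sketch below orders events by timestamp first and uses the sequence number as a tie-breaker. The field name `sequence` and the sample events are made up for the example; this is plain Ruby, not Kibana or Elasticsearch behaviour.

```ruby
# Hypothetical sample data: two events share the same timestamp.
t0 = Time.utc(2017, 1, 1, 12, 0, 0)
events = [
  { "@timestamp" => t0,     "sequence" => 2, "message" => "second at t0" },
  { "@timestamp" => t0 + 1, "sequence" => 3, "message" => "later" },
  { "@timestamp" => t0,     "sequence" => 1, "message" => "first at t0" }
]

# Arrays compare element by element, so the sequence number only
# decides the order when the timestamps are equal.
sorted = events.sort_by { |e| [e["@timestamp"], e["sequence"]] }
messages = sorted.map { |e| e["message"] }
```

Sorting by timestamp alone would leave the relative order of the two events at `t0` unspecified.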

Any kind of feedback on this feature would be much appreciated!

Kind regards, Chris

karmi commented 7 years ago

Hi @PadaKwaak, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

PadaKwaak commented 7 years ago

Thanks @karmi, I accidentally pushed my previous commit from my company's email address and not my private one. I've now amended my commit by just changing my email address in the git Author field.

PadaKwaak commented 7 years ago

It seems like issue #46 (or similar) caused the CI to fail.

I've verified on my own PC that when I run the unit tests with bundle exec rspec --seed 58183, both the current master branch and my new feature consistently fail with that seed value.

jordansissel commented 7 years ago

This is a pretty large feature to focus only on the multiline codec. Before we move forward (and then break it later), I'd like to figure out how this fits into the overall model of Logstash.

Is the codec the best place for this?
Would it be more appropriate for the input to handle this?
Should creating an event in Logstash automatically have a sequential id?

Solving it in the codec doesn't necessarily solve ordering problems because ordering is a global property across all data, not a property localized to a single Logstash instance. Projects like https://github.com/boundary/flake come to mind for this kind of problem space.
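For readers unfamiliar with that style of id, here is a simplified Ruby sketch of a flake-like generator that packs a millisecond timestamp, a worker id, and a per-millisecond counter into one integer. It is not Boundary's implementation: the `FlakeLikeId` class, its `clock:` parameter, and the bit widths are assumptions chosen for illustration, and the sketch ignores clock rollback and counter overflow.

```ruby
# Simplified, hypothetical flake-style id generator (illustration only).
class FlakeLikeId
  WORKER_BITS = 48   # assumed width of the worker id
  SEQUENCE_BITS = 16 # assumed width of the per-millisecond counter

  def initialize(worker_id, clock: -> { (Time.now.to_f * 1000).to_i })
    @worker_id = worker_id & ((1 << WORKER_BITS) - 1)
    @clock = clock
    @last_ms = -1
    @sequence = 0
    @lock = Mutex.new
  end

  # Builds an id as: timestamp | worker id | counter.
  # Does not handle a clock that moves backwards or a counter
  # that exceeds SEQUENCE_BITS within one millisecond.
  def next_id
    @lock.synchronize do
      now = @clock.call
      if now == @last_ms
        @sequence += 1
      else
        @last_ms = now
        @sequence = 0
      end
      (now << (WORKER_BITS + SEQUENCE_BITS)) | (@worker_id << SEQUENCE_BITS) | @sequence
    end
  end
end

generator = FlakeLikeId.new(1)
ids = Array.new(3) { generator.next_id }
```

Ids from one generator increase over time, and ids from different workers sort roughly by time, but only as well as the workers' clocks agree, so this gives approximate rather than strict global ordering.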

jordansissel commented 7 years ago

@PadaKwaak what do you think? I believe that the codec (and even in just one codec) is the wrong place to put this kind of solution.

PadaKwaak commented 7 years ago

Thanks for the great feedback and questions, @jordansissel. This is the first time I've written something for the ELK stack, so I can't really give suggestions. Instead, I can only give my reasons for choosing to make these changes to the multiline codec.

Is the codec the best place for this?

It's the best place I could make the change in a short time frame. I would've liked to make it its own codec, but it didn't look like I could have multiple codecs for a single (file/syslog) input plugin.

Would it be more appropriate for the input to handle this?

Yes, I think that would definitely be better. Adding it to the filter or output plugins is a bad idea. Adding it to the multiline codec was the easiest way for me to add sequence numbers to my events, both for when I'm using the file input and also when I'm using the syslog input.

Should creating an event in Logstash automatically have a sequential id?

Until something better exists to maintain the order of logs, I would definitely benefit from Logstash generating one for me.

I believe that the codec (and even in just one codec) is the wrong place to put this kind of solution.

For sure. Again, this was the best I could do with the limited time and experience I have with the ELK stack.

From your questions, my deduction would be that it's not a good idea to merge this "feature" of mine into the multiline codec's main branch! I do, however, still see a benefit in having a (crude) solution like this (included in a fork of the multiline codec) for the time being. It would serve until a better way has been found to maintain the order of existing log entries (when using the ELK stack) without having to update the application(s) to make use of something like https://github.com/boundary/flake

guyboertje commented 7 years ago

@PadaKwaak @jordansissel - I am closing this.

I opened an issue in Logstash to discuss this: https://github.com/elastic/logstash/issues/6997

logstash-plugins / logstash-codec-multiline

Add sequencer feature #48