ukncsc / lme

Logging Made Easy
Apache License 2.0

[BUG] No shards indexed after 25th Feb 23:59:990 #132

Closed edmitchellVS closed 2 years ago

edmitchellVS commented 2 years ago

Describe the issue

I recently ran through the upgrade process to address the L4J version issue and all was running fine. I could see data with no problem. I updated Ubuntu on Wed 23rd and the server was restarted on the morning of Friday 25th. All good so far. I checked LME the following Wed (2nd March) and no data was showing in the Discover section for the last 15 mins. I increased the time filter until it hit 25th Feb and could see the last log was for 25th Feb @ 23:59:990. I had a look at the shards and I can see the last of the indices was indeed 25th Feb. However, it seems Winlogbeat on the event forwarder is still sending logs to LME error free, but I am not sure how to check the logs on the LME server. Is this perhaps down to my not following the upgrade process correctly from 03 --> 04, the L4J upgrade fix, or something else?

To Reproduce Steps to reproduce the behavior:

  1. Go to 'Analytics / Discover'
  2. Click on 'refresh'
  3. Scroll down to '....'
  4. See the message: "No results match your search criteria. Expand your time range. Try searching over a longer period of time."
  5. Expand the time range back to 25th Feb and data appears

Expected behavior Events from the last 15 mins are shown.

adam-ncc commented 2 years ago

Hey @edmitchellVS, if you were getting data in with no issues till this point it seems unlikely to be an issue with the v0.4 upgrade, and the minor updates shouldn't cause any compatibility issues I believe. It may be possible you've run out of disk space as discussed here, would you be able to check if this was the issue?
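
A quick way to check the disk space theory is to read the usage figure from `df`. This is only a sketch: the helper name `disk_used_pct` is made up for illustration, and `/` is an assumed path, so substitute whichever filesystem actually holds the Elasticsearch data on your server.

```shell
# Hypothetical helper: print the percentage of space used on the
# filesystem that holds the given path (defaults to /).
disk_used_pct() {
  # df -P gives one POSIX-format line per filesystem; column 5 is "Capacity".
  df -P "${1:-/}" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Assumption: the data lives on the root filesystem. Pass another path if not.
pct=$(disk_used_pct /)
echo "filesystem is ${pct}% full"
```

A value close to 100 would point towards disk space rather than the upgrade.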

It would also be useful if you could post the logs (minus any sensitive information) from logstash/kibana/elastic using the following commands, which may help us to diagnose the problem:

sudo docker service logs lme_elasticsearch --tail 20 --timestamps
sudo docker service logs lme_kibana --tail 20 --timestamps
sudo docker service logs lme_logstash --tail 20 --timestamps

Thanks

edmitchellVS commented 2 years ago

Hi Adam,

Many thanks for this. I ran the commands above and I can see where the error is.

:response=>{"index"=>{"_index"=>"winlogbeat-09.03.2022", "_type"=>"_doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"validation_exception", "reason"=>"Validation Failed: 1: this action would add [4] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;"}}}}
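
For anyone hitting the same thing, the relevant part of that log line is the open/maximum shard count. A minimal sketch for pulling it out of log text is below; the function name `shard_limit_hits` is hypothetical, and the pattern is based only on the wording of the error quoted above, so it may need adjusting for other versions.

```shell
# Hypothetical helper: read log text on stdin and print "open/max" for
# every shard-limit validation error found.
shard_limit_hits() {
  grep -oE 'currently has \[[0-9]+\]/\[[0-9]+\] maximum normal shards open' |
    sed -E 's#.*\[([0-9]+)\]/\[([0-9]+)\].*#\1/\2#'
}

# Example input: the "reason" text from the error in this thread.
sample='Validation Failed: 1: this action would add [4] shards, but this cluster currently has [1000]/[1000] maximum normal shards open;'
printf '%s\n' "$sample" | shard_limit_hits
```

On the LME server the same function could be fed from the log command given earlier, for example `sudo docker service logs lme_logstash --tail 200 2>&1 | shard_limit_hits`.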

I have changed the data retention to 180 days and this now seems to have fixed the issue. Many, many thanks for this!
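
Rough arithmetic on why a shorter retention period helps, using only the numbers in the error message. This is a sketch under assumptions: that there is one winlogbeat index per day (the dated index name suggests this), that every such index adds the same 4 shards, and that no other indices count towards the limit (in practice some will).

```shell
shards_per_index=4    # from the error: "this action would add [4] shards"
max_shards=1000       # from the error: "[1000]/[1000] maximum normal shards open"
retention_days=180    # retention value chosen in this thread

# How many daily indices fit under the limit, and how many shards
# the chosen retention period would keep open.
max_daily_indices=$(( max_shards / shards_per_index ))
shards_at_retention=$(( retention_days * shards_per_index ))

echo "daily indices that fit under the limit: $max_daily_indices"
echo "shards held at ${retention_days} days retention: $shards_at_retention"
```

Under those assumptions the limit is reached after about 250 daily indices, while 180 days of retention keeps roughly 720 shards open, which leaves some headroom below 1000.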

Sorry, I also forgot to mention another error message I was getting...

"4 of 708 shards failed The data you are seeing might be incomplete or wrong."

This happens when going to the user investigator tab in the security dashboard. Do you think this will fix that, or do I need to do something else?

Thanks again

Ed

edmitchellVS commented 2 years ago

Issue now completely resolved. Thanks again for your help with this one 👍