rfoltyns / log4j2-elasticsearch

Log4j2 Elasticsearch Appender plugins
Apache License 2.0
174 stars 46 forks source link

Usage of pattern yyyy-ww does not change index every 7 days #50

Open vesselinn opened 4 years ago

vesselinn commented 4 years ago

Description Hello,

For couple of months we are using your app to send log events to ELK, but for some reason it is not changing weekly index. What we see actually is that the index stay for more than 10 days and when it change the index it neither Sunday nor Monday.

Configuration `<?xml version="1.0" encoding="UTF-8"?>

`

Runtime (please complete the following information):

Additional context Trying with daily or hourly pattern it is ok.

rfoltyns commented 4 years ago

Hi @vesselinn. I can't reproduce it. RollingIndexName (actually RollingIndexNameFormatter class) takes the time as-is from LogEvent.getTimeMillis(). I've just unit tested the weekly rollover. 2 events everyday: 1 daily, second one with random time every day just to try to 'confuse' the formatter, from January 2020, 3 years ahead. It rolled over exactly every 7 days, calculating index name correctly every day. Could you provide me with a test that can reproduce this behavior?

Are the logs in your indices have correct timeMillis values? Given that you can create an index pattern per index, you should be able to verify that documents are indexed for more than a week in a single index.

vesselinn commented 4 years ago

Hi @rfoltyns. Let me give you examples from our kibana:

Date: 2020-06-22 14:59:07.986 timeMillis: 1592827147987

Date: 2020-07-04 23:59:13.271 timeMillis: 1593896353271

Date: 2020-07-05 00:00:13.572 timeMillis: 1593896413572

Thanks

rfoltyns commented 4 years ago

How consistently does that happen? Does it happen right after the deployment or after few weeks of JVM running? How many documents do you send to ES per day? Do you send logs immediately after they're produced or do you replay/re-process them? Do you have any post processing defined in ES? Pipelines? Do you have any index management defined in ES? Can you reproduce this in a test?

vesselinn commented 4 years ago

Hi, When we noticed the problem we left one server (JVM) to run without restart for 3 weeks (you can see the result). image On this picture you can see how many logs we send to ELK. Actually we use java util logging to produce logs, but they are handled and transformed to log4j2 logs and send to ELK via elasticsearch asynch appender (shown in my first post). We don't have mechanism to re-process them. I've also unit tested the behavior of weekly rollover and it was running normally. Could it be a problem with WebLogic server as all our apps and the Handler are deployed on it? When we run WebLogic all log indexes are in correct format and week of the year, but they are not changing on time - like Sunday 00:00 or Monday 00:00. We don't have any simultaneous scenario for changing the index. Also we don't have post processing and pipelines defined in ES, only mapping template for types of the data.

vesselinn commented 4 years ago

I've also found log4j2 ticket mentioned in org.apache.logging.log4j.core.appender.rolling.PatternProcessor.getNextTime() / log4j-core-2.11.1jar. Could it be something related to the issue?

rfoltyns commented 4 years ago

Given a fairly constant daily rate, sth apparently went wrong on 22nd June. Could send me the output of this query?

curl http://somehost:9200/_cat/indices/log4j.2020-*?v=&s=index

Ticket you mentioned describes issues on year change, but maybe it's worth looking into it.

BTW, If you're transforming the logs from java logging, you can actually use AsyncBatchDelivery directly, this way you'll have full control of the target index name (and maybe discover an issue with it a bit easier). Also, it might flesh out some timestamp issues.

cuneytcalishkan commented 4 years ago

Hello @rfoltyns ,

We are also using this project and we've come across the same problem.

Here is a unit test to reproduce the issue and I guess I have found why the problem happens. The nextRolloverTime is calculated at construction time of RollingIndexNameFormatter and if this is the only instance that is used for each message before logging, we end up with this error. If we create a new instance each time, then the index is calculated correctly. Please take a look at the unit test code below and let me know if I could help to solve this issue or if this test makes sense at all.

@Test
  public void testWeeklyRollingIndexPattern() {
    Long sundayBeforeMidnight = LocalDateTime.of(2020, 9, 20, 23, 59, 59)
                                             .atZone(ZoneId.of("GMT"))
                                             .toInstant().toEpochMilli();
    RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
    when(builder.getInitTimeInMillis()).thenReturn(sundayBeforeMidnight);
    builder.withIndexName("index");
    builder.withPattern("yyyy-ww");
    builder.withSeparator("-");
    builder.withTimeZone("GMT");
    RollingIndexNameFormatter formatter = builder.build();

    LogEvent logEvent = mock(LogEvent.class);
    when(logEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);

    String formattedIndexName = formatter.format(logEvent);
    Assert.assertEquals("index-2020-38", formattedIndexName);

    Long mondayAfterMidnight = LocalDateTime.of(2020, 9, 21, 0, 0, 1)
                                            .atZone(ZoneId.of("GMT"))
                                            .toInstant().toEpochMilli();

    when(builder.getInitTimeInMillis()).thenReturn(mondayAfterMidnight);
    LogEvent newWeekEvent = mock(LogEvent.class);
    when(newWeekEvent.getTimeMillis()).thenReturn(mondayAfterMidnight);

    String nextWeekIndex = formatter.format(newWeekEvent);
    Assert.assertEquals("index-2020-38", nextWeekIndex); // should have failed and produced index-2020-39

    formatter = builder.build(); // creating a new formatter instance

    nextWeekIndex = formatter.format(newWeekEvent);
    Assert.assertEquals("index-2020-39", nextWeekIndex);
  }
rfoltyns commented 4 years ago

Thank you @cuneytcalishkan!

Looks like a bug.

If you'd like to solve it, I'm more than happy to accept your help :+1: I haven't thought about any concrete solutions yet, but there are few things to have in mind:

cuneytcalishkan commented 4 years ago

Hello @rfoltyns,

So far I've managed to debug a little bit and understand what is going on behind the scenes of PatternProcessor that is created at initialization time, which actually calculates the next rollover time.

What I've found out is that, this depends on the system settings about the country/region where the first day of the week changes depending on the locale. e.g. United States, first day of the week is Sunday whereas for France it is Monday.

You can see in the below 2 unit tests which demonstrate this.

In order for returnsWeeklyRolloverIndexNameWithFrance to pass, you need to set the country setting of your OS to some location where the first day of the week is a Monday. With this setting, returnsWeeklyRolloverIndexNameWithUS will fail. If you change the country setting to United States, then returnsWeeklyRolloverIndexNameWithUS will pass and the other one will fail.

    @Test
    public void returnsWeeklyRolloverIndexNameWithFrance() {
      final String timezone = "Europe/Paris";
      long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
                                           .atZone(ZoneId.of(timezone))
                                           .toInstant().toEpochMilli();
      RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
      when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
      builder.withIndexName("index");
      builder.withPattern("yyyy-ww");
      builder.withSeparator("-");
      builder.withTimeZone(timezone);
      RollingIndexNameFormatter formatter = builder.build();

      long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 11, 23, 59, 59)
                                               .atZone(ZoneId.of(timezone))
                                               .toInstant().toEpochMilli();
      LogEvent thisWeekEvent = mock(LogEvent.class);
      when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
      String formattedIndexName = formatter.format(thisWeekEvent);
      Assert.assertEquals("index-2020-41", formattedIndexName);

      long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 12, 0, 0, 1)
                                                      .atZone(ZoneId.of(timezone))
                                                      .toInstant().toEpochMilli();
      LogEvent nextWeekEvent = mock(LogEvent.class);
      when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
      String nextWeekIndex = formatter.format(nextWeekEvent);
      Assert.assertEquals("index-2020-42", nextWeekIndex);
    }

    @Test
    public void returnsWeeklyRolloverIndexNameWithUS() {
      final String timezone = "Europe/Paris";
      long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
                                           .atZone(ZoneId.of(timezone))
                                           .toInstant().toEpochMilli();
      RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
      when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
      builder.withIndexName("index");
      builder.withPattern("yyyy-ww");
      builder.withSeparator("-");
      builder.withTimeZone(timezone);
      RollingIndexNameFormatter formatter = builder.build();

      long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 10, 23, 59, 59)
                                               .atZone(ZoneId.of(timezone))
                                               .toInstant().toEpochMilli();
      LogEvent thisWeekEvent = mock(LogEvent.class);
      when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
      String formattedIndexName = formatter.format(thisWeekEvent);
      Assert.assertEquals("index-2020-41", formattedIndexName);

      long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 11, 0, 0, 1)
                                                      .atZone(ZoneId.of(timezone))
                                                      .toInstant().toEpochMilli();
      LogEvent nextWeekEvent = mock(LogEvent.class);
      when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
      String nextWeekIndex = formatter.format(nextWeekEvent);
      Assert.assertEquals("index-2020-42", nextWeekIndex);
    }

I am currently checking with our operation team about this country/region setting of the machines. However, even though it is already week 42 for both US and Europe, we still have logs written with index week 41 as of now.

rfoltyns commented 4 years ago

Thank you @cuneytcalishkan. Great findings! :+1: I had a look at it as well and also found those Locale-based week ends. I've made a few tweaks during formatter initialization that gets your test to pass. Have a look here. I'll do some more testing today.

Also @vesselinn, YYYY-ww date pattern should be used in this case, not yyyy-ww. Otherwise you'll get incorrect behaviour at the end of the year, e.g.: 2020-01 instead of 2021-01 ~end of December 2020 (Java dates wizardry..). I'll add a note in the documentation.

It seems like the issue is caused by premature rollover. Initialising the formatter with nextRolloverTime = patternProcessor.getNextTime(...,0,..) was trying to force the rollover on first log. It was not catering for the case when first logs are arriving after factual rollover time causing it to rollover to far into the future.

Would you like to test the behaviour of 1.4.5-SNAPSHOT? I can push it out today. It should be available for download tomorrow.

cuneytcalishkan commented 4 years ago

@rfoltyns that would be great to test it with these changes if possible. Thank you very much.

rfoltyns commented 4 years ago

Done. 1.4.5-SNAPSHOT - 1.4 branch with f5d18c3 - pushed to http://oss.sonatype.org/content/repositories/snapshots.

Add the repo to your pom.xml (or other build system equivalent) to get it:

<repositories>
    <repository>
        <id>oss.sonatype.org-snapshot</id>
        <url>http://oss.sonatype.org/content/repositories/snapshots</url>
        <releases>
            <enabled>false</enabled>
        </releases>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>
cuneytcalishkan commented 4 years ago

Hello @rfoltyns , I've checked the logs on our system today and unfortunately the solution didn't work. We still see last week's number in today's messages as 2020-42. The deployment was done on Friday and we would expect to see index with week number 43 on the logs of today's messages. We need to investigate a little bit more I guess. I will try to check again with your changes and different test scenarios.

rfoltyns commented 4 years ago

Ok. If you're getting just a few events per day, could you create a test that reproduces it with exact timestamps?

rfoltyns commented 4 years ago

@cuneytcalishkan I've uploaded another snapshot with isses/50 to http://oss.sonatype.org/content/repositories/snapshots.

This one looks a bit more promising. Rolling formatter test gets green once -Duser.timezone=GMT JVM arg is set.

user.timezone must match the timezone configured by the builder as PatternProcessor takes it from the env variables and FastDateFormat from the builder params - that's not great..

rfoltyns commented 4 years ago

@cuneytcalishkan Did you have a chance to test the latest snapshot?

cuneytcalishkan commented 4 years ago

Hello @rfoltyns, we actually deployed new version of the code with the latest snapshot and it seems to be even worse. Before, with a restart or a new deployment we would have the correct index. However, the latest snapshot takes the last week's number as the index. We had 2 deployments this week on Monday and Wednesday and the index is still pointing to week 44. I haven't had time to provide some unit tests to reproduce the issue, yet. I will try to do that as soon as possible.

rfoltyns commented 3 years ago

@cuneytcalishkan I got similar results in tests if -Duser.timezone=GMT was not set.