Open vesselinn opened 4 years ago
Hi @vesselinn. I can't reproduce it. RollingIndexName
(actually RollingIndexNameFormatter
class) takes the time as-is from LogEvent.getTimeMillis()
. I've just unit tested the weekly rollover. 2 events everyday: 1 daily, second one with random time every day just to try to 'confuse' the formatter, from January 2020, 3 years ahead. It rolled over exactly every 7 days, calculating index name correctly every day. Could you provide me with a test that can reproduce this behavior?
Are the logs in your indices have correct timeMillis
values? Given that you can create an index pattern per index, you should be able to verify that documents are indexed for more than a week in a single index.
Hi @rfoltyns. Let me give you examples from our kibana:
Date: 2020-06-22 14:59:07.986 timeMillis: 1592827147987
Date: 2020-07-04 23:59:13.271 timeMillis: 1593896353271
Date: 2020-07-05 00:00:13.572 timeMillis: 1593896413572
Thanks
How consistently does that happen? Does it happen right after the deployment or after few weeks of JVM running? How many documents do you send to ES per day? Do you send logs immediately after they're produced or do you replay/re-process them? Do you have any post processing defined in ES? Pipelines? Do you have any index management defined in ES? Can you reproduce this in a test?
Hi, When we noticed the problem we left one server (JVM) to run without restart for 3 weeks (you can see the result). On this picture you can see how many logs we send to ELK. Actually we use java util logging to produce logs, but they are handled and transformed to log4j2 logs and send to ELK via elasticsearch asynch appender (shown in my first post). We don't have mechanism to re-process them. I've also unit tested the behavior of weekly rollover and it was running normally. Could it be a problem with WebLogic server as all our apps and the Handler are deployed on it? When we run WebLogic all log indexes are in correct format and week of the year, but they are not changing on time - like Sunday 00:00 or Monday 00:00. We don't have any simultaneous scenario for changing the index. Also we don't have post processing and pipelines defined in ES, only mapping template for types of the data.
I've also found log4j2 ticket mentioned in org.apache.logging.log4j.core.appender.rolling.PatternProcessor.getNextTime() / log4j-core-2.11.1jar. Could it be something related to the issue?
Given a fairly constant daily rate, sth apparently went wrong on 22nd June. Could send me the output of this query?
curl http://somehost:9200/_cat/indices/log4j.2020-*?v=&s=index
Ticket you mentioned describes issues on year change, but maybe it's worth looking into it.
BTW, If you're transforming the logs from java logging, you can actually use AsyncBatchDelivery
directly, this way you'll have full control of the target index name (and maybe discover an issue with it a bit easier). Also, it might flesh out some timestamp issues.
Hello @rfoltyns ,
We are also using this project and we've come across the same problem.
Here is a unit test to reproduce the issue and I guess I have found why the problem happens. The nextRolloverTime is calculated at construction time of RollingIndexNameFormatter and if this is the only instance that is used for each message before logging, we end up with this error. If we create a new instance each time, then the index is calculated correctly. Please take a look at the unit test code below and let me know if I could help to solve this issue or if this test makes sense at all.
@Test
public void testWeeklyRollingIndexPattern() {
Long sundayBeforeMidnight = LocalDateTime.of(2020, 9, 20, 23, 59, 59)
.atZone(ZoneId.of("GMT"))
.toInstant().toEpochMilli();
RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
when(builder.getInitTimeInMillis()).thenReturn(sundayBeforeMidnight);
builder.withIndexName("index");
builder.withPattern("yyyy-ww");
builder.withSeparator("-");
builder.withTimeZone("GMT");
RollingIndexNameFormatter formatter = builder.build();
LogEvent logEvent = mock(LogEvent.class);
when(logEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
String formattedIndexName = formatter.format(logEvent);
Assert.assertEquals("index-2020-38", formattedIndexName);
Long mondayAfterMidnight = LocalDateTime.of(2020, 9, 21, 0, 0, 1)
.atZone(ZoneId.of("GMT"))
.toInstant().toEpochMilli();
when(builder.getInitTimeInMillis()).thenReturn(mondayAfterMidnight);
LogEvent newWeekEvent = mock(LogEvent.class);
when(newWeekEvent.getTimeMillis()).thenReturn(mondayAfterMidnight);
String nextWeekIndex = formatter.format(newWeekEvent);
Assert.assertEquals("index-2020-38", nextWeekIndex); // should have failed and produced index-2020-39
formatter = builder.build(); // creating a new formatter instance
nextWeekIndex = formatter.format(newWeekEvent);
Assert.assertEquals("index-2020-39", nextWeekIndex);
}
Thank you @cuneytcalishkan!
Looks like a bug.
If you'd like to solve it, I'm more than happy to accept your help :+1: I haven't thought about any concrete solutions yet, but there are few things to have in mind:
current*
and next*
fields are there to ensure that in a typical scenario, calculations and allocations are made only "around" rollover timeRollingIndexNameFormatter:116
)Hello @rfoltyns,
So far I've managed to debug a little bit and understand what is going on behind the scenes of PatternProcessor that is created at initialization time, which actually calculates the next rollover time.
What I've found out is that, this depends on the system settings about the country/region where the first day of the week changes depending on the locale. e.g. United States, first day of the week is Sunday whereas for France it is Monday.
You can see in the below 2 unit tests which demonstrate this.
In order for returnsWeeklyRolloverIndexNameWithFrance
to pass, you need to set the country setting of your OS to some location where the first day of the week is a Monday. With this setting, returnsWeeklyRolloverIndexNameWithUS
will fail.
If you change the country setting to United States, then returnsWeeklyRolloverIndexNameWithUS
will pass and the other one will fail.
@Test
public void returnsWeeklyRolloverIndexNameWithFrance() {
final String timezone = "Europe/Paris";
long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
builder.withIndexName("index");
builder.withPattern("yyyy-ww");
builder.withSeparator("-");
builder.withTimeZone(timezone);
RollingIndexNameFormatter formatter = builder.build();
long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 11, 23, 59, 59)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
LogEvent thisWeekEvent = mock(LogEvent.class);
when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
String formattedIndexName = formatter.format(thisWeekEvent);
Assert.assertEquals("index-2020-41", formattedIndexName);
long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 12, 0, 0, 1)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
LogEvent nextWeekEvent = mock(LogEvent.class);
when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
String nextWeekIndex = formatter.format(nextWeekEvent);
Assert.assertEquals("index-2020-42", nextWeekIndex);
}
@Test
public void returnsWeeklyRolloverIndexNameWithUS() {
final String timezone = "Europe/Paris";
long loggerInitMillis = LocalDateTime.of(2020, 10, 8, 21, 48, 35)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
RollingIndexNameFormatter.Builder builder = spy(RollingIndexNameFormatter.newBuilder());
when(builder.getInitTimeInMillis()).thenReturn(loggerInitMillis);
builder.withIndexName("index");
builder.withPattern("yyyy-ww");
builder.withSeparator("-");
builder.withTimeZone(timezone);
RollingIndexNameFormatter formatter = builder.build();
long sundayBeforeMidnight = LocalDateTime.of(2020, 10, 10, 23, 59, 59)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
LogEvent thisWeekEvent = mock(LogEvent.class);
when(thisWeekEvent.getTimeMillis()).thenReturn(sundayBeforeMidnight);
String formattedIndexName = formatter.format(thisWeekEvent);
Assert.assertEquals("index-2020-41", formattedIndexName);
long nextWeekMondayAfterMidnight = LocalDateTime.of(2020, 10, 11, 0, 0, 1)
.atZone(ZoneId.of(timezone))
.toInstant().toEpochMilli();
LogEvent nextWeekEvent = mock(LogEvent.class);
when(nextWeekEvent.getTimeMillis()).thenReturn(nextWeekMondayAfterMidnight);
String nextWeekIndex = formatter.format(nextWeekEvent);
Assert.assertEquals("index-2020-42", nextWeekIndex);
}
I am currently checking with our operation team about this country/region setting of the machines. However, even though it is already week 42 for both US and Europe, we still have logs written with index week 41 as of now.
Thank you @cuneytcalishkan. Great findings! :+1: I had a look at it as well and also found those Locale-based week ends. I've made a few tweaks during formatter initialization that gets your test to pass. Have a look here. I'll do some more testing today.
Also @vesselinn, YYYY-ww
date pattern should be used in this case, not yyyy-ww
. Otherwise you'll get incorrect behaviour at the end of the year, e.g.: 2020-01 instead of 2021-01 ~end of December 2020 (Java dates wizardry..). I'll add a note in the documentation.
It seems like the issue is caused by premature rollover. Initialising the formatter with nextRolloverTime = patternProcessor.getNextTime(...,0,..)
was trying to force the rollover on first log. It was not catering for the case when first logs are arriving after factual rollover time causing it to rollover to far into the future.
Would you like to test the behaviour of 1.4.5-SNAPSHOT
? I can push it out today. It should be available for download tomorrow.
@rfoltyns that would be great to test it with these changes if possible. Thank you very much.
Done. 1.4.5-SNAPSHOT
- 1.4 branch with f5d18c3 - pushed to http://oss.sonatype.org/content/repositories/snapshots
.
Add the repo to your pom.xml
(or other build system equivalent) to get it:
<repositories>
<repository>
<id>oss.sonatype.org-snapshot</id>
<url>http://oss.sonatype.org/content/repositories/snapshots</url>
<releases>
<enabled>false</enabled>
</releases>
<snapshots>
<enabled>true</enabled>
</snapshots>
</repository>
</repositories>
Hello @rfoltyns , I've checked the logs on our system today and unfortunately the solution didn't work. We still see last week's number in today's messages as 2020-42. The deployment was done on Friday and we would expect to see index with week number 43 on the logs of today's messages. We need to investigate a little bit more I guess. I will try to check again with your changes and different test scenarios.
Ok. If you're getting just a few events per day, could you create a test that reproduces it with exact timestamps?
@cuneytcalishkan I've uploaded another snapshot with isses/50 to http://oss.sonatype.org/content/repositories/snapshots
.
This one looks a bit more promising. Rolling formatter test gets green once -Duser.timezone=GMT
JVM arg is set.
user.timezone
must match the timezone configured by the builder as PatternProcessor
takes it from the env variables and FastDateFormat
from the builder params - that's not great..
@cuneytcalishkan Did you have a chance to test the latest snapshot?
Hello @rfoltyns, we actually deployed new version of the code with the latest snapshot and it seems to be even worse. Before, with a restart or a new deployment we would have the correct index. However, the latest snapshot takes the last week's number as the index. We had 2 deployments this week on Monday and Wednesday and the index is still pointing to week 44. I haven't had time to provide some unit tests to reproduce the issue, yet. I will try to do that as soon as possible.
Description Hello,
For couple of months we are using your app to send log events to ELK, but for some reason it is not changing weekly index. What we see actually is that the index stay for more than 10 days and when it change the index it neither Sunday nor Monday.
Configuration `<?xml version="1.0" encoding="UTF-8"?>
`
Runtime (please complete the following information):
Additional context Trying with daily or hourly pattern it is ok.