enterprise-easyb opened 10 years ago
I noticed the same problem. I don't think your solution takes into account when the queue size is not an even multiple of the batch size, or the alwaysBatch flag. Here is what I did to change close(), and it seems to work as long as Log4j is shut down correctly before the application exits; in other words, LogManager.shutdown() is called.
```java
@Override
public void close() {
    try {
        task.cancel(false);
        executor.shutdown();
        boolean finished = executor.awaitTermination(waitTerminate, TimeUnit.MILLISECONDS);
        if (finished) {
            // We finished successfully; process any events still remaining in the queue.
            if (!events.isEmpty()) run();
            // Flush any remainder regardless of the alwaysBatch flag.
            if (messageIndex > 0) push();
        } else {
            LogLog.warn("Executor did not complete in " + waitTerminate
                    + " milliseconds. Log entries may be lost.");
        }
        jedis.disconnect();
    } catch (Exception e) {
        errorHandler.error(e.getMessage(), e, ErrorCode.CLOSE_FAILURE);
    }
}
```
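The shutdown sequence above can be sketched in isolation with plain `java.util.concurrent` types. This is only an illustration of the pattern (cancel, shut down, wait, then drain); the queue, the clear-on-flush task, and the class name `DrainOnClose` are stand-ins for the appender's internals, not its actual code:

```java
import java.util.concurrent.*;

public class DrainOnClose {
    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> events = new ArrayBlockingQueue<>(100);
        events.add("event-1");
        events.add("event-2");

        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // A flush task with a long period, mirroring something like period=10000.
        ScheduledFuture<?> task = executor.scheduleWithFixedDelay(
                () -> events.clear(), 10_000, 10_000, TimeUnit.MILLISECONDS);

        // close(): cancel the task, stop the executor, then drain what is left.
        long waitTerminate = 1000;
        task.cancel(false);
        executor.shutdown();
        if (executor.awaitTermination(waitTerminate, TimeUnit.MILLISECONDS)) {
            int drained = 0;
            while (events.poll() != null) drained++; // stand-in for run()/push()
            System.out.println("drained=" + drained);
        } else {
            System.out.println("executor did not terminate; entries may be lost");
        }
    }
}
```

Because the scheduled task is cancelled before its initial delay elapses, the queued events are still present when the executor terminates, and the final drain picks them up.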
I also added a "waitTerminate" property to the appender that defaults to 1000 milliseconds:
```java
private long waitTerminate = 1000;

// ...

public void setWaitTerminate(long waitTerminate) {
    this.waitTerminate = waitTerminate;
}
```
You can override the default value in the log4j.properties if you so desire.
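For example, the override would look like the following in log4j.properties (the 2000 is just an illustrative value):

```properties
log4j.appender.redis.waitTerminate=2000
```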
There seems to be a race condition in the code that under certain circumstances causes RedisAppender to drop messages.
My test program is very simple:
```java
import org.apache.log4j.Logger;
import org.apache.log4j.LogManager;

public class MyLogger {
    private static Logger logger = Logger.getLogger(MyLogger.class);

    public static void main(String[] args) {
        logger.info("test message"); // any handful of log statements will do
        LogManager.shutdown();
    }
}
```
log4j.properties contains the following:
```properties
log4j.appender.redis.host=<>
log4j.appender.redis.port=6379
log4j.appender.redis.key=logstash
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
log4j.appender.redis.period=500
log4j.appender.redis.batchSize=100
log4j.appender.redis.purgeOnFailure=false
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.daemonThread=false
```
With this configuration, no log statements ever make it to redis. After reviewing the code, I believe the issue is that close() is called before the ScheduledFuture task gets a chance to run. Since close() doesn't check whether the events queue is empty before cancelling the task and shutting down the executor, any pending messages are dropped.
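The race can be reproduced with plain `java.util.concurrent` and no Redis at all. In this sketch (the queues and the class name `DroppedEvents` are mine), the flush task is scheduled with a 500 ms delay to mirror period=500, but a close()-style cancel runs first, so the queued event is never pushed:

```java
import java.util.concurrent.*;

public class DroppedEvents {
    public static void main(String[] args) throws Exception {
        final BlockingQueue<String> events = new ArrayBlockingQueue<>(100);
        final BlockingQueue<String> redis = new ArrayBlockingQueue<>(100); // stand-in for the Redis list

        events.add("log line"); // a message logged just before the app exits

        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        // period=500: the first flush would happen 500 ms after startup.
        ScheduledFuture<?> task = executor.scheduleWithFixedDelay(
                () -> events.drainTo(redis), 500, 500, TimeUnit.MILLISECONDS);

        // The application exits (and close() runs) before the first flush fires.
        task.cancel(false);
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.SECONDS);

        System.out.println("pushed=" + redis.size() + " dropped=" + events.size());
    }
}
```

Nothing reaches the stand-in Redis queue, because the only flush was cancelled while still waiting out its initial delay.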
Logging does seem to work reliably if period is greatly reduced. I tried a value of 1 ms.
I haven't done extensive testing, but I added `if (!events.isEmpty()) run();` to the close() method just before the call to task.cancel(false), and with that change all log statements appear to get logged even with period=10000. I haven't looked into all the nuances of batchSize and alwaysBatch, so I don't know how they affect this analysis. In one of my use cases I have some very low-volume logging, and it could take 10 minutes or more to fill anything more than a trivial batchSize.