ryantenney / log4j-redis-appender

log4j appender writing straight to Redis

Appender can lose messages #2

Open enterprise-easyb opened 10 years ago

enterprise-easyb commented 10 years ago

There seems to be a race condition in the code that under certain circumstances causes RedisAppender to drop messages.

My test program is very simple:

    import org.apache.log4j.Logger;
    import org.apache.log4j.LogManager;

    public class MyLogger {

        private static Logger logger = Logger.getLogger(MyLogger.class);

        public static void main(String[] args) {
            try {
                if (args[0].length() == 0) {
                    throw new Exception("You must pass the message to log");
                } else {
                    for (int i = 0; i < 50; i++) {
                        logger.debug(args[0] + " -- " + i);
                    }
                }
            }
            catch (Exception e) {
                logger.debug(e.getMessage(), e);
            }
            finally {
                LogManager.shutdown();
            }
        }
    }

log4j.properties contains the following:

    log4j.appender.redis.host=<>
    log4j.appender.redis.port=6379
    log4j.appender.redis.key=logstash
    log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
    log4j.appender.redis.period=500
    log4j.appender.redis.batchSize=100
    log4j.appender.redis.purgeOnFailure=false
    log4j.appender.redis.alwaysBatch=false
    log4j.appender.redis.daemonThread=false

With this configuration, no log statements ever make it to Redis. After reviewing the code, I believe the problem is that close() is called before the ScheduledFuture flush task gets a chance to run. Since close() doesn't check whether the events queue is empty before cancelling the task and shutting down the executor, any pending messages are dropped.

Logging does seem to work reliably if period is greatly reduced. I tried a value of 1 ms.
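
To make the timing concrete, here is a minimal, self-contained sketch of the failure mode as I read it. The names (events, task, executor, period) mirror this thread; this is a hypothetical reconstruction, not the actual appender source:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    // Hypothetical reconstruction of the appender lifecycle, not the real source.
    public class FlushRaceSketch implements Runnable {

        private final Queue<String> events = new ConcurrentLinkedQueue<>();
        private ScheduledExecutorService executor;
        private ScheduledFuture<?> task;
        private long period = 500; // ms, as in my config

        public void activateOptions() {
            executor = Executors.newSingleThreadScheduledExecutor();
            // The first flush does not fire until 'period' ms after startup.
            task = executor.scheduleWithFixedDelay(this, period, period, TimeUnit.MILLISECONDS);
        }

        public void append(String message) {
            events.add(message); // log events just queue up between flushes
        }

        @Override
        public void run() {
            // drain 'events' and push the entries to redis (elided)
        }

        public void close() {
            // My 50 debug calls finish in a few ms, so LogManager.shutdown()
            // reaches here long before the first flush at t=500ms. Cancelling
            // the task and shutting down the executor discards the scheduled
            // run, and every event still queued is lost with it.
            task.cancel(false);
            executor.shutdown();
        }
    }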

I haven't done extensive testing, but I added:

    if (events.size() > 0) {
        executor.execute(this);
    }

to the close() method, just before the call to task.cancel(false), and with that change all log statements appear to get logged even with period=10000. I haven't looked into all the nuances of batchSize and alwaysBatch, so I don't know how they affect this analysis. In one of my use cases the logging volume is very low, and it could take 10 minutes or more to fill anything larger than a trivial batchSize.
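
For context, this is roughly where the change sits; the surrounding close() body is my paraphrase of the current code, not a verbatim copy:

    public void close() {
        // my addition: schedule one final run() so queued events get
        // flushed before the executor goes away
        if (events.size() > 0) {
            executor.execute(this);
        }
        task.cancel(false);
        executor.shutdown();
        jedis.disconnect();
    }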

stolidedog commented 9 years ago

I noticed the same problem. I don't think your solution accounts for the case where the queue size is not an even multiple of the batch size, or for the alwaysBatch flag. Here is how I changed close(), and it seems to work as long as Log4J is shut down correctly before the application exits; in other words, LogManager.shutdown() is called.

    @Override
    public void close() {
        try {
            task.cancel(false);
            executor.shutdown();
            boolean finished = executor.awaitTermination(waitTerminate, TimeUnit.MILLISECONDS);
            if (finished) {
                // We finished successfully; process any events still in the queue.
                if (!events.isEmpty()) run();
                // Flush any remainder regardless of the alwaysBatch flag.
                if (messageIndex > 0) push();
            } else {
                LogLog.warn("Executor did not complete in " + waitTerminate
                        + " milliseconds. Log entries may be lost.");
            }

            jedis.disconnect();
        }
        catch (Exception e) {
            errorHandler.error(e.getMessage(), e, ErrorCode.CLOSE_FAILURE);
        }
    }

I also added a "waitTerminate" property to the appender that defaults to 1000 milliseconds:

    private long waitTerminate = 1000;

    . . .

    public void setWaitTerminate(long waitTerminate) {
        this.waitTerminate = waitTerminate;
    }

You can override the default value in log4j.properties if you so desire, as shown below.
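
For example, with the appender named redis as in the configuration above:

    log4j.appender.redis.waitTerminate=5000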