natefinch / lumberjack

lumberjack is a log rolling package for Go
MIT License
4.8k stars 591 forks source link

Option to rotate based on time instead of size #54

Open bradleypeabody opened 6 years ago

bradleypeabody commented 6 years ago

Right now it appears that MaxSize is what determines when rotation should be done. However a common way to rotate log files (e.g. with https://github.com/logrotate/logrotate) is by time ("daily","weekly", etc.).

One way to do this that would be quite flexible would be to add a Cron string which is parsed using something like https://godoc.org/github.com/robfig/cron - this way daily, weekly, hourly, etc. could all easily be implemented without special code. However, that adds a dependency where there is none right now.

Another approach would be to add a Pattern string which could be set to let's say "hourly" or "daily" - setting a timer in Go code specifically for the next hour or daily boundary is somewhat trivial, no external dependency needed. Then, later, if so desired, Pattern could be expanded to support a cron expression or other more complex rotation scheme, without breaking backward compatibility. I haven't looked too deeply into the code but I think this could be easy to add and would not break stuff for existing users.

Thoughts?

bradleypeabody commented 6 years ago

After I wrote the above I saw your notes here: https://github.com/natefinch/lumberjack/issues/17 I agree with what you're saying about non-deterministic log file sizes.

However, I think the basic point here is that while log rotation to prevent running out of disk space is one reason to rotate logs, it is not the only reason. Another reason would be to divide log files up into days so they can be analyzed like that.

Webserver access logs are a common example. Yes, you want to rotate them so you don't run out of space, but if you actually use the logs for analysis that's only a secondary consideration - the primary one is to get files broken into time units you can actually work with. If one day is six times larger than the next, that's just too bad, you still want it as one file. If it gets really bad, you'll probably switch to hourly rotation. This is a specific use case, but hopefully it makes it clear that non-deterministically sized log files can be a valid thing for some users.

I also saw what you're saying about the fact that it's only a few lines of code to force the rotate. That's true, but it means that one can't use the library purely with configuration. It's handy to be able to parse a portion of a config file directly into a Logger and use it as is - delegating configuration the user (administrator), rather than the developer having to come up with a separate solution for this scenario.

phuslu commented 6 years ago

Log Rotation based on time is not easy.

If we inteed to implement it correctly, we have to parse out timestamp from each log line. (yes, the cronly Rotate() calling is a wrong way. )

We must picking up log entry time in https://github.com/natefinch/lumberjack/blob/v2.0/lumberjack.go#L151 The cost is high and it's hard if we dont know the logger format.

BTW, I implement time based log rotation for glog. Because the glog line[1:5] is the month day, so the time comparison is cheap and fast. https://github.com/phuslu/glog/commit/56b7d3ebc32786a90255c9d0a0e3c8ea478923ba#diff-53260a0b4c68c1bdbb4ff791e55b2013R817

bradleypeabody commented 6 years ago

I would say that splitting the log files at exactly a given time is not necessarily a requirement, especially if it could be brittle due to the log line format possibly changing. Much of the time logs as written to by log.Printf etc are not machine-readable, but are used by humans for debugging, so having the log files by day is a simple way to "look at what happened on Wednesday" - the fact that something fell over by half a second from one of the adjacent days is not a deal breaker in a case like this. And I would argue that even in cases where the log files are being parsed, since you are using simple text files - meaning no transactions, no guarantees like you would get from a database - it's also an edge case that would be acceptable.

Of course it would be great if we could guarantee every record would end up in the right log file down to the nanosecond, but I don't know that if we can't it means this feature should not be provided. As above, I think the use cases which would make this an absolute requirement would also seem to indicate the developer should be using a database, not a text file, for whatever they are logging.