natefinch / lumberjack

lumberjack is a log rolling package for Go
MIT License
4.8k stars 591 forks

Adds gzip compression to backup log files #16

Closed donovansolms closed 7 years ago

donovansolms commented 8 years ago

This PR adds gzip compression to backup log files via the CompressBackups config option. Tests added and passed. Please let me know if I missed anything. Note: We're using this library in production (including gzipped backups) with max size as 100MB. Compression doesn't have any noticeable performance impact aside from saving a lot of drive space.

donovansolms commented 8 years ago

I see the failed test on AppVeyor. I'll investigate the "The process cannot access the file because it is being used by another process" error and resolve.

natefinch commented 8 years ago

That's super common if you have code that tries to delete or rename a file before closing it on Windows. It works on Linux but not Windows.


donovansolms commented 8 years ago

I've fixed the issue on Windows. The original file was being closed with a defer, but renamed before the defer could execute.

Let me know if you would like more tests for this feature, I'm happy to add them.

donovansolms commented 8 years ago

I've fixed the large allocation issue, thanks for pointing that out. I also updated the code style to fit the rest with the single line form if. Finally I updated the test to verify that the compressed file actually contains the content originally written to it.

Please let me know if I misunderstood anything.

s7v7nislands commented 7 years ago

ping, will this be merged?

natefinch commented 7 years ago

I'll take a look at this today. Sorry for the delay.

newhook commented 7 years ago

@donovansolms Are you planning on addressing Nate's comments? We need this code for our production systems, so if you are not going to be able to address them in short order I can take a look.

donovansolms commented 7 years ago

Hi, yes @newhook, apologies, will get to this over the weekend.

donovansolms commented 7 years ago

@natefinch Thanks for the review. I've made the style changes and added the robustness improvements.

I moved the compression of logs to cleanup which is triggered on rotate. It now checks for any logs that have not been compressed and compresses them in a separate goroutine.

When a log file is missed due to the application going down while compressing, it will be compressed on the next rotate. For my use case that is acceptable, what is your opinion?

natefinch commented 7 years ago

I don't think it's ok to wait until the next rotate to compress the file. The problem is that instead of a nicely compressed backup (usually, what, like 10-20% of uncompressed size?), you'll have the full size log file there... which if you need that space, could screw you up.

And actually, this brings up an interesting point about compressed backups... it changes the math you need to do for how much space you're using. Before, a 100MB max size with 3 backups meant the max you'd use is 400MB of disk space (main log + 3 backups)... but now that same config would generally use far less... but it's nondeterministic just how much space you'd be using, since it would depend on the compression ratio.

I'm not a huge fan of nondeterministic results, but I'm not exactly sure how to fix that problem, so that people can know the max amount of disk space their configuration will use. Any suggestions on a nice user experience for this that doesn't totally break backwards compatibility would be good.

I'll review the actual changes probably tomorrow, maybe tonight if I have more time than I expect.

donovansolms commented 7 years ago

No problem. I'll add compressing the missed file(s) on startup (probably around openExistingOrNew) as well in the morning. I'll need a lock to ensure different routines don't attempt compressing the same file - if a log is rotated before the startup log compression completes.

I get what you are saying about calculating the max amount of space the configuration will use, I also don't have a clear answer to that at the moment.

newhook commented 7 years ago

I'm not sure I see a huge need to compress immediately especially if it makes the code more complex and potentially brittle. If you change the logging parameters (unless this has changed) log files are not cleaned up until rotation, so this seems to be a similar situation.

donovansolms commented 7 years ago

The change sounded more complicated than it turned out to be. After the log file is created or opened on startup, compressLogs is executed in a new goroutine. It is protected by the cmu mutex to avoid multiple routines attempting to compress the same files. This allows compression to run without blocking the application on startup.

newhook commented 7 years ago

Is there something more to do for this PR?

natefinch commented 7 years ago

I'll take a look this weekend. Sorry, been slammed with other things lately.

newhook commented 7 years ago

I've merged this and have been using it in production for a few days now. Seems to work! I think I'd prefer, though, if Close would block until any outstanding compression is complete.

donovansolms commented 7 years ago

Thanks for the update @natefinch. We're really busy in December, so I might not get to this before the end of the year. If you say Canonical might want it, wouldn't it be worth going with the suggestions in Issue #13 (Compressing rolled files) instead, so as not to lock this to gzip?

natefinch commented 7 years ago

That's ok, I may tackle this myself next week.

howbazaar commented 7 years ago

ping... any progress?

natefinch commented 7 years ago

I'll take a look in the morning.


natefinch commented 7 years ago

It was late when I was looking at this, so I forgot where we were. I haven't really had the time to spend on this. I think this PR could be a starting point, but it still needs a lot of thought. There are basically four things you have to think about:

  1. The compression should happen on a background goroutine, to make sure that we don't block writes to the main log file for any longer than necessary.

  2. The code currently looks for *.log files to determine how many backup files already exist on disk, so we need to modify that to ensure that it checks for compressed logs as well.

  3. The code has to be smart enough to handle getting killed in the middle of compression. Thus, it'll need to look for uncompressed files on startup, as well as clean up any temporary files that were created while compressing.

  4. We have to compress to a temporary file and then rename that file to the final name when it's finished, and then remove the uncompressed file.

howbazaar commented 7 years ago

How would you feel about juju devs picking this up?

natefinch commented 7 years ago

Please feel free. I'll give reviews high priority.

natefinch commented 7 years ago

This is implemented in #43