nit23uec opened this pull request 4 years ago
Hi @nit23uec, we have found your signature in our records, but it seems you signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?
Hi @jsvd, @colinsurprenant - can you please review this PR?

Summary: This PR addresses the issue highlighted in https://github.com/logstash-plugins/logstash-output-file/issues/79. The strategy is to write incoming events to a temp file and then move the contents of the temp file into the target gzip file at each flush interval. Because this requires (at flush interval) reading the temp file, appending its contents to the gzip file, and then truncating the temp file, the operation is non-atomic; still, it considerably reduces the probability of ending up with a broken gzip.
Thanks.
Hi, any word on this? CLA is signed btw.
Hey there!
I recently stumbled across this issue. We have a relatively busy Logstash server, and I'm currently going through and calculating the decompressed size of all of the gzip files this plugin outputs, so I'll have some good stats on how frequently we run into it. So far it seems to affect about 1% of files, and it appears especially likely to happen when Logstash is restarted.
Is there anything I can help with to get this merged, tested, and deployed?
@nit23uec Thanks for your contribution and sorry for the delay in following up. I looked at the PR and I have a few observations:
The `Zlib::GZipFile` object must be handled correctly and always be closed so that the footer is written. Given the above, I am not sure we actually need the tmp-file writing strategy here; possibly we only need to more-or-less make sure that `close` is always called on the `Zlib::GZipFile` object.
There is however a deeper problem at play, which I also talked about in #79: there is currently no way to safely consume a file produced by the file output while Logstash is running, because there is no way to know when a file is finished being written to and done with for good. This is a larger issue, not limited to zipped files, and currently the only way to safely consume a file from the file output is by shutting down Logstash. Note that this problem may not be relevant for (text) files consumed in a tailing/streaming way, but that does not apply to zipped files, which cannot be consumed while they are being written.
LMKWYT.
Hello! We've also stumbled upon corrupted gzip outputs. Is there any chance of getting this pull request merged?
Hi @makefu I'll take care of this
I think that this PR doesn't match the original fix suggested in #73. This PR creates a plain file with a temporary extension, and when it's flushed the file is gzipped to its final position. As I understand it, the original proposal was to work on a gzip temp file: when the file is about to be closed, move it (an atomic operation with respect to the filesystem) to its final name, removing the temporary extension.
Right now, however, the file corruption seems bad enough that a workaround is warranted. It looks like we will also have to shift to writing files in plain text and using logrotate to gzip the files periodically. Of course, it would be much better to have a real solution for the issue. Cheers!
I agree with you @makefu, however this PR is more complicated than the suggestion originally proposed in #73. I would like to know if @nit23uec is still engaged in moving this forward, or if I can take it over, maybe in another PR.
Closes #79
Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/