rsyslog / rsyslog

a Rocket-fast SYStem for LOG processing
http://www.rsyslog.com
GNU Lesser General Public License v3.0

Possible data loss when compression is enabled using ZipLevel #4530

Open saurav-sahu-exa opened 3 years ago

saurav-sahu-exa commented 3 years ago

Expected behavior

The hourly generated log.gz files should be consistent for every hour of the day, with no input data being discarded or dropped. We expect each log.gz to be approximately 200 MB.

Actual behavior

For some hours of the day, the generated log.gz file is extremely small (KB range) compared to the normal ~200 MB. For example:

201 M 2020-01-01T01.log.gz
210 M 2020-01-01T02.log.gz
205 M 2020-01-01T03.log.gz
50 K 2020-01-01T04.log.gz     <<<< Small in size  
195 M 2020-01-01T05.log.gz
205 M 2020-01-01T06.log.gz

When we turn off compression by removing ZipLevel=1, the hourly log.gz files are consistently ~200 MB in size.

Steps to reproduce the behavior

  1. Enable compression with ZipLevel=1 as shown in the configuration below.
  2. Restart rsyslog on the host machine.
  3. Start the syslog data feed.
  4. Observe the data files in the destination directory for several hours.

Environment

  • rsyslog version: 8.24.0-57.el7_9
  • platform: x86_64 GNU/Linux
  • relevant rsyslog.conf excerpt:
    template(name="HourlyLogs" type="string"
        string="/opt/newuser/data/%$YEAR%-%$MONTH%-%$DAY%T%$HOUR%.log.gz")
    if ($fromhost-ip != "127.0.0.1" and $fromhost-ip != "::1") then
    action(type="omfile" ZipLevel=1 template="SyslogFormat" dynaFile="HourlyLogs"
        fileOwner="newuser" fileGroup="newuser" dirOwner="newuser" dirGroup="newuser"
        fileCreateMode="0666" dirCreateMode="0777")

Observation:

We don't observe any data loss when compression is not enabled. Looking into the documentation, I wonder whether there is a flag/parameter that could prevent such data loss while still compressing the data efficiently. Would any of these help: veryRobustZip, ioBufferSize, flushOnTXEnd, or asyncWriting?
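
For reference, the kind of change I have in mind is sketched below. This is untested on 8.24.0; the parameter names are taken from the omfile documentation, and the ioBufferSize value is only a guess, not a recommendation:

    # Hypothetical tuning (sketch only, values are guesses):
    #   veryRobustZip="on"  - per the docs, hardens the zip stream against
    #                         unclean closes, at a small cost in compression ratio
    #   flushOnTXEnd="on"   - flush output at the end of each transaction (default)
    #   ioBufferSize="64k"  - larger I/O write buffer
    #   asyncWriting="off"  - keep writes synchronous (default)
    if ($fromhost-ip != "127.0.0.1" and $fromhost-ip != "::1") then
        action(type="omfile" template="SyslogFormat" dynaFile="HourlyLogs"
            ZipLevel="1" veryRobustZip="on"
            flushOnTXEnd="on" ioBufferSize="64k" asyncWriting="off"
            fileOwner="newuser" fileGroup="newuser"
            dirOwner="newuser" dirGroup="newuser"
            fileCreateMode="0666" dirCreateMode="0777")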

davidelang commented 3 years ago

The version of rsyslog that you are running is 4 or so years old, with some unknown number of Red Hat-created patches added to it (to backport what they consider 'critical' fixes). As such, it is very hard for the community to diagnose anything with it.

Could you please update to a current community release and see if you still have the same problem? I am not aware of any specific fix that would address this, but there have been so many patches over the years that we need to rule them out (and get you onto something current, so that if we do find something to fix, the delta between what you are running and the fixed version we ask you to test is small).

David Lang
