jeffaco / duplicacy-util

Utility to schedule and run duplicacy backup via the command line
Apache License 2.0

log rotation renaming #26

Closed Ossssip closed 6 years ago

Ossssip commented 6 years ago

A small issue: it seems that the log rotator currently first renames the existing config_name.log to config_name.log.1.gz and then puts it into an archive. I have 10 archived logs: de_tools.log.1.gz, de_tools.log.2.gz, ..., de_tools.log.10.gz. Inside every archive there is always a file named 'de_tools.log.1.gz', which is not actually an archive but just a plain-text log file.

jeffaco commented 6 years ago

I'm not sure what you mean here. A .gz file isn't an archive as such; it's a single compressed file in the gzip format (a common compression format).

On my system, in the log directory, I have the following:

$ ls -l
total 242048
-rw-r--r--  1 jeff  staff    109688 Oct  8 12:05 office.log
-rw-r--r--  1 jeff  staff     31845 Oct  8 00:05 office.log.1.gz
-rw-r--r--  1 jeff  staff     33756 Oct  7 12:05 office.log.2.gz
-rw-r--r--  1 jeff  staff     33734 Oct  7 00:05 office.log.3.gz
-rw-r--r--  1 jeff  staff     31278 Oct  6 12:05 office.log.4.gz
-rw-r--r--  1 jeff  staff     39172 Oct  8 12:00 quicken.log
-rw-r--r--  1 jeff  staff      6905 Oct  8 00:00 quicken.log.1.gz
-rw-r--r--  1 jeff  staff       340 Oct  7 22:53 quicken.log.2.gz
-rw-r--r--  1 jeff  staff       509 Oct  7 22:53 quicken.log.3.gz
-rw-r--r--  1 jeff  staff       423 Oct  7 22:52 quicken.log.4.gz
-rw-r--r--  1 jeff  staff  53784861 Oct  8 00:28 taltos.log
-rw-r--r--  1 jeff  staff  17144503 Oct  7 04:39 taltos.log.1.gz
-rw-r--r--  1 jeff  staff  17155276 Oct  6 02:21 taltos.log.2.gz
-rw-r--r--  1 jeff  staff  16648911 Oct  5 00:35 taltos.log.3.gz
-rw-r--r--  1 jeff  staff  16552894 Oct  4 00:29 taltos.log.4.gz
$

Now, the regular files (with a plain .log extension) are just uncompressed text. The other files are all compressed text, and to see them, you would need to decompress.

Here are some examples:

$ file quicken*
quicken.log:      ASCII text
quicken.log.1.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 37875
quicken.log.2.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 1070
quicken.log.3.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 1527
quicken.log.4.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 1313
$ 

This shows what is compressed and what is not compressed. Furthermore:

$ less quicken.log.1.gz 
"quicken.log.1.gz" may be a binary file.  See it anyway? 
$ head quicken.log
12:00:05 Beginning backup on 10-08-2018 12:00:05
12:00:05 ######################################################################
12:00:05 Backing up to storage b2 with 10 threads
12:00:05 Storage set to b2://xxxxxxxxxx
12:00:10 Last backup at revision 358 found
12:00:10 Indexing /Volumes/Quicken
12:00:10 Loaded 7 include/exclude pattern(s)
12:00:10 Use 10 uploading threads
12:00:11 Uploaded chunk 2 size 4509042, 4.30MB/s 00:00:02 43.4%
12:00:11 Uploaded chunk 1 size 5321559, 9.38MB/s 00:00:01 94.6%
$ 

This is exactly what I expect. The most recent log is uncompressed so you can see it easily. For the other logs, you need a program that can decompress them before you can view them. But a .gz file is not a container, as you can see:

$ gunzip -d -c quicken.log.1.gz | head
00:00:00 Beginning backup on 10-08-2018 00:00:00
00:00:00 ######################################################################
00:00:00 Backing up to storage b2 with 10 threads
00:00:00 Storage set to b2://xxxxxxxxxx
00:00:03 Last backup at revision 357 found
00:00:03 Indexing /Volumes/Quicken
00:00:03 Loaded 7 include/exclude pattern(s)
00:00:03 Use 10 uploading threads
00:00:04 Uploaded chunk 4 size 4894859, 4.67MB/s 00:00:02 37.5%
00:00:04 Uploaded chunk 3 size 3402096, 7.91MB/s 00:00:01 63.6%
$ 

As you can see, this is just a plain text file. It's not a "container" (a container, like a .zip file or a .tar file, would have other files within it).

I suspect that you're confused due to unfamiliarity with the .gz file format, or because of tooling on your system that doesn't include the gzip program. Such software is available for Windows (assuming that's what you're running), and even built in if you run the recently released Ubuntu subsystem under Windows.

Hope this clarifies. Please close this issue if you're clear, or feel free to ask further questions.

Ossssip commented 6 years ago

Sorry for the wrong terminology regarding archives and compressed files. As I understand it, after decompression I should get the original file back, right? Let's say I have a file sample.txt:

/temp $ ll
-rw-rw-rw- 1 user group 95493 Oct  5 15:24 sample.txt
/temp $ file sample.txt 
sample.txt: ISO-8859 English text

I apply compression to it:

/temp $ gzip sample.txt 

file now tells me that before compression the file was named sample.txt:

/temp $ ll
-rw-rw-rw- 1 user group 20020 Oct  5 15:24 sample.txt.gz
/temp $ file sample.txt.gz 
sample.txt.gz: gzip compressed data, was "sample.txt", from Unix, last modified: Fri Oct  5 15:24:10 2018

If I uncompress it, I will obtain the original file:

/temp $ gzip -d sample.txt.gz 
/temp $ ll
total 94
-rw-rw-rw- 1 user group 95493 Oct  5 15:24 sample.txt
/temp $ file sample.txt 
sample.txt: ISO-8859 English text

My expectations were the same for duplicacy-util logs: if I uncompress a compressed log, I should get the original log file, e.g., de_tools.log.

Your own example shows that even before compression, the log filename was already appended with .1.gz:

quicken.log.3.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz"

After extracting quicken.log.3.gz with a tool that restores the name stored in the header, you get a plain text file that is still named as if it were compressed: quicken.log.1.gz. This is not an issue for Linux users, as less, cat, and similar commands ignore the file extension and just show the file contents.
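The name that `file` and 7-Zip report comes from an optional field in the gzip header. As a minimal sketch (assuming Go's standard `compress/gzip` package; the helpers `storedName` and `makeSample` are hypothetical names used only for illustration), the stored name can be read back like this:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

// storedName returns the original file name recorded in the header of a
// gzip stream, or an empty string if the header carries no name.
func storedName(data []byte) (string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	defer zr.Close()
	// The header fields are populated as soon as NewReader returns.
	return zr.Name, nil
}

// makeSample builds a small in-memory gzip stream whose header records
// the given name (hypothetical helper, for demonstration only).
func makeSample(name string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Name = name // must be set before the first Write
	if _, err := zw.Write([]byte("12:00:05 Beginning backup\n")); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Reproduce the situation from this thread: the header claims the
	// original file was already called "<name>.1.gz".
	data, err := makeSample("quicken.log.1.gz")
	if err != nil {
		log.Fatal(err)
	}
	name, err := storedName(data)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("name stored in header: %q\n", name)
}
```

Tools that restore the stored name on extraction write the decompressed text to a file with this name, which is why the extracted log still ends in .gz.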

On Windows, however, this causes confusion. I do not have a viewer that understands gzipped files out of the box, so I have to decompress a log first. In real life I use GUI-based tools, but I will illustrate it here with the command-line version of the 7-Zip archiver:

temp> 7z.exe l de_tools.log.5.gz
7-Zip 17.01 beta (x64) : Copyright (c) 1999-2017 Igor Pavlov : 2017-08-28
Scanning the drive for archives:
1 file, 3395 bytes (4 KiB)
Listing archive: de_tools.log.5.gz
--
Path = de_tools.log.5.gz
Type = gzip
Headers Size = 67
   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
                    .....        54056         3395  D:\Tools\duplicacy\duplicacy-util\logs\de_tools.log.1.gz
------------------- ----- ------------ ------------  ------------------------
                                 54056         3395  1 files

I decompress it:

temp> 7z.exe e de_tools.log.5.gz

Now I have a file named de_tools.log.1.gz which is already decompressed plain text, but its extension still tells me that it is a compressed file. At this point I assume it is a compressed file and try to decompress it, but I get an error:

temp> 7z.exe e de_tools.log.1.gz
7-Zip 17.01 beta (x64) : Copyright (c) 1999-2017 Igor Pavlov : 2017-08-28
Scanning the drive for archives:
1 file, 54056 bytes (53 KiB)
Extracting archive: de_tools.log.1.gz
Can't open as archive: 1
Files: 0
Size:       0
Compressed: 0

So I have the impression that the log rotation algorithm takes an unnecessary step, first renaming the *.log file to *.log.1.gz and then compressing it to a file with the same name. Is that so? And is it necessary?

jeffaco commented 6 years ago

I understand the issue. I was working off of assumptions of how gzip works on Linux, Mac, UNIX, just about every platform under the sun. That is, the .gz extension is a hint that the file is GZIP compressed, but not mandatory. In particular:

$ file quicken.log.1.gz 
quicken.log.1.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 5818
$ gunzip quicken.log.1.gz 
$ file quicken.log.1 
quicken.log.1: ASCII text
$

Here you can see that, by convention (unless told to decompress to stdout or something), gzip removed the .gz extension.

Now, looking at the code, I do not actually rename the file before compressing. I open the original file for reading, open the new file (with the .1.gz extension added on) for writing, and then compress into it. But I also need to fill in the gzip header, and I think the problem is that I told the header the original name was <name>.1.gz, which is wrong. This shows up on Mac too, but I never noticed, as behavior was unaffected.

I'll look at this.

jeffaco commented 6 years ago

It was the header; note the output below:

$ file ~/.duplicacy-util/log/quicken.log*
/Users/jeff/.duplicacy-util/log/quicken.log:      ASCII text
/Users/jeff/.duplicacy-util/log/quicken.log.1.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log", original size 37352
/Users/jeff/.duplicacy-util/log/quicken.log.2.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 5818
/Users/jeff/.duplicacy-util/log/quicken.log.3.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 39172
/Users/jeff/.duplicacy-util/log/quicken.log.4.gz: gzip compressed data, was "/Users/jeff/.duplicacy-util/log/quicken.log.1.gz", original size 37875
$

If you look at the entry for /Users/jeff/.duplicacy-util/log/quicken.log.1.gz, you'll see that the recorded original name is now quicken.log, and not quicken.log.1.gz.

This should be transparent on my platforms, and strictly speaking, the original file was not already compressed, so this fix is correct. I'll commit this to master shortly.