k0lter / autopostgresqlbackup

Automated tool to make periodic backups of PostgreSQL databases
52 stars 17 forks source link

Make autopostgresqlbackup a bit more portable #27

Open dennisse opened 1 year ago

dennisse commented 1 year ago

GNU stat is not very portable. The less obvious, more portable and posix-friendly way to get the bytecount of a file is to use wc(1) - wc -c < file. This will work across all the Linuxes/unixes/BSDs.

hostname is also not very portable. Especially not with the --fqdn-flag. In fact, the man-page of hostname(1) kinda warns against using it.

The POSIX-friendly way of getting the hostname is uname -n, which will return the "node name". I believe uname -n uses the same function (gethostname(2)) hostname(1) uses. FQDN can be set either according to hostname(1), or by hostnamectl(1) on systemd systems.

dennisse commented 11 months ago

Is there anything I can to make this PR more acceptable?

k0lter commented 5 months ago

@dennisse I've applied your portability improvements directly (see: https://github.com/k0lter/autopostgresqlbackup/commit/f341e5afce747bc97143641fb5a32631a6cb3c85). The only one I've discarded is the dump size calculation using wc which is in my opinion counter performant (with huge databases, It could take time to read the whole file). About stat portability, we have to find a replacement.

dennisse commented 5 months ago

https://github.com/k0lter/autopostgresqlbackup/commit/f341e5afce747bc97143641fb5a32631a6cb3c85 🥳

The only one I've discarded is the dump size calculation using wc which is in my opinion counter performant (with huge databases, It could take time to read the whole file). About stat portability, we have to find a replacement.

How about using just ls? From the posix specification:

The <size> field shall contain the value that would be returned for the file in the st_size field of struct stat (see XBD <sys/stat.h>).

So the following would work:

$ fsize=($(ls -ln "${dump_file}"))
$ fsize=${fsize[4]}
# Or in one line with awk:
$ fsize=$(ls -ln "${dump_file}" | awk '{print $5}')

Comparison:

$ dd if=/dev/random bs=1M count=102400 | pv | dd of=./100G
209715200+0 records in
209715200+0 records out
107374182400 bytes (107 GB, 100 GiB) copied, 595,71 s, 180 MB/s
$ gstat -c '%s' 100G
107374182400
$ wc -c < 100G
107374182400
$ fsize=($(ls -ln 100G))
$ echo ${fsize[4]}
107374182400
$ ls -ln 100G | awk '{print $5}'
107374182400

If ls is an acceptable solution, I can change this PR to be about that, and split the uname-thing out into its own PR.

k0lter commented 5 months ago

Thanks for you suggestion about ls. I tend to avoid spawning unneeded process so the first choice has my preference.

About the hostname, I've already commit some kind of workaround, see: https://github.com/k0lter/autopostgresqlbackup/blob/master/autopostgresqlbackup#L202