holland-backup / holland

Holland Backup Manager
http://hollandbackup.org

LVM plus rsync #314

Open AboBuchanan opened 4 years ago

AboBuchanan commented 4 years ago

Hi. I'm using LVM with tar at the moment. A lot of my databases are quite static and contain large blobs. Running tar with 0 - effectively not tar backs up 122 Gig in 10 minutes with tar compression takes an hour.

Ideally I would like to create a new backup method, lvm+rsync: create the snapshot, run InnoDB recovery, rsync to a target directory (set in the config, I guess),
then finish and drop the snapshot.

This could also be achieved with a command that runs before the snapshot is dropped, while excluding everything from the normal archive.

My first thought is to extend the existing LVM plugin.
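
Roughly, the shape I have in mind, done by hand outside of Holland, would be something like the following (the volume names and target path are only examples, and I have left out the InnoDB recovery step, which I assume the plugin would keep handling):

mkdir -p /mnt/mysql_snapshot
lvcreate --snapshot --size 15G --name mysql_snapshot /dev/vg00/mysql_data    # example VG/LV names
mount -o nouuid /dev/vg00/mysql_snapshot /mnt/mysql_snapshot                 # -o nouuid if the filesystem is xfs
rsync --archive --delete /mnt/mysql_snapshot/var/lib/mysql/ /var/spool/holland/var/lib/mysql/    # example target
umount /mnt/mysql_snapshot
lvremove --force /dev/vg00/mysql_snapshot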

soulen3 commented 4 years ago

This could be worked into the mysql-lvm module. Here's a list of things we would need to consider:

Holland doesn't really have a concept of incremental backups, so you would need to put the backup somewhere besides /var/spool/holland/ (a new config option, like you mentioned). This will mess with purging failed backups.

Using rsync in this way will mean the plugin would only have one active copy of the data. By default Holland will complete a backup before purging the old one.

I'm concerned that a corrupt backup directory would look similar to a successful one.

None of these things are deal breakers, but they should be considered.

I'm assuming your goal is to reduce backup time, is that correct? Have you tested how long it takes for rsync to run after the data has been seeded? If it only takes 10 minutes to copy all the data using tar, you're really not going to save that much time, and it's going to be CPU intensive.

AboBuchanan commented 4 years ago

Hi

Thanks for your fast response. I agree with your points.

Perhaps another way would be to implement a post-tar command,

i.e. a command that runs after the tar (and, for the LVM MySQL dump, after the export) and before the snapshot is dropped.

That way all the existing logic would remain.

FYI, I would put the synced copy in /var/spool/holland/var/lib/mysql (sorry, my iPhone keeps capitalising things). That way I have easy access to all the databases and tables within.

I've not looked at the source code yet.

What is the best way to have a play with this?

Thanks in advance

Dave

mikegriffin commented 4 years ago

Hello,

I am curious if you could clarify, as I am not sure I understood this:

"effectively not tar backs up 122 Gig in 10 minutes with tar compression takes an hour"

Do you mean to say that your backup takes 10 minutes with:

archive-method=dir

But that it takes an hour with both of these defaults:

archive-method=tar

[compression]
method = gzip
options = ""
inline = yes
split = no
level = 1

Using archive-method=tar with default gzip compression can be quite slow, which is why Holland supports pigz or zstd. If the data in your "large blobs" is binary, it is effectively already "compressed" data and you would not want to use any compression method besides none.
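
For example, if you keep archive-method=tar, the [compression] section could be switched to something like this (just a sketch; pick whichever method suits the data):

[compression]
# parallel gzip; "zstd" is another option, or "none" if the blobs are effectively already compressed
method = pigz
level = 1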

My question is whether you mean it as you said, that the backup is slow due to tar itself, or whether you meant that the backup is fast when compression is off, with either the tar or dir archive-method.

Respectfully, Mike

AboBuchanan commented 4 years ago

Hi

Sorry, I think a typo made it unclear.

The timings are both with tar, for 122 GB:

Tar with default compression: over an hour.

Tar with zero compression: 12 minutes.

I was not aware of the dir method. It looks like a directory copy.

Thanks

soulen3 commented 4 years ago

To be honest, I forgot archive-method was an option. Does that solve your issue? I'm still not sure I understand what you're trying to accomplish. I'm assuming you're trying to reduce the amount of time you need the snapshot available. Is that correct?

mikegriffin commented 4 years ago

Please let us know if (and maybe even how) the dir method is useful for you.

Adding a new method like a hypothetical rsync-partial is a heavy hammer, and we try to avoid confusing configuration options or footguns (understanding these implications is a very niche concept for data integrity).

I would not think that dir is much faster than tar when the compression method is none - it is mostly useful in cases where you want some other process (external to Holland) to not have to copy a giant single file, and where the split option doesn't quite solve your dilemma. If I am honest, a ten-minute backup sounds pretty good, and I would not think that an rsync in default mode (without -P --append) would be faster.

AboBuchanan commented 4 years ago

Hi Soulen

I think a ten-minute backup is brilliant, so I'm really happy with that.

My additional aim would be to have the whole directory structure available without having to untar.

Also, I come from the time when 10 MB disks were a luxury, so I'm always looking to minimise the data moving about.

At the moment my backups take ten minutes or so to a separate disk on the same box, copying about 122 GB.

This is then copied to a remote server, which takes about 50 minutes. An rsync would reduce the in-box copy to a few GB, and similarly for the remote copy.

The non-blocking LVM is brilliant. Thanks everyone.

This is now more of a mental exercise for me, to understand Holland and hopefully provide more options and functionality for all.

Dave

AboBuchanan commented 4 years ago

Hi Mike

I am not aware of what the dir method is. Where is it configured? I have not yet found it in the docs.

I'm guessing it's a tar per directory, or similar.

For my databases (total size 122 GB), the daily changes are, I would say, less than 1 GB in real data terms; in terms of actual updated files I would need to do some further investigation.

It seemed a good idea to do this from the LVM snapshot before it is dropped.

Perhaps, as I mentioned earlier in the thread, a post-tar but pre-snapshot-drop command would be suitable.

Thanks for all your responses

Dave

soulen3 commented 4 years ago

https://docs.hollandbackup.org/docs/provider_configs/mysql-lvm.html - it's the last option under mysql-lvm:

archive-method = tar | dir (default: tar)

Create a tar file of the datadir, or just copy it.

After the snapshot is complete you can run a command using after-backup-command https://docs.hollandbackup.org/docs/config.html#backup-set-configs
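
As a rough sketch, a backup set using both of those might look like this (the script path is hypothetical; see the linked docs for the full option list):

[holland:backup]
plugin = mysql-lvm
# hypothetical script, e.g. one that rsyncs the finished backup off-box
after-backup-command = /usr/local/bin/holland-post-sync.sh

[mysql-lvm]
# copy the datadir instead of creating a tar file
archive-method = dir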

AboBuchanan commented 4 years ago

Hi Soulen

Brilliant, thank you re the tar/dir option.

Re the after-backup-command: I have this in use at the moment, and I run scripts that check disk space and send out mail. It works brilliantly, as do the before and failure ones.

Unfortunately, as I understand it, the snapshot has gone at that point, so my rsync idea would not work nicely; hence I would add an extra command,

before the snapshot is dropped, or similar.

Thanks

Dave

mikegriffin commented 4 years ago

The problems with an rsync command that writes outside of the backupdir are many (for example, checking disk space is no longer viable, and rsync options that skip the expensive checksums of those files are risky, etc.), and there has been a goal, when you have a snapshot open, to close it as quickly as possible, due to the "negative scalability" of read performance as the snapshot size grows.

Allowing any hooks while the snapshot is open is obviously possible but perhaps not something that should be encouraged.

Did the archive-method=dir generally solve your issue (not wanting to untar the resulting backup during restore) with acceptable performance during the backup?

If you really do think you have some large files in the MySQL data dir that are 100% never written to, I would encourage you to test the rsync performance outside of Holland using two backups, if you have the space. Something like:

Set the Holland lvm plugin to use dir copy instead of tar.

Copy a backup out of your Holland backupdir when it is complete, to some other location on the same mount point, or one with similar performance.

Wait a couple of days and, when Holland is not running, do an rsync from the newest backup dir to the one you preserved and see if it is significantly faster (I think this is unlikely).

If you didn't have space for three backups, or whatever is required to do such a test, make sure that you are reading the freshest backup from the same mount point when you are doing the rsync test outside of Holland, so that you have the same read pressure.
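
For the last step, a rough sketch of the comparison itself (the timestamp and target path are hypothetical; adjust them to your backupset):

# --stats reports how much data rsync actually had to transfer
time rsync --archive --delete --stats \
    /var/spool/holland/mysql-lvm/<newest_timestamp>/backup_data/ \
    /path/to/preserved_backup_copy/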

mikegriffin commented 4 years ago

By the way, nothing is stopping you now from using rsync to the remote in the after-backup-command, if you start using the dir method.

If many or large data files haven't changed, then you should see a significant speed-up there from the current 50 minutes you measure.

Whether Holland proper used rsync or not has no impact on your external copy (unless you meant that you wouldn't store a local copy at all).
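
As a very rough sketch, an after-backup-command script doing that remote copy might be no more than this (the remote host and path are placeholders):

#!/bin/sh
# mirror the local Holland spool for this backupset to the remote server
rsync --archive --delete /var/spool/holland/mysql-lvm/ backupuser@remote.example.com:/srv/backups/holland/mysql-lvm/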

AboBuchanan commented 4 years ago

Hi Mike

Thanks for your feedback; I understand the issues and your reservations.

I have just tried the dir method, which ran in about the same time as the tar backup: "Backup completed in 11 minutes, 11.28 seconds".

I will investigate the speed-up from the rsync utility too.

A question: the log is stating "Unknown parameter 'defaults-file' in section 'mysql:client'". Pete Caseta (from Rackspace) and I could not find where this is being set. Could you point me in the right direction?

Dave

Holland 1.1.21 started with pid 14045
--- Starting backup run ---
Creating backup path /var/spool/holland/mysql-lvm/20200925_070602
Unknown parameter 'defaults-file' in section 'mysql:client'
No backups purged
Estimated Backup Size: 121.77GB
Starting backup[mysql-lvm/20200925_070602] via plugin mysql-lvm
Backing up /data00/var/lib/mysql via snapshot
Auto-sizing snapshot-size to 15.00GB (3840 extents)
Acquiring read-lock and flushing tables
Recorded binlog = m1-mysql-bin.000049 position = 500876761
Recorded slave replication status: master_binlog = mysql-bin.001327 master_position = 193160447
Created snapshot volume /dev/vglocal01/data00_snapshot
Releasing read-lock
xfs filesystem detected on /dev/vglocal01/data00_snapshot. Using mount -o nouuid
Mounted /dev/vglocal01/data00_snapshot on /tmp/tmpRv8ubA
Starting InnoDB recovery
Bootstrapping with /usr/libexec/mysqld
Starting /usr/libexec/mysqld --defaults-file=/tmp/tmpRv8ubA/var/lib/mysql/my.innodb_recovery.cnf --bootstrap
/usr/libexec/mysqld has stopped
/usr/libexec/mysqld ran successfully
Running: cp --archive /tmp/tmpRv8ubA/var/lib/mysql -t /var/spool/holland/mysql-lvm/20200925_070602/backup_data
Unmounted /dev/vglocal01/data00_snapshot
Final LVM snapshot size for /dev/vglocal01/data00_snapshot is 44.54MB during pre-remove
Removed snapshot /dev/vglocal01/data00_snapshot
Removing temporary mountpoint /tmp/tmpRv8ubA
Final on-disk backup size 121.77GB 100.00% of estimated size 121.77GB
Backup completed in 11 minutes, 11.28 seconds
Released lock /etc/holland/backupsets/mysql-lvm.conf
--- Ending backup run ---

soulen3 commented 4 years ago

Looks like there's a 'defaults-file' option being defined in the section 'mysql:client' of your backupset configuration file. 'defaults-file' isn't a valid option for that section of the config. It's looking for 'defaults-extra-file' if you're trying to add a .my.cnf file.

That warning shouldn't be causing any issues though.
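
If you do want to point Holland at a .my.cnf, the section would look something like this (the path is just an example):

[mysql:client]
# 'defaults-file' is not recognised here; 'defaults-extra-file' is the valid option
defaults-extra-file = /root/.my.cnf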