AboBuchanan opened this issue 4 years ago
This could be worked into the mysql-lvm module. Here's a list of things we would need to consider:
- Holland doesn't really have a concept of incremental backups, so you would need to put the backup somewhere besides /var/spool/holland/ (a new config option like you mentioned). This will mess with purging failed backups.
- Using rsync in this way will mean the plugin would only have one active copy of the data. By default Holland will complete a backup before purging the old one.
- I'm concerned that a corrupt backup directory would look similar to a successful one.
None of these things are deal breakers, but they should be considered.
I'm assuming your goal is to reduce backup time, is that correct? Have you tested how long it takes for rsync to run after the data has been seeded? If it only takes 10 minutes to copy all the data using tar, you're really not going to save that much time and it's going to be CPU intensive.
Hi,
Thanks for your fast response, I agree with your points.
Perhaps another way would be to implement a post-tar command, i.e. a command that runs after the tar (and, for the LVM MySQL dump, after the export) and before the snapshot is shut down. That way all the existing logic would remain.
FYI, I would put the synced copy in /var/spool/holland/var/lib/mysql (sorry, my iPhone keeps capitalising things). That way I have easy access to all the databases and tables within.
I've not looked at the source code yet; what is the best way to have a play with this?
Thanks in advance,
Dave
Hello,
I am curious if you could clarify, as I am not sure I understood this:
"effectively not tar backs up 122 Gig in 10 minutes with tar compression takes an hour"
Do you mean to say that your backup takes 10 minutes with:
archive-method = dir
But that it takes an hour with both of these defaults:
archive-method = tar
[compression]
method = gzip
options = ""
inline = yes
split = no
level = 1
Using archive-method=tar with the default gzip compression can be quite slow, which is why Holland supports pigz or zstd. If the data in your "large blobs" is binary data, it is effectively "compressed" data already, and you would not want to use a compression method besides none.
My question is whether you mean, as you said, that the backup is slow due to tar, or whether you meant that the backup is fast when compression is off, with either the tar or dir archive-method.
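For reference, a sketch of how the [compression] section above might be overridden in a backup-set config; this only uses option names shown in the defaults quoted above, and the values are illustrative rather than a recommendation:

```ini
[compression]
# Data that is already effectively compressed (e.g. binary blobs)
# gains little from gzip, so skip compression entirely:
method = none

# Alternatively, keep compression but use a parallel compressor,
# which is typically much faster than the single-threaded default:
# method = pigz
# level = 1
```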
Respectfully, Mike
Hi,
Sorry, I think a typo made it unclear. Both timings are with tar, for 122 GB:
- tar with default compression: over an hour
- tar with zero compression: 12 minutes
I was not aware of the dir method; it looks like a directory copy.
Thanks
Being honest, I forgot archive-method was an option. Does that solve your issue? I'm still not sure if I understand what you're trying to accomplish. I'm assuming you're trying to reduce the amount of time you need the snapshot available. Is that correct?
Please let us know if (and maybe even how) the dir method is useful for you.
Adding in a new method like a hypothetical rsync-partial is a heavy hammer, and we try to avoid confusing configuration options or footguns (understanding these implications is a very niche concept for data integrity).
I would not think that dir is much faster than tar when the compression method is none; it is mostly useful in cases where you want some other process (external to Holland) to not have to copy a giant single file, and where the split option doesn't quite solve your dilemma. If I am honest, a ten minute backup sounds pretty good, and I would not think that an rsync in default mode (without -P --append) would be faster.
Hi Soulen,
I think a ten minute backup is brilliant, so I'm really happy with that.
My additional aim would be to have the whole directory structure available without having to untar. Also, I come from the time when 10 MB disks were a luxury, so I'm always looking to minimise the data moving about.
At the moment my backups take ten minutes or so to a separate disk on the same box, copying about 122 GB. This is then copied to a remote server, which takes about 50 minutes. An rsync would reduce the in-box copy to a few GB, and similarly for the remote copy.
The non-blocking LVM is brilliant, thanks everyone. This is now more of a mental exercise for me to understand Holland, and hopefully provide more options and functionality for all.
Dave
Hi Mike,
I am not aware of what the dir method is or where it is configured; I have not yet found it in the docs. I'm guessing it's a tar by directory or similar.
For my databases (total size 122 GB) the daily changes are, I would say, less than 1 GB in real data terms; in terms of actually updated files I would need to do some further investigation.
It seemed a good idea to do this from the LVM snapshot before it is dropped. Perhaps, as I mentioned earlier in the thread, a post-tar but before-snapshot-drop command would be suitable.
Thanks for all your responses,
Dave
https://docs.hollandbackup.org/docs/provider_configs/mysql-lvm.html (last option under mysql-lvm):
archive-method = tar | dir (default: tar)
Create a tar file of the datadir, or just copy it.
After the snapshot is complete you can run a command using after-backup-command: https://docs.hollandbackup.org/docs/config.html#backup-set-configs
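Putting those two together, a sketch of the relevant parts of a backup-set config (the hook script path is hypothetical, and the section layout should be double-checked against the backup-set config docs linked above):

```ini
# e.g. /etc/holland/backupsets/mysql-lvm.conf (excerpt, sketch only)
[holland:backup]
plugin = mysql-lvm
# Hypothetical hook script; runs after the backup completes,
# by which point the LVM snapshot has already been dropped.
after-backup-command = /usr/local/bin/push-backup-offsite.sh

[mysql-lvm]
# Copy the datadir instead of producing a tarball
archive-method = dir
```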
Hi Soulen,
Brilliant, thank you re the tar/dir option.
Re after-backup-command, I have this in use at the moment: I run scripts that check disk space and send out mail, and it works brilliantly, as do the before and failure ones.
Unfortunately, as I understand it, the snapshot has gone at that point, so my rsync idea would not work nicely; hence I would add an extra command, before-snapshot-drop or similar.
Thanks,
Dave
The problems with an rsync command that writes outside of the backupdir are many (for example, checking disk space is no longer viable, and rsync options that skip expensive checksums of those files are risky), and there has been a goal, when you have a snapshot open, to close it as quickly as possible due to the "negative scalability" of read performance as the snapshot size grows.
Allowing any hooks while the snapshot is open is obviously possible, but perhaps not something that should be encouraged.
Did archive-method=dir generally solve your issue (not wanting to untar the resulting backup during restore) with acceptable performance during the backup?
If you really do think you have some large files in the MySQL data dir that are 100% never written to, I would encourage you to test the rsync performance outside of Holland using two backups, if you have the space. Something like the following (a rough shell sketch is included below):
1. Set the Holland lvm backup to use dir copy instead of tar.
2. Copy a backup out of your Holland backupdir when it is complete, to some other location on the same mount point, or one with similar performance.
3. Wait a couple of days and, when Holland is not running, do an rsync from the newest backup dir to the one you preserved and see if it is significantly faster (I think this is unlikely).
If you don't have space for three backups, or whatever is required to do such a test, make sure that you are reading the freshest backup from the same mount point when you are doing an rsync test outside of Holland, so that you have the same read pressure.
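A rough shell sketch of that test, assuming a dir-method backup under /var/spool/holland/mysql-lvm/ as in your log; the /data00/rsync-test location and the timestamped directory name are placeholders:

```bash
#!/bin/bash
# Sketch only: estimate how much an rsync refresh of an old copy saves
# compared with copying everything again. Paths are placeholders.

SPOOL=/var/spool/holland/mysql-lvm

# 1. After a dir-method backup completes, preserve a copy on the same
#    mount point (or one with similar performance).
mkdir -p /data00/rsync-test
cp --archive "$SPOOL/20200925_070602" /data00/rsync-test/seed

# 2. A couple of days later, while Holland is not running, rsync the
#    newest backup directory onto the preserved copy and time it.
NEWEST=$(ls -1d "$SPOOL"/2*/ | sort | tail -n 1)
time rsync -a --delete "${NEWEST}backup_data/" /data00/rsync-test/seed/backup_data/
```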
By the way, nothing is stopping you now from having after-backup-command use rsync to the remote server, if you start using the dir method. If many or large data files haven't changed, then you should see a significant speed-up there from the current 50 minutes you measure. Whether Holland proper uses rsync or not has no impact on your external copy (unless you meant that you wouldn't store a local copy at all).
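As an illustration only (the script name and remote destination are made up; this assumes the dir method, so the newest backup is a plain directory tree), the hook could be something along these lines:

```bash
#!/bin/bash
# Hypothetical /usr/local/bin/push-backup-offsite.sh, run via
# after-backup-command once the backup is finished (snapshot already dropped).
set -euo pipefail

SPOOL=/var/spool/holland/mysql-lvm
REMOTE=backupuser@remote.example.com:/backups/m1-mysql/

# Pick the most recent backup directory (names sort chronologically).
NEWEST=$(ls -1d "$SPOOL"/2*/ | sort | tail -n 1)

# rsync only transfers changed files, so mostly-static data should make
# the remote copy much faster than pushing the full 122 GB every time.
rsync -a --delete "$NEWEST" "$REMOTE"
```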
Hi Mike,
Thanks for your feedback, I understand the issues and your reservations.
I have just tried the dir method, which ran in about the same time as the tar backup: "Backup completed in 11 minutes, 11.28 seconds". I will investigate the speed-up from the rsync utility too.
Question: the log is stating "Unknown parameter 'defaults-file' in section 'mysql:client'". Pete Caseta (from Rackspace) and I could not find where this is being set; could you point me in the right direction?
Dave
Holland 1.1.21 started with pid 14045
--- Starting backup run ---
Creating backup path /var/spool/holland/mysql-lvm/20200925_070602
Unknown parameter 'defaults-file' in section 'mysql:client'
No backups purged
Estimated Backup Size: 121.77GB
Starting backup[mysql-lvm/20200925_070602] via plugin mysql-lvm
Backing up /data00/var/lib/mysql via snapshot
Auto-sizing snapshot-size to 15.00GB (3840 extents)
Acquiring read-lock and flushing tables
Recorded binlog = m1-mysql-bin.000049 position = 500876761
Recorded slave replication status: master_binlog = mysql-bin.001327 master_position = 193160447
Created snapshot volume /dev/vglocal01/data00_snapshot
Releasing read-lock
xfs filesystem detected on /dev/vglocal01/data00_snapshot. Using mount -o nouuid
Mounted /dev/vglocal01/data00_snapshot on /tmp/tmpRv8ubA
Starting InnoDB recovery
Bootstrapping with /usr/libexec/mysqld
Starting /usr/libexec/mysqld --defaults-file=/tmp/tmpRv8ubA/var/lib/mysql/my.innodb_recovery.cnf --bootstrap
/usr/libexec/mysqld has stopped
/usr/libexec/mysqld ran successfully
Running: cp --archive /tmp/tmpRv8ubA/var/lib/mysql -t /var/spool/holland/mysql-lvm/20200925_070602/backup_data
Unmounted /dev/vglocal01/data00_snapshot
Final LVM snapshot size for /dev/vglocal01/data00_snapshot is 44.54MB during pre-remove
Removed snapshot /dev/vglocal01/data00_snapshot
Removing temporary mountpoint /tmp/tmpRv8ubA
Final on-disk backup size 121.77GB 100.00% of estimated size 121.77GB
Backup completed in 11 minutes, 11.28 seconds
Released lock /etc/holland/backupsets/mysql-lvm.conf
--- Ending backup run ---
Looks like there's a 'defaults-file' option being defined in the section 'mysql:client' of your backupset configuration file. 'defaults-file' isn't a valid option for that section of the config. It's looking for 'defaults-extra-file' if you're trying to add a .my.cnf file.
That warning shouldn't be causing any issues though.
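In config terms, the sketch of the fix would be (the .my.cnf path is just an example):

```ini
[mysql:client]
# Not a valid option for this section; this is what produces the
# "Unknown parameter 'defaults-file'" warning in the log:
# defaults-file = /root/.my.cnf

# Option name this section actually looks for:
defaults-extra-file = /root/.my.cnf
```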
Hi, I am using LVM with tar at the moment. A lot of my databases are quite static and contain large blobs. Running tar with 0 compression (effectively no tar) backs up 122 GB in 10 minutes; with tar compression it takes an hour.
Ideally I would like to create a new mode, lvm+rsync (a rough config sketch follows below):
- create the snapshot
- InnoDB recovery
- rsync to a target directory (set in the config, I guess)
- finish and drop the snapshot
This could also be achieved with a command that runs before the snapshot is dropped, combined with excluding everything from the tar.
My first thoughts are to expand the lvm plugin.
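Purely to illustrate the idea (none of this exists in Holland today; both the rsync archive-method value and the rsync-target option below are hypothetical):

```ini
# Hypothetical backup-set excerpt for the proposed lvm+rsync mode
[mysql-lvm]
archive-method = rsync
# Seeded directory that would be refreshed from the mounted snapshot
rsync-target = /var/spool/holland/var/lib/mysql
```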