olahallengren / sql-server-maintenance-solution

SQL Server Maintenance Solution
https://ola.hallengren.com
MIT License
2.9k stars 755 forks source link

Why do backups prune n+1? #189

Open tommls opened 5 years ago

tommls commented 5 years ago

Recently I have noticed that backups set to prune, for example, @CleanupTime = 24, are leaving 1 more than the requested cleanup time, e.g. this should leave only one backup in the folder, but instead two backups are kept, should I decrease the time to less than a multiple of 24?? This didn't used to happen before.

olahallengren commented 5 years ago

Think about that there can be some small variations in the backup duration. This can affect the cleanup. Could you try setting @CleanupTime = 23.

olahallengren commented 5 years ago

This didn't used to happen before.

I should also say that this code has not been changed in a long time.

drstonephd commented 5 years ago

I have noticed this before and always use a CleanupTime smaller than the time between backups (e.g. 23). Both the backup start time and duration can very. Databases prior might be bigger or smaller because of growth, migrations, or cleanup. Or perhaps a large database was added or deleted.

I did not look into it in detail, so I do not know how it really works. For example, are only the start times compared? Every second counts or is the time truncated or rounded to the nearest hour?

olahallengren commented 5 years ago

The @CleanupTime is calculated just before the cleanup is taking place.

SELECT DATEADD(hh,-(@CleanupTime),GETDATE())

So let's say that a backup and verify of a database is completed 00:10:00 one day. The next day it is completed 00:09:00. Then the old backup is not 24 hours old and will not get deleted.

drstonephd commented 5 years ago

Ola, would an option to use the GETDATE() at the start of the job rather than just before cleanup be of value? The time would be determined by schedules and not vary because of runtime conditions. It would have to be understood the time between scheduled runs should be much greater than the duration of the job.

tommls commented 5 years ago

This seems to me the most sensible approach, perhaps others could add their comments and Ola will decide if he wants to add any comments to his website about this. Some database backups may take longer than an hour to complete, including the compression process, and this could affect cleanup timing. On one server, I have many databases of varying size to back up, so this is possibly also a factor.

Today I set all our database servers' full backups to 47 instead of 48 or 72, and this should make a difference.

Same for log backups, to be consistent throughout.

olahallengren commented 5 years ago

I think that there are two points in time that are interesting:

  1. The completation time of a backup operation (this is when the the modified date of the file is set)

  2. The time just before the cleanup is starting. That is when the cleanup time is calculated. SELECT DATEADD(hh,-(@CleanupTime),GETDATE())

So if the time between the backup operation completes one day, and the cleanup starts the next day, is less than 24 hours in this case, then we have this issue.

tommls commented 5 years ago

And the sequence of backups and how long any given database takes to back up and compress?? This varies from db to db and site to site etc. etc.

The script is making its best guess/determination with what it 'knows' about the files, aka time modified and time cleanup starts, so far as I can see from Ola's explanation.

I'll check my backups tomorrow morning.

drstonephd commented 5 years ago

Maybe we can consider using a "fuzzy" parameter to determine an upper deviation limit (e.g., 10 minutes) to what is normally expected for a shift in completion time. Then anything older than 24 hours minus this limit can be deleted. Using a cleanup time of 23 rather than 24 already does this with the assumption that the deviation can be an hour at most.

A limit might be what we want. However, if the requirement is to keep a complete backup chain on local disk that covers the SLA (e.g. point in time recovery for the past 24 hours from local disk), then determining which set of files to keep on disk is what we need. For us, we tend to support a point in time recovery from local disk from the prior night when the full backup runs; otherwise, we would have to retain two full backups on disk. We use full (23 hr cleanup), diff (23 hr cleanup), and log (25 hr cleanup) backups to insure we have the files required. We are okay with deleting the full and diff backups a little too soon. This is not the case for the first log backup after a full backup.

Other considerations are space usage over time and system backups. For space, the current method tends to keep the space usage consistent over time, creating a file to replace a file rather than deleting a backup chain and growing a backup chain. (Assuming we account for deviation.) We also try to time the full backups to be just before the system backups to insure the most recent full backup is on "tape". (The choice is the most recent full backup or the prior full backup plus its chain, if it has a chain. We have simple recovery for some databases, so point-in-time is not an option for them.) I mention this because our Veeam backups were early for a while. This meant we did not have the "last night" database backups expected in the system backup...we had the night before last backups.

tommls commented 5 years ago

How is a 25-hour cleanup for log files helpful?? I never thought of that.

For me, the hours work for pruning backups because I can extract databases from Veeam backups.

The 25-hour log cleanup, I'd like to understand that better before I program in n + 1 hour for log file retention.

I like doing it this way better than asking Ola to write complicated retention time code. :)

olahallengren commented 5 years ago

Thank you for the great feedback. Backup retention is complicated.

One thing to consider: The risk of running into this issue is lower when you are using the @Verify = 'Y' option. That is because then the cleanup will be done after the verify.

Another thing to be aware of: If you set @CleanupTime = 0 for the full backup job, it will delete all the previous full backups for this database, after the new one has been successfully completed.

tommls commented 5 years ago

I have verify = Y on all my full backups, it's a default setting so far as I know, but I still had excess retentions, which goes back to backup duration or quantity of databases, the script can not compensate for these two variables.

I would still like to know the differences between a 23 hours vs. 25 hours for LOG backup retention, which is better/safer?? Log backups are actually more helpful and important to keep properly, so any and all illumination of this topic would be helpful.

drstonephd commented 5 years ago

Both the full backup and transaction log backup jobs can have deviations in time. Our batch processes can sometimes create transaction log files many GB larger than the norm, some similar in size to the database. The first transaction log to apply after a full backup restore needs to cover the time window when the full backup was taken. So, depending upon when the full and log backups are scheduled relative to each other, I don't know for sure which log backup will cover the backup event. Could the full and log backups deviate in opposite directions enough that the tail-end of the log backup prior to the one expected cover the full backup? If both the full (e.g., nightly) and log (e.g., hourly) backups are schedule on the hour, then best keep an extra log file just in case.

olahallengren commented 5 years ago

We use full (23 hr cleanup), diff (23 hr cleanup), and log (25 hr cleanup) backups to insure we have the files required.

There is a check in the script to assure that it is not deleting any log backups that are new than the latest full or differential backup.

drstonephd commented 5 years ago

Ola, that is good to know. Is this check based on completion time or on LSNs? I suppose it's not possible for the log backup covering the full backup time to be completed before the full backup starts. I would expect it to normally finish before the full backup completes.

olahallengren commented 5 years ago

I will review this code again, but it looks like I am using backup_finish_date in msdb.dbo.backupset.

drstonephd commented 5 years ago

Perhaps the full backup backup_start_date is the lower limit for completion dates of log files?

olahallengren commented 5 years ago

I will do some more investigations on this and get back.

olahallengren commented 5 years ago

I did a small change in the check that assures that log backups, that are newer than the latest full or differential backup, are not deleted.

I am now using backup_start_date instead of backup_finish_date.

Dean-McMillan commented 4 years ago

Technically the behaviour is correct given it is looking at the modified time on the full backup file - and if the date goes over the threshold then it will not be deleted. SQL Server maintenance Plans are affected by this behavior as well, since both backup solutions use xp_delete_file. I think however a good option may be to provide the ability to retain backups (within each backup type) based on the count of backups done.