Closed redmop closed 7 years ago
I've not made any changes to the code yet.
I thought it was first filling out the snapshot count, in other words, getting 7 yearly snapshots like I requested, but I have 13 now.
Not sure how you managed that. If you're doing --take-snapshots directly, that might have bugs in it, because, well, I don't actually use that in production so it's a lot less heavily tested. =)
I know that using --cron in a crontab * * * * * doesn't produce extra snapshots like that.
I bet --take-snapshots isn't updating the cache. Try setting the cache expiration to 0 in your sanoid.conf - you can find the syntax in sanoid.defaults.conf (but don't edit that file directly!)
I'm not using --take-snapshots. I'm still playing with it, so I am following directions exactly.
* * * * * /usr/local/bin/sanoid --cron
I don't see anything like that in sanoid.defaults.conf. Do you mean either of these?
my $forcecacheupdate = 0;
my $cacheTTL = 900; # 15 minutes
Also, it seems to have stabilized at 13 yearly snapshots, though I only asked for 7.
Hourly snapshots are also messing up. I'll just paste all the snapshots it's taken so far. Maybe the script is sensitive to high load on the pool; I was using syncoid on it for a while today. This is an old server getting ready to be retired. I was using zfSnap on here before; it does recursive snapshots, which might be faster / less load-sensitive.
dpool/data@autosnap_2015-09-26_13:49:03_daily written 0 -
dpool/data@autosnap_2015-09-26_13:49:03_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:49:03_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:49:03_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:44:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:44:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:44:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:50:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:44:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:50:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:50:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:50:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:45:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:45:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:54:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:45:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:54:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:45:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:54:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:54:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:47:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:47:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:52:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:47:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:51:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:47:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:51:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:52:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:52:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:51:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:51:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:52:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:53:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:53:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:53:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:48:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:48:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:53:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:48:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:48:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_14:01:02_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:55:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:55:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:55:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:55:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:56:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:46:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:56:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:46:01_daily written 0 -
dpool/data@autosnap_2015-09-26_13:56:01_daily written 0 -
dpool/data@autosnap_2015-09-26_14:00:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_13:56:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_13:46:01_yearly written 0 -
dpool/data@autosnap_2015-09-26_13:46:01_monthly written 0 -
dpool/data@autosnap_2015-09-26_14:02:02_hourly written 0 -
dpool/data@autosnap_2015-09-26_15:00:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_15:01:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_15:02:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_15:03:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_15:04:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:01:03_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:02:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:00:02_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:03:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:05:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:06:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_16:04:01_hourly written 0 -
dpool/data@autosnap_2015-09-26_17:00:01_hourly written 0 -
I'm not currently stressing the system, and it seems to be doing a better job.
Any particular reason you're not using recursive snapshots?
I have not seen this issue on any systems, or heard "me toos" from anybody else - closing.
I've also been seeing this happen. I'm still reading through the code, but my gut is telling me it's because there isn't a lock while snapshots are being taken and multiple sanoid instances are running in parallel.
It's possible, if you've got a really heavily loaded system. It would need to be so heavily loaded that you had snapshot creation taking longer than the time in between sanoid --cron runs, though, which sounds... pretty brutal.
Let me know if you figure out something different.
I don't think it's about load; I think it's more about the imposed 1-second sleep between snaps. I saw it happen on my first run with 40 ZFS datasets. Take a monthly, daily, and hourly for each, add in the small delays for taking the snapshots themselves, and you're easily over a full minute of runtime (the default crontab interval).
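A quick back-of-the-envelope check of that runtime claim (assuming one hourly, daily, and monthly snap per dataset, plus the 1-second sleep between snaps; the variable names are just for illustration):

```shell
# Minimum runtime from the sleeps alone, ignoring zfs snapshot overhead.
datasets=40
types=3      # hourly, daily, monthly
sleep_s=1
total=$(( datasets * types * sleep_s ))
echo "minimum runtime: ${total}s"   # well past the 60s between cron runs
```

At 120 seconds minimum, two cron-launched runs can easily overlap.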
Perhaps it would be safest to just have a lock to avoid multiple instances taking snapshots simultaneously.
Well, that could certainly do it, if you have that many datasets and haven't taken many snaps.
The sleeps are actually in there to keep a chrono order available for the snaps. Granularity on the birth time for snaps is a full second, so ZFS doesn't have any way of knowing whether the hourly, daily, monthly, or yearly is "older" if they're all taken during the same second. That ended up being really obnoxious, to the point that I added the sleeps to make sure no two snaps had the same exact birth time.
Though TBH I'm forgetting now WHY that was so obnoxious, given that each type of snapshot has its own separate policy. It /did/ cause some obnoxious issue, though, I remember that much...
On 12/31/2015 03:28 PM, jjlawren wrote:
I don't think it's about load, I think it's more about the imposed 1 second sleep between snaps. I saw it happen on my first run with 40 zfs datasets. Take a monthly, daily, and hourly for each, add in the small delays for taking the snapshots themselves, and you're easily over a full minute runtime (default crontab interval).
Perhaps it would be safest to just have a lock to avoid multiple instances taking snapshots simultaneously.
— Reply to this email directly or view it on GitHub https://github.com/jimsalterjrs/sanoid/issues/14#issuecomment-168242957.
Possible to reopen this request?
I still have this happen from time to time, and I set sanoid to run every 5 min. The system isn't really loaded, and sanoid is managing about 10 datasets with 48 hourly, and 7 daily retention.
I'm reopening this, but since I've been unable to duplicate I don't know that it's going to get resolved any time soon. If anybody else can figure out why it might be happening OTHER than extreme load, I'm more than willing to poke at it and resolve. Or if you want to give me remote access to a system that's experiencing the issue regularly and testably, that might work.
Until then, it's hard for me to fix something that I can't repeatably break. I don't experience this issue on any of the 100+ Sanoid hosts I manage.
I have a machine that is taking double snapshots. It will do one at 13:00 and then another at 13:01. I tried to set the cron job to only run sanoid every five minutes, but that meant the duplicate snapshot was five minutes after instead of one. Daily and Monthly seem fine. I am running NAS4Free, not sure if it is Linux vs *BSD thing or not. Remote access is not out of the question.
I actually noticed the same thing today. My version was about 4 months old, so I updated it today from master. ZoL - Ubuntu 16.04
Did upgrading to current solve your issue?
(Sent from my tablet - please blame any weird errors on autocorrect)
On November 2, 2016 18:50:31 Jessie Bryan notifications@github.com wrote:
I actually noticed the same thing today. My version was about 4 months old, so I updated it today from master.
You are receiving this because you modified the open/close state. Reply to this email directly or view it on GitHub: https://github.com/jimsalterjrs/sanoid/issues/14#issuecomment-258023448
@redmop
Are all the datasets you specified in your sanoid.conf currenting existing on this machine you're running the cronjob on?
See #43
I had left the sample datasets in there and it messed up the retention I had set and snapshots did not take fully until I only explicitly stated zpools which existed on my system.
I am still seeing excessive hourly snapshots within the same hour. Perhaps it's my CFG? Take a look:
I observed similar issues with daily on one of my systems (centos 7 zol). timezone: EST sanoid: 1.4.6c
http://www.hastebin.com/uhadikonot.coffeescript
it looks all daily snapshots where taken at 23:59:01 every day ... but somehow on 6 of Nov it took daily snapshots every minute.
Could this be related to Daylight Saving ? even for EST timezone 2AM becomes 1 AM ... which don't coincide with the times from snapshot.
-I.
I got it cause of a lock had remained causing ps to throw an error. might not be related tho. (i had a few thousand snapshots)
closing again, filed under "wtflol". If somebody can produce a replicable testcase, please let me know.
@Anderath I've not used sanoid for a while, though I will be setting it up again on 3 servers within the next week, so I don't know how I had the datasets setup.
@jimsalterjrs I am also seeing for monthlies accumulate daily without a limit. I can't unfortunately (at least yet) produce a replicable test case but I noticed that all the snapshot times seem to correspond when the system was (re)started in the morning:
ssd/vms@autosnap_2018-08-31_07:53:01_monthly 1.73M - 98.7G -
ssd/vms@autosnap_2018-09-03_07:36:01_monthly 2.94M - 98.8G -
ssd/vms@autosnap_2018-09-04_07:52:02_monthly 2.29M - 98.9G -
ssd/vms@autosnap_2018-09-05_07:36:01_monthly 1.71M - 98.9G -
ssd/vms@autosnap_2018-09-06_07:37:02_monthly 4.21M - 99.0G -
ssd/vms@autosnap_2018-09-07_07:36:02_monthly 1.75M - 99.3G -
ssd/vms@autosnap_2018-09-10_07:36:02_monthly 5.75M - 99.4G -
ssd/vms@autosnap_2018-09-12_07:59:02_monthly 26.9M - 99.5G -
ssd/vms@autosnap_2018-09-13_07:39:02_monthly 11.0M - 99.6G -
ssd/vms@autosnap_2018-09-14_07:35:02_monthly 2.03M - 99.8G -
ssd/vms@autosnap_2018-09-17_07:37:02_monthly 5.45M - 99.9G -
ssd/vms@autosnap_2018-09-18_07:36:01_monthly 1.03M - 100G -
ssd/vms@autosnap_2018-09-19_07:37:01_monthly 7.93M - 100G -
ssd/vms@autosnap_2018-09-21_07:40:01_monthly 812K - 101G -
ssd/vms@autosnap_2018-09-22_14:00:01_monthly 188K - 101G -
ssd/vms@autosnap_2018-09-23_13:14:01_monthly 2.21M - 101G -
Filesystem ssd/vms has:
124 total snapshots (newest: 0.6 hours old)
36 hourly
desired: 36
newest: 0.6 hours old, named autosnap_2018-09-23_14:00:01_hourly
58 monthly
desired: 3
newest: 1.4 hours old, named autosnap_2018-09-23_13:14:01_monthly
30 daily
desired: 30
newest: 1.4 hours old, named autosnap_2018-09-23_13:14:01_daily
https://gist.github.com/varesa/d2da38d8ad245f8536567202dbb841c7
I know this is a really old ticket... but 'me too'. :-)
On Ubuntu 16.04, stock sanoid from the repo..
/usr/sbin/sanoid version 2.0.3 (Getopt::Long::GetOptions version 2.45; Perl version 5.22.1)
Destroyed all the snapshots on one filesystem earlier ontoday and now I've got:
data/vms/pbcinfo-root@autosnap_2021-02-03_06:00:01_hourly 530K - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_19:09:32_monthly 292K - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:15:01_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:15:01_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:45:02_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:45:02_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:45:02_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_06:45:02_hourly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_07:00:01_hourly 664K - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_07:30:02_monthly 0 - 8.75G - data/vms/pbcinfo-root@autosnap_2021-02-03_07:30:02_weekly 0 - 8.75G - data/vms/pbcinfo-root@autosnap_2021-02-03_07:30:02_daily 0 - 8.75G - data/vms/pbcinfo-root@autosnap_2021-02-03_07:30:02_hourly 0 - 8.75G - data/vms/pbcinfo-root@autosnap_2021-02-03_08:00:00_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_08:00:00_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_08:00:00_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_08:00:00_hourly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:00:01_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:00:01_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:00:01_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:00:01_hourly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:30:01_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:30:01_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:30:01_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_09:30:01_hourly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:00:02_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:00:02_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:00:02_daily 0 - 8.74G - 
data/vms/pbcinfo-root@autosnap_2021-02-03_10:00:02_hourly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:30:02_monthly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:30:02_weekly 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:30:02_daily 0 - 8.74G - data/vms/pbcinfo-root@autosnap_2021-02-03_10:30:02_hourly 0 - 8.74G -
Premature enter... :-)
I've got three 16.04's running sanoid and the other two are fine, it's just this one that's going crazy with the snapshots every 30 mins.
I've just disabled the systemctl timer and created a cron job that runs every 30 mins instead.
config is really basic:
[data/vms] use_template = hourly recursive = yes process_children_only = yes [template_hourly] frequently = 0 hourly = 24 daily = 14 weekly = 5 monthly = 12 yearly = 0 autosnap = yes autoprune = yes
This server is a wee bit slow, but not crazily slow, it's a dell 620 with 10krpm spinning rust sas drives in raid 10 for the ZFS and an SSD OS /boot.
Not many filesystems:
# zfs list -o name NAME data data/iso data/vms data/vms/atetftp-root data/vms/jump28-root data/vms/pbcinfo-root data/vms/radius2-root data/vms/standard data/vms/standard/ns2
Although due to this issue there are 'few' snapshots..
# zfs list -t snapshot | grep -c data 4655 #
none dated older than 30th of December last year when I discovered the issue and destroyed all the snapshots to see if it came back clean...
We'll be rebuilding this box in the next month or two to 20.04 as 16.04 is EOL in April but thought I'd add this to his issue in case it triggers something, so to speak..
Relevant zfs get written
Cron line
* * * * * /usr/local/bin/sanoid --cron
/etc/sanoid/sanoid.conf