Closed patricoferris closed 3 years ago
Is the ZFS backend currently being used on any operating system other than macOS?
To the best of my knowledge I don't believe it is. I think everything is using btrfs at the moment.
That's helpful -- at least we can unblock the macOS deployment with your workaround, and then try to isolate the issue on FreeBSD or Linux at a later stage to see if it's specific to the ZFS-on-macOS implementation.
Yep -- I'm increasingly thinking it is either me or ZFS-on-macOS as it looks like the stress test https://github.com/ocurrent/obuilder/blob/master/.run-travis-tests.sh#L27 does test zfs and is happy enough (I haven't looked into exactly the build run so I'm assuming it does some "restoring" from snapshots).
... turns out it may well just be that on O3X (OpenzfsOnOsX) the snap directory is not automatically mounted, haven't tested it just yet (in a build) but running it on datasets I have lying about suggests it will fix the problem. If it fixes the problems I'll close the issue, sorry for the noise :))
Indeed this fixes any issues with using `Sys.file_exists` or reading the `log` from the snapshot by mounting the snapshot any time you wish to use it and unmounting it afterward. Seeing as it is a macOS specific thing, I'm closing this for now.
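A minimal sketch of the mount-use-unmount pattern described above, not the code that actually went into obuilder. The thread does not say which commands were used to mount and unmount the snapshot, so `mount` and `unmount` are hypothetical caller-supplied callbacks, and `with_snapshot_mounted` and `snapshot_dir_exists` are illustrative names.

```ocaml
(* Run [fn] with the snapshot mounted and always unmount afterwards,
   even if [fn] raises. [mount] and [unmount] are hypothetical callbacks
   standing in for whatever commands the platform needs. *)
let with_snapshot_mounted ~mount ~unmount fn =
  mount ();
  Fun.protect ~finally:unmount fn

(* Example use: only probe a path inside the snapshot while it is mounted. *)
let snapshot_dir_exists ~mount ~unmount path =
  with_snapshot_mounted ~mount ~unmount (fun () -> Sys.file_exists path)
```

Wrapping every access this way keeps the snapshot mounted only for as long as it is needed, and `Fun.protect` (OCaml 4.08 or later) ensures the unmount step is not skipped when the body fails.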
This is an issue to track a potential bug in the ZFS code. I'm saying potential because it could just be something I'm doing wrong, something wrong with ZFS for macOS, or an actual problem. Whichever it is, I thought it best to record it so I can test it later on other platforms to see which it is. Currently (as part of enabling macOS support with a ZFS backend, #57) I'm taking the implementation for a spin with opam-health-check.
One problem I was finding (and it took a while to diagnose) was that when the builder was building new jobs it tried to detect snapshots to see if it could restore from them instead of rebuilding things but could not find them. It would then try to build a new snapshot and fail because, in actual fact, the snapshot was still there.
The code in question is ZFS_store's implementation of getting the result. It would seem that in a very sporadic and hard to reproduce way the "directory" `tank/result/<hash>/.zfs/snapshot` could just disappear, or be filled with other things besides the snapshot (`snap`), at which point the `Sys.file_exists` would fail.

I changed the code to use a combination of `zfs list` and `Os.pread` to look for the snapshot instead. Something like:

It's not pretty and it should definitely be using `t` and `ds` to narrow down the search, but it seems to sort everything out for me for now. Again, this is mainly for tracking purposes, I may discover I was doing something wrong and that's the reason.
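The original snippet is not preserved here, so the following is only a rough sketch of the idea of looking a snapshot up in `zfs list` output rather than probing the `.zfs/snapshot` directory. It assumes the text comes from something like `zfs list -H -t snapshot -o name` (one snapshot name per line, no header) and that the result snapshot is named `tank/result/<hash>@snap`. The helpers `snapshot_name` and `snapshot_listed` are hypothetical, and running the command itself (for example via `Os.pread`) is left out.

```ocaml
(* Hypothetical helper: the full ZFS name of the "snap" snapshot of a
   result dataset, e.g. "tank/result/<hash>@snap". *)
let snapshot_name ~pool ~id = Printf.sprintf "%s/result/%s@snap" pool id

(* [snapshot_listed ~output name] is true if [name] appears as a whole
   line in [output], assumed to be the text printed by
   `zfs list -H -t snapshot -o name`. Matching whole lines avoids
   confusing one hash with another hash that merely shares a prefix. *)
let snapshot_listed ~output name =
  String.split_on_char '\n' output
  |> List.exists (fun line -> String.equal (String.trim line) name)
```

Passing the dataset to `zfs list` (with `-r` if needed) would narrow the listing to the one result being checked, which is the kind of narrowing the post says the real code should do.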