Closed pittbull closed 7 months ago
Are you running any other Samba services on the same host, by chance? How are you exposing the Time Machine container to your network: just exposing ports, or using the host network? I see messages about multiple responses received for a subnet, so it seems like there may be another nmbd process running on the same network interface. I don't recall whether that would cause the behavior you're seeing, but it would be worth looking into. I have Samba running both on my host and as a Time Machine container, and I use macvlan to work around the issue; there are a few other configurations that may help, as described here.
And while I don't think it's related to the issues you're seeing, I think there may be a misconfiguration in your multi-user setup. Check out this part of the readme to make sure you've configured it correctly. It's been a while since I tested the multi-user configuration, but the messages about chown: unknown user/group are what make me think something is misconfigured, although it is hard to tell from the XML output.
I am running TimeMachine on my Unraid server, so I assume there is a Samba service there.
The container is being exposed over br0, and even if another nmbd process exists, surely this shouldn't result in my backup failing where it does?
Thanks for spotting the multi-user error. I believe I have followed the readme, but I will revisit it.
I did a new test yesterday with the same result so something is wrong somewhere.
While I am not certain that a second nmbd process would cause a problem, I'd rather rule it out to make sure. Initial backups can take a really long time. As for when it actually failed during the backup, do you have any idea where in the logs the error is thrown at the point of the disconnect? Was it this or somewhere else?
vfs_default_durable_reconnect (Mythbuster.sparsebundle/bands/cd5): stat_ex.st_ex_blocks differs: cookie:65152 != stat:64136, denying durable reconnect
mythbuster (ipv4:192.168.1.249:61583) closed connection to service ulfthomas
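For anyone who wants to check how often that durable reconnect denial shows up in the container logs, here is a minimal Python sketch. It assumes the log lines look exactly like the one quoted above; the function name `denied_reconnects` and the regular expression are made up for illustration and are not part of Samba or the image.

```python
import re

# Pattern modelled on the single log line quoted in this thread (an assumption:
# other Samba versions may format the message differently).
DENIED_RE = re.compile(
    r"vfs_default_durable_reconnect \((?P<path>.+?)\): "
    r"stat_ex\.st_ex_blocks differs: "
    r"cookie:(?P<cookie>\d+) != stat:(?P<stat>\d+)"
)


def denied_reconnects(log_text):
    """Return (path, cookie_blocks, stat_blocks) for each denied reconnect."""
    hits = []
    for line in log_text.splitlines():
        match = DENIED_RE.search(line)
        if match:
            hits.append(
                (
                    match.group("path"),
                    int(match.group("cookie")),
                    int(match.group("stat")),
                )
            )
    return hits


if __name__ == "__main__":
    sample = (
        "vfs_default_durable_reconnect (Mythbuster.sparsebundle/bands/cd5): "
        "stat_ex.st_ex_blocks differs: cookie:65152 != stat:64136, "
        "denying durable reconnect"
    )
    for path, cookie, stat in denied_reconnects(sample):
        print(path, cookie, stat)
```

Feeding it the saved container log output would show whether the denial always hits the same band file or different ones each time.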
So, I believe I have cleaned up the user stuff, but I still fail to complete the first backup. I have found a way to extract the logs from the Mac itself, and the last lines basically state that the destination volume has become unavailable. That matches my observation of a complete network shutdown: all local ssh sessions, web browsing, ping etc. become unavailable. This must be related to something outside of your Docker image and Unraid, so I am troubleshooting my network.
As a side note, I keep getting these in my local log whilst backing up:
2023-04-07 20:50:09 Failed to get name of volume with mountpoint 'file:///Volumes/.timemachine/timemachine._smb._tcp.local./E0BD8177-5AD1-4038-A788-F36C4A414062/ulfthomas/', error: Error Domain=NSCocoaErrorDomain Code=257 "The file “<username>” couldn’t be opened because you don’t have permission to view it." UserInfo={NSURL=file:///Volumes/.timemachine/timemachine._smb._tcp.local./E0BD8177-5AD1-4038-A788-F36C4A414062/ulfthomas/, NSFilePath=/Volumes/.timemachine/timemachine._smb._tcp.local./E0BD8177-5AD1-4038-A788-F36C4A414062/<username>, NSUnderlyingError=0x600002dca280 {Error Domain=NSPOSIXErrorDomain Code=13 "Permission denied"}}
The permissions are correctly set on the server and I can mount the sparsebundle file without issues (aside from it being empty because no backups have completed).
Any advice?
I doubt it, considering Time Machine should work fine while the machine is idle, but it's not going to sleep, is it? Maybe try doing a backup with caffeinate -s running from the terminal to prevent the system from sleeping (only works when plugged in)? Does it happen at the same time when it disconnects, or is there any pattern? Is there anything with networking that might be the cause, like having a mesh network and the Mac changing APs?
As for the errors about permissions, have you tried deleting the sparsebundle from the persistent storage to start totally from scratch, to make sure nothing in the sparsebundle is off? I'm not sure whether anything can be done to modify permissions within the sparsebundle, but it seems like there is something it doesn't like about the path mentioned in the value of NSURL.
A valid question, but I can confirm the Mac was awake, as I was using it during the backup.
I have tried several times to remove the sparsebundle, but as I have not done so since redoing the multiple-user setup, I will delete it and retry.
I am currently digging around the web, and on the Unraid forum I found this post suggesting to add the following to the Samba config (which I assume must be done within the image):
vfs objects = fruit
fruit:metadata = stream
Deleted the sparsebundle and restarted the backup, and the following error reappeared (the backup is running, though):
2023-04-07 21:21:57 Failed to get name of volume with mountpoint 'file:///Volumes/.timemachine/timemachine._smb._tcp.local./3C086BEB-0074-4B00-8B8D-241E264C1D7A/ulfthomas/', error: Error Domain=NSCocoaErrorDomain Code=257 "The file “ulfthomas” couldn’t be opened because you don’t have permission to view it." UserInfo={NSURL=file:///Volumes/.timemachine/timemachine._smb._tcp.local./3C086BEB-0074-4B00-8B8D-241E264C1D7A/ulfthomas/, NSFilePath=/Volumes/.timemachine/timemachine._smb._tcp.local./3C086BEB-0074-4B00-8B8D-241E264C1D7A/ulfthomas, NSUnderlyingError=0x6000036e6520 {Error Domain=NSPOSIXErrorDomain Code=13 "Permission denied"}}
This does seem unrelated as it has to do with the volume name only:
2023-04-07 21:22:57 Failed to create volume info from disk '<TMDisk: 0x13f026000> '/Volumes/.timemachine/timemachine._smb._tcp.local./3C086BEB-0074-4B00-8B8D-241E264C1D7A/ulfthomas'', error: missingName
2023-04-07 21:22:57 Failed to create volume info from disk '<TMDisk: 0x13f018200> '/System/Volumes/Data/home'', error: missingURLForRemounting
Does it happen at the same time when it disconnects, or is there any pattern? Is there anything with networking that might be the cause, like having a mesh network and the Mac changing APs?
I would agree with you, but it is not changing AP either. It does happen at about the same place, and it does not matter whether the backup is a fresh one or a restart of one that previously failed. It halts at the same place.
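To look for a timing pattern across attempts, the log extracted from the Mac can be filtered for failure lines and their timestamps. Below is a minimal Python sketch, assuming each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpts in this thread. The function name `failure_events` and the default marker strings are my own choices for illustration, picked from messages quoted in this thread, so adjust them to whatever your log actually contains.

```python
import re
from datetime import datetime

# Assumed line format: "<date> <time> <message>", as in the excerpts in this thread.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (.*)$")

# Hypothetical default markers, taken from messages quoted in this thread.
FAILURE_MARKERS = ("Cancelling backup", "Fatal failure to copy")


def failure_events(log_text, markers=FAILURE_MARKERS):
    """Return (timestamp, message) pairs for lines containing any marker."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines without a leading timestamp
        stamp, message = match.groups()
        if any(marker in message for marker in markers):
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            events.append((when, message))
    return events


if __name__ == "__main__":
    import sys

    # Usage: python tm_failures.py < extracted_timemachine.log
    for when, message in failure_events(sys.stdin.read()):
        print(when.isoformat(sep=" "), message[:100])
```

Comparing the printed timestamps with the time each backup started gives a rough idea of whether the failure happens after a fixed duration or at a fixed point in the data.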
Well, after having looked at your smb.conf file I find the only difference to be this line:
vfs objects = acl_xattr fruit streams_xattr
Not sure if it makes a difference.
Yeah, so https://github.com/mbentley/docker-timemachine/issues/69 added acl_xattr which caused TM to stop working with Big Sur. Besides that, I've mostly used the Samba wiki page on TM as a guide.
I don't recall if I have done any specific research on the fruit:nfs_aces setting (will have to go back and look) but at some point, I specifically added the SMB_NFS_ACES env var to allow the user to change it. The TM wiki suggests setting no but I set yes as the default. It might be worth trying to set that to no to see if that helps.
If that doesn't help, I would be curious if setting SMB_INHERIT_PERMISSIONS to yes does anything to change any behavior. That's just a total shot in the dark though.
I observe that the backup fails at approximately 61% every time, and the following error message is logged on the Mac:
2023-04-10 12:16:30 Cancelling backup because volume '/Volumes/.timemachine/timemachine._smb._tcp.local./55616159-271D-4182-ADEA-EB055F22EA9C/<username>' was unmounted.
I have done two things:
no. Result: same disconnect but at 91% and the following logged on the mac:
2023-04-10 12:41:05 Invalid mountpoint '/Volumes/Backups of Mythbuster' - no volume mounted at this path
2023-04-10 12:41:05 Volume validity check failed for -37, bailing...
2023-04-10 12:41:05 Failed to determine disk image URL for volume '/Volumes/Backups of Mythbuster', error: 3 No such process
2023-04-10 12:41:05 Failed item stats: l:338 bytes p:4 KB c:1, Target Volume Total: 1044326293504, Target Volume Free Space: 46461169664
2023-04-10 12:41:05 Fatal failure to copy '/Volumes/com.apple.TimeMachine.localsnapshots/Backups.backupdb/Mythbuster/2023-04-10-122141/Data/Applications/Slack.app/Contents/Resources/app.asar.unpacked/dist/resources/extensions/react-devtools/icons/16-development.png' to '/Volumes/Backups of Mythbuster/2023-04-10-122144.inprogress/Data/Applications/Slack.app/Contents/Resources/app.asar.unpacked/dist/resources/extensions/react-devtools/icons', error: -37, srcErr: NO
Well, after retrying without changing anything else, I observed two network disconnects (all my ssh sessions died, to multiple destinations), but Time Machine continued. In fact, it completed the initial backup.
I then retried and again observed disconnects, but TM finished again. Very strange.
I will delete the backup and move it back to the raid disks to see if the same holds true for those disks.
When moved back, it fails like it did earlier ... Will retry with inherit set to yes.
No updates since April, closing for now.
Describe the Bug
Expected Behavior
I would expect no disconnect from the network when running Time Machine
Steps to Reproduce
How You're Launching the Container
Container Logs
Additional Context
Unraid has been stable network wise for a long time and is hardwired to the net.