Open BtbN opened 4 years ago
Hmm? The benefit of automount units is that they can be established early, i.e. long before the backing services are available. Hence it's generally great to install autofs early, since it means accesses will be properly synchronized. It's kinda the purpose of it: establish them as early as possible, so that only services actually accessing them will have to wait for the backing fs to be mounted, but nothing else.
Now, in your case I guess /home is on the network too?
That's where the problem comes from really. Your /home is on the network, and to establish the .automount we need to mount it.
i guess we should document that .automount units should not be placed on network mounts...
Isn't the whole point of autofs mounts to mount them on demand, so their primary use is mounting network shares? /home is on nfs, but not an auto-mount, but /home/snapshots is an autofs mount.
How else am I supposed to implement that?
Edit: Hm, what'd happen if autofs mounts would just be added to remote-fs.target instead of local-fs.target? Would that then cause the opposite cycle, when an autofs mount is on top of a local one?
Isn't the whole point of autofs mounts to mount them on demand, so their primary use is mounting network shares? /home is on nfs, but not an auto-mount, but /home/snapshots is an autofs mount.
Sorry, there's a problem with this from an autofs POV.
systemd provides autofs direct mounts only. In autofs, direct mounts must not be nested, remote or otherwise.
I don't think I actually enforce that in autofs but it's certainly not supported (in autofs). If you want this nesting to work then you need to use the autofs multi-mount feature with an indirect mount map. That's the only way such nesting is permitted in autofs.
How else am I supposed to implement that?
More specifically how would systemd handle the implicit dependencies for this nesting?
systemd might want to buy into this, but I expect it would become something of a nightmarish entanglement of special-case dependency handling. I wouldn't recommend it.
How else am I supposed to implement that?
More specifically how would systemd handle the implicit dependencies for this nesting?
I can tell you that to support this in autofs the code is very complex and more fragile than I would like. It's complicated and has posed a number of support problems over the years, some without solutions even now.
How else am I supposed to implement that?
Setting /home as a static mount might do it, but you would need to ensure /home was always mounted before the automount is started, and the dependencies would need to reflect that. I'm not sure they will, since /home would be a remote fs.
/home is a static mount, just via nfs4. Only /home/snapshot is an autofs setup.
This setup has been working fine for almost a year now, up until and including systemd 245.
Those are snapshots of the /home filesystem made on the remote NFS host, for users to access, so they can restore accidentally deleted files on their own without a system administrator connecting to the backing storage and digging in the snapshots there. I cannot mount it non-autofs, because the snapshot being mounted via nfs prevents the automatic snapshot rotation on the storage from rotating it, so it has to be automatically unmounted when not in use.
I added this patch (https://github.com/BtbN/systemd/commit/5c614ac2d2a0445af75e59045559fd7365e179f1) to my system now as a workaround. It simply makes automount units go after remote-fs instead of local-fs. With this in place, the automount works perfectly fine, and no dependency cycles are generated.
This obviously is not correct in every situation, and a proper fix should be checking the type of the mount unit it directly depends on, and if it's a network mount, add itself to the remote targets, and otherwise to the local ones.
I'm not nesting automounts, that's obviously going to cause a heapload of hard to deal with issues. But I don't see why having an automount below a static network mount would be an issue.
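The check proposed above (look at the mount the automount point sits on, and pick the target from that) can be sketched roughly as follows. This is only an illustration in Python, not systemd code: the mount table, `parent_mount()` and `automount_target()` are hypothetical names made up for the example.

```python
from pathlib import PurePosixPath

# Hypothetical mount table: mount point -> True if it is a network filesystem.
MOUNTS = {"/": False, "/home": True, "/srv": False}


def parent_mount(path, mounts):
    """Return the closest known mount point strictly above `path`."""
    # PurePosixPath.parents yields the nearest ancestor first.
    for candidate in PurePosixPath(path).parents:
        if str(candidate) in mounts:
            return str(candidate)
    return "/"


def automount_target(path, mounts):
    """Pick the target an automount at `path` would be ordered before."""
    parent = parent_mount(path, mounts)
    if mounts.get(parent, False):
        return "remote-fs.target"
    return "local-fs.target"


if __name__ == "__main__":
    print(automount_target("/home/snapshot", MOUNTS))
    print(automount_target("/srv/data", MOUNTS))
```

Note that the lookup has to walk up the path, since the automount point may sit several directories below the containing mount.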
/home is a static mount, just via nfs4. Only /home/snapshot is an autofs setup.
Right.
This setup has been working fine for almost a year now, up until and including systemd 245.
So it does sound like an automatic dependency problem ...
Those are snapshots of the /home filesystem made on the remote NFS host, for users to access, so they can restore accidentally deleted files on their own without a system administrator connecting to the backing storage and digging in the snapshots there. I cannot mount it non-autofs, because the snapshot being mounted via nfs prevents the automatic snapshot rotation on the storage from rotating it, so it has to be automatically unmounted when not in use.
Perhaps you could define the automount unit and its dependent mount unit manually and express the dependencies properly in that way. I'm not sure, but it seems the only possibility, since the change to the automount dependency processing must have been done for a reason in the first place.
I tried manually writing the automount unit, but that does not solve the issue. The dependency to local-fs.target is not created by the fstab generator after all. It's implicitly added by systemd itself to every automount unit. So with a manually written automount unit, I would run (and in fact did run) into the exact same issue.
Presumably commit 8f28433 is your latest? On first glance it looks sensible based on your description, in particular I see you are checking the mount unit and only adjusting the dependencies if it is remote. But it might need to check the containing mount unit doesn't have an automount unit associated with it as well to be robust. Were you going to submit a PR or are you waiting to see how the suggestion is received by others (not mine, my comments don't carry weight for systemd development)?
That commit unfortunately doesn't work. It checks the linked mount unit the automount unit mounts on access. But that is effectively always going to be a remote fs, since there's rarely a point using autofs for a local mount. It needs to check the parent mount, and I'm not sure how to get access to that.
On top of that, it fails to even do what it's trying to do for some reason. I suspect mount_get_parameters() plain does not work reliably outside of that unit's own load().
That commit unfortunately doesn't work. It checks the linked mount unit the automount unit mounts on access. But that is effectively always going to be a remote fs, since there's rarely a point using autofs for a local mount. It needs to check the parent mount, and I'm not sure how to get access to that.
Ha, yes, you don't know the path to the mount above, this one could be deeper inside the containing mount. But it sounds like there are other problems to deal with first.
On top of that, it fails to even do what it's trying to do for some reason. I suspect mount_get_parameters() plain does not work reliably outside of that unit's own load().
I see that the commit (which isn't in your published tree btw) is fairly straightforward but even so I'll need to break it up into smaller bits to analyse it ...
I see that the commit (which isn't in your published tree btw) is fairly straightforward but even so I'll need to break it up into smaller bits to analyse it ...
I don't think that mount_is_network() call makes sense.
I think mount_is_network() is going to check the unit has an fstab entry with that option (but I haven't looked at fstab_test_option() yet) and the generated mount unit won't be in the fstab. It might be sufficient to check the automount unit since that defines whether the trigger target is remote, at least I think that's the point of the _netdev option in those ...
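To make the kind of option check being discussed concrete, here is a rough Python analogue. It is a sketch under assumptions, not the systemd implementation: it assumes the test boils down to "the options contain _netdev, or the filesystem type is a network one", the `NETWORK_FSTYPES` set is an illustrative subset, and all function names are hypothetical. The sample line mirrors the shape of the fstab entry in the report, with the same placeholders and fewer options.

```python
# Illustrative subset of network filesystem types (an assumption for the example).
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs"}


def parse_fstab_line(line):
    """Split one fstab line into its first four fields."""
    what, where, fstype, options = line.split()[:4]
    return {
        "what": what,
        "where": where,
        "fstype": fstype,
        "options": options.split(","),
    }


def is_network(entry):
    """True if the entry is marked _netdev or uses a network fstype."""
    return "_netdev" in entry["options"] or entry["fstype"] in NETWORK_FSTYPES


def is_automount(entry):
    """True if the entry asks for an automount unit to be generated."""
    return "x-systemd.automount" in entry["options"]


SAMPLE = ("x.x.x.x:/snapshot /home/snapshot nfs4 "
          "rw,_netdev,noauto,x-systemd.automount,x-systemd.idle-timeout=1min 0 0")

if __name__ == "__main__":
    entry = parse_fstab_line(SAMPLE)
    print(entry["where"], is_network(entry), is_automount(entry))
```

A check like this only works when the options are actually available to whoever runs it, which is the concern raised above about units that do not come from fstab.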
Even doing that check and making that dependency change would need a justification, based on what was done in the commit that introduced the problem, describing why it doesn't break what that change was trying to achieve.
It's a bit difficult and a bit more involved really, than just changing the dependencies without that sort of analysis and description.
Just thought I'd add my 2c on this since I've been bitten by it in an even simpler form.
In my case /home is an nfs filesystem I wanted to automount. The slightly hairy thing is I have symlinks on my root partition pointing inside /home for things like /usr/local and /var/www (non-distro stuff). This triggers a system hang on boot.
I believe the problem is the home.automount mount is brought up early as part of local-fs.target, and then something in the boot process looks in /home and triggers the mount attempt before the networking is up enough to mount nfs filesystems. I can see that the home.mount does seem to have enough dependencies that it should wait until the networking and remote-fs-pre.target is up, but I'm not sure if the automount trigger honours all that. It's also possible that whatever is triggering the automount is a pre-dependency for the networking, so it's a circular dependency.
The problem with having the automounts for network filesystems enabled so early with local-fs.target is they imply that the mount is ready and working, when they won't actually work until networking is up. So after the automount of /home is up, ls /home will still hang and/or fail until remote-fs-pre.target is up and the nfs mount works.
The systemd-fstab-generator does have support for adding x-systemd.requires and x-systemd.required-by dependencies, but these apply to *.mount, not *.automount. Things like the _netdev and nofail mount options also change only the *.mount dependencies. There doesn't seem to be a mechanism for changing/adding dependencies for *.automount. So far my only fix has been to not use automount.
To me it seems logical that automounts of remote _netdev filesystems should themselves be treated as remote filesystems and be part of remote-fs.target, not local-fs.target. However, I can sort of understand that maybe having the automounts "up" earlier could be useful for the "automatically make things wait if the backing device is not yet available" case.
In all of these cases, having some mechanism to override/configure the dependencies of the automount would be useful.
I don't really understand why you automount home in that scenario. Surely it won't ever be unused and thus unmounted, so it can just be a plain netdev mount?
I don't really understand why you automount home in that scenario. Surely it won't ever be unused and thus unmounted, so it can just be a plain netdev mount?
There have been various reasons why I was experimenting with automounting in this case. Mostly I wanted to minimize how much nfs was mounted. I was planning to automount the individual home/* subdirs at one point, but started with mounting the whole lot to test and ended up having to emergency recover the normally headless and inaccessible server, which scared me off. Much later I was playing with UPS shutdown sequencing of my server and NAS and again felt the need to automate/minimize the nfs mounts and tried again, only to have to do the emergency-recovery dance to remind me why I didn't do that last time.
In my first fiddling with this I had problems with the bootup sequence starting http servers before /home/www was mounted, which was also tangled up with my automount experiment pain. I eventually found using the bg mount option on a plain default nfs mount worked and left it at that, but I never understood exactly why just using automount had such a catastrophic result. In my second attempt for the UPS shutdown tests I dug in a bit deeper and stumbled on this. It now looks like bg is actually delaying the nfs mount too much, and that my http server starts after mounting /home is a fortunate race-condition result (http server startup depends on remote-fs.target, but bg means remote-fs.target doesn't depend on home.mount). I suspect I can just remove bg now (perhaps something else in the boot sequence has changed/fixed?), but I'm a bit scared to change anything because a hang will mean another emergency recovery.
Mostly I was repeatedly shocked at how fragile automounts were, causing boot hangs that required emergency recoveries. I naively thought "automount will just mount it when/if I need it... cool", and never expected it to hang the whole bootup sequence. I feel like moving the automount of remote filesystems from local-fs.target to remote-fs.target would avoid this fragility with little/no cost. That just adding x-systemd.automount to an nfs mount moved it to earlier in the boot sequence was an unexpected and unpleasant surprise. That there are no fstab mount options to work around this (i.e., it would be nice if _netdev and maybe nofail applied to the automount, not just the mount) is also a bit frustrating.
systemd version the issue has been seen with
This is a regression since https://github.com/systemd/systemd/commit/b3d7aef525dc1620a7948ffdbf3f36bfa3d5b5e8
Used distribution
Linux kernel version used (uname -a)
CPU architecture issue was seen on
Unexpected behaviour you saw
I have an automounting nfs4 fs in fstab, something like this:
x.x.x.x:/snapshot /home/snapshot nfs4 rw,noatime,nodiratime,sync,clientaddr=x.x.x.z,proto=rdma,port=xxxx,vers=4.2,_netdev,x-systemd.mount-timeout=5s,noauto,x-systemd.automount,x-systemd.idle-timeout=1min 0 0
The generated automount unit is thus depended on by local-fs.target. This generates a lot of nasty dependency cycles, which end up breaking the system startup completely. For example:
Expected behaviour you didn't see
An automount unit of a network fs should end up in remote-fs.target.
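As a rough illustration of why the target matters, the sketch below models "is ordered after" edges between units as a small graph and looks for a loop. The graph is a simplified assumption made up for this example, not output from systemd: the unit names home-snapshot.automount and home.mount are derived from the paths in the report, network-manager.service is a hypothetical stand-in for whatever brings the network up, and the real dependency set on any given system will differ.

```python
def find_cycle(graph):
    """Depth-first search; return one cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    state = {}
    stack = []

    def visit(node):
        state[node] = GREY
        stack.append(node)
        for dep in graph.get(node, []):
            if state.get(dep, WHITE) == GREY:
                # Back edge: the cycle is the part of the stack from dep onwards.
                return stack[stack.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        state[node] = BLACK
        return None

    for node in list(graph):
        if state.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Assumed edges, "unit: [units it is ordered after]".
COMMON = {
    "home-snapshot.automount": ["home.mount"],      # automount below /home
    "home.mount": ["network-online.target"],        # /home is on NFS
    "network-online.target": ["network-manager.service"],  # hypothetical
    "network-manager.service": ["basic.target"],
    "basic.target": ["sysinit.target"],
    "sysinit.target": ["local-fs.target"],
}

# Automount hooked into local-fs.target (the reported behaviour).
REPORTED = dict(COMMON, **{"local-fs.target": ["home-snapshot.automount"]})

# Automount hooked into remote-fs.target instead (the expected behaviour).
EXPECTED = dict(COMMON, **{"local-fs.target": [],
                           "remote-fs.target": ["home-snapshot.automount"]})

if __name__ == "__main__":
    print(find_cycle(REPORTED))
    print(find_cycle(EXPECTED))
```

Under these assumed edges, the first graph contains a loop through local-fs.target and the second does not, which matches the behaviour described for the remote-fs workaround earlier in the thread.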
Steps to reproduce the problem
Have a network mounted automounting fs in fstab.