automount units below network mounts result in dep cycles

BtbN commented 4 years ago

systemd version the issue has been seen with

246

This is a regression since https://github.com/systemd/systemd/commit/b3d7aef525dc1620a7948ffdbf3f36bfa3d5b5e8

Used distribution

Gentoo

Linux kernel version used (uname -a)

5.4.77

CPU architecture issue was seen on

amd64

Unexpected behaviour you saw

I have an automounting nfs4 fs in fstab, something like this:

x.x.x.x:/snapshot /home/snapshot nfs4 rw,noatime,nodiratime,sync,clientaddr=x.x.x.z,proto=rdma,port=xxxx,vers=4.2,_netdev,x-systemd.mount-timeout=5s,noauto,x-systemd.automount,x-systemd.idle-timeout=1min 0 0
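For orientation, the mount point in this entry is what gives the generated units their names (`home-snapshot.automount` shows up in the log further down). A minimal sketch of that path-to-name mapping, assuming the escaping rules that `systemd-escape --path` applies; `path_to_unit_name` is a hypothetical helper, not a systemd API, and it does not resolve `.` or `..` path components:

```python
def path_to_unit_name(path: str, suffix: str) -> str:
    """Simplified sketch of systemd-style path escaping for unit names."""
    # Drop leading/trailing slashes and collapse duplicate ones.
    parts = [p for p in path.split("/") if p]
    if not parts:
        return "-" + suffix  # the root directory maps to "-"
    trimmed = "/".join(parts)
    allowed = set(
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789:_."
    )
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch in allowed and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Anything else, including a literal "-", is written as \xXX per byte.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out) + suffix


print(path_to_unit_name("/home/snapshot", ".automount"))
print(path_to_unit_name("/home", ".mount"))
```

With the entry above this yields `home-snapshot.automount` for the automount point and `home.mount` for the static mount it sits on.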

The generated automount unit is thus depended on by local-fs.target. This generates a lot of nasty dependency cycles, which end up breaking the system startup completely. For example:

Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found ordering cycle on iptables-restore.service/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on basic.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on sockets.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on csync2.socket/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on sysinit.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on systemd-machine-id-commit.service/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on local-fs.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on home-snapshot.automount/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on home.mount/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on network-online.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on network.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on network-pre.target/start
Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Job iptables-restore.service/start deleted to break ordering cycle starting with network-pre.target/start
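To make the chain easier to read, the unit names can be pulled out of such log lines. This is only a rough sketch: the regular expression is an assumption based on the message format shown above, and `cycle_chain` is a hypothetical helper.

```python
import re

# Matches "Found ordering cycle on UNIT/JOB" and "Found dependency on UNIT/JOB".
CYCLE_LINE = re.compile(r"Found (?:ordering cycle|dependency) on (\S+)/(\w+)$")


def cycle_chain(journal_lines):
    """Return the unit names from the cycle messages, in log order."""
    chain = []
    for line in journal_lines:
        match = CYCLE_LINE.search(line.strip())
        if match:
            chain.append(match.group(1))
    return chain


# Four lines copied from the log above.
sample = [
    "Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on local-fs.target/start",
    "Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on home-snapshot.automount/start",
    "Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on home.mount/start",
    "Nov 18 16:40:34 login02 systemd[1]: network-pre.target: Found dependency on network-online.target/start",
]
print(" -> ".join(cycle_chain(sample)))
```

The part of the chain that matters here is local-fs.target, then home-snapshot.automount, then home.mount, then network-online.target: a local-fs unit ends up waiting on the network.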

Expected behaviour you didn't see

An automount unit of a network fs should end up in remote-fs.target.

Steps to reproduce the problem

Have a network mounted automounting fs in fstab.

poettering commented 4 years ago

Hmm? The benefit of automount units is that they can be established early, i.e. long before the backing services are available. Hence it's generally great to install autofs early, since it means accesses will be properly synchronized. It's kinda the purpose of it: establish them as early as possible, so that only services actually accessing them will have to wait for the backing fs to be mounted, but nothing else.

Now, in your case I guess /home is on the network too?

That's where the problem comes from really. Your /home is on the network, and to establish the .automount we need to mount it.

i guess we should document that .automount units should not be placed on network mounts...

BtbN commented 4 years ago

Isn't the whole point of autofs mounts to mount them on demand, so their primary use is mounting network shares? /home is on nfs, but not an auto-mount, but /home/snapshot is an autofs mount.

How else am I supposed to implement that?

Edit: Hm, what'd happen if autofs mounts would just be added to remote-fs.target instead of local-fs.target? Would that then cause the opposite cycle, when an autofs mount is on top of a local one?

raven-au commented 4 years ago

Isn't the whole point of autofs mounts to mount them on demand, so their primary use is mounting network shares? /home is on nfs, but not an auto-mount, but /home/snapshot is an autofs mount.

Sorry, there's a problem with this from an autofs POV.

systemd provides autofs direct mounts only. In autofs, direct mounts must not be nested, remote or otherwise.

I don't think I actually enforce that in autofs but it's certainly not supported (in autofs). If you want this nesting to work then you need to use the autofs multi-mount feature with an indirect mount map. That's the only way such nesting is permitted in autofs.

How else am I supposed to implement that?

More specifically how would systemd handle the implicit dependencies for this nesting?

systemd might want to buy into this but I expect it would become something of a nightmarish entanglement of special case dependency handling, so I wouldn't recommend it.

raven-au commented 4 years ago

How else am I supposed to implement that?

More specifically how would systemd handle the implicit dependencies for this nesting?

I can tell you that to support this in autofs the code is very complex and more fragile than I would like. It's complicated and has posed a number of support problems over the years, some without solutions even now.

raven-au commented 4 years ago

How else am I supposed to implement that?

Setting /home as a static mount might do it, but you would need to ensure /home was always mounted before the automount is started and the dependencies would need to reflect that, not sure they will since /home would be a remote fs.

BtbN commented 4 years ago

/home is a static mount, just via nfs4. Only /home/snapshot is an autofs setup.

This setup has been working fine for almost a year now, up until and including systemd 245.

Those are snapshots of the /home filesystem made on the remote NFS host, for users to access, so they can restore accidentally deleted files on their own without a system administrator connecting to the backing storage and digging in the snapshots there. I cannot mount it non-autofs, because the snapshot being mounted via nfs prevents the automatic snapshot rotation on the storage from rotating it, so it has to be automatically unmounted when not in use.

BtbN commented 4 years ago

I added this patch (https://github.com/BtbN/systemd/commit/5c614ac2d2a0445af75e59045559fd7365e179f1) to my system now as a workaround. It simply makes automount units go after remote-fs instead of local-fs. With this in place, the automount works perfectly fine, and no dependency cycles are generated.

This obviously is not correct in every situation, and a proper fix should be checking the type of the mount unit it directly depends on, and if it's a network mount, add itself to the remote targets, and otherwise to the local ones.
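The selection rule described here could look roughly like the sketch below. It is written in Python purely for illustration and is not systemd code: `NETWORK_FSTYPES` is an assumed, incomplete set of filesystem types, and both function names are hypothetical.

```python
# Illustrative subset only; the real list of network filesystem types is longer.
NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "sshfs", "glusterfs", "ceph"}


def is_network_mount(fstype, options):
    """Treat a mount as remote if its type is a network fs or it carries _netdev."""
    return fstype in NETWORK_FSTYPES or "_netdev" in options


def target_for_automount(parent_fstype, parent_options):
    """Pick the target an automount point should be ordered before,
    based on the mount that contains the automount point."""
    if is_network_mount(parent_fstype, parent_options):
        return "remote-fs.target"
    return "local-fs.target"


# /home is a static nfs4 mount, so /home/snapshot would go to the remote side.
print(target_for_automount("nfs4", ["rw", "_netdev"]))
# An automount point sitting on a local ext4 filesystem stays local.
print(target_for_automount("ext4", ["rw", "noatime"]))
```

Keying the decision on the containing mount rather than on the filesystem being automounted is what keeps automount points on local filesystems in local-fs.target.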

I'm not nesting automounts, that's obviously going to cause a heapload of hard to deal with issues. But I don't see why having an automount below a static network mount would be an issue.

raven-au commented 4 years ago

/home is a static mount, just via nfs4. Only /home/snapshot is an autofs setup.

Right.

This setup has been working fine for almost a year now, up until and including systemd 245.

So it does sound like an automatic dependency problem ...

Those are snapshots of the /home filesystem made on the remote NFS host, for users to access, so they can restore accidentally deleted files on their own without a system administrator connecting to the backing storage and digging in the snapshots there. I cannot mount it non-autofs, because the snapshot being mounted via nfs prevents the automatic snapshot rotation on the storage from rotating it, so it has to be automatically unmounted when not in use.

Perhaps you could define the automount unit and its dependent mount unit manually and express the dependencies properly in that way. I'm not sure but it seems the only possibility since the change to the automount dependency processing must have been done for a reason in the first place.

BtbN commented 4 years ago

I tried manually writing the automount unit, but that does not solve the issue. The dependency to local-fs.target is not created by the fstab generator after all. It's implicitly added by systemd itself to every automount unit. So with a manually written automount unit, I would run (and in fact did run) into the exact same issue.

raven-au commented 4 years ago

Presumably commit 8f28433 is your latest? On first glance it looks sensible based on your description, in particular I see you are checking the mount unit and only adjusting the dependencies if it is remote. But it might need to check the containing mount unit doesn't have an automount unit associated with it as well to be robust. Were you going to submit a PR or are you waiting to see how the suggestion is received by others (not mine, my comments don't carry weight for systemd development)?

BtbN commented 4 years ago

That commit unfortunately doesn't work. It checks the linked mount unit the automount unit mounts on access. But that is effectively always going to be a remote fs, since there's rarely a point using autofs for a local mount. It needs to check the parent mount, and I'm not sure how to get access to that.

On top of that, it fails to even do what it's trying to do for some reason. I suspect mount_get_parameters() plain does not work reliably outside of that unit's own load().

raven-au commented 4 years ago

That commit unfortunately doesn't work. It checks the linked mount unit the automount unit mounts on access. But that is effectively always going to be a remote fs, since there's rarely a point using autofs for a local mount. It needs to check the parent mount, and I'm not sure how to get access to that.

Ha, yes, you don't know the path to the mount above, this one could be deeper inside the containing mount. But it sounds like there are other problems to deal with first.
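Finding the containing mount when the automount point sits several directories below it amounts to walking up the path until a known mount point is hit. A minimal sketch, assuming the set of mount points is already available as plain paths; `find_parent_mount` is a hypothetical helper and says nothing about how systemd itself would look this up:

```python
import posixpath


def find_parent_mount(path, mount_points):
    """Return the deepest known mount point that strictly contains `path`,
    or None if there is none."""
    path = posixpath.normpath(path)
    if path == "/":
        return None  # nothing can contain the root directory
    known = {posixpath.normpath(m) for m in mount_points}
    current = posixpath.dirname(path)
    while True:
        if current in known:
            return current
        if current == "/":
            return None
        current = posixpath.dirname(current)


mounts = ["/", "/home"]
print(find_parent_mount("/home/snapshot", mounts))
print(find_parent_mount("/home/some/deeper/snapshot", mounts))
```

Both calls report `/home`, so the depth of the automount point below the containing mount does not matter to the lookup.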

On top of that, it fails to even do what it's trying to do for some reason. I suspect mount_get_parameters() plain does not work reliably outside of that unit's own load().

I see that the commit (which isn't in your published tree btw) is fairly straightforward but even so I'll need to break it up into smaller bits to analyse it ...

raven-au commented 4 years ago

I see that the commit (which isn't in your published tree btw) is fairly straightforward but even so I'll need to break it up into smaller bits to analyse it ...

I don't think that mount_is_network() call makes sense.

I think mount_is_network() is going to check the unit has an fstab entry with that option (but I haven't looked at fstab_test_option() yet) and the generated mount unit won't be in the fstab. It might be sufficient to check the automount unit since that defines whether the trigger target is remote, at least I think that's the point of the _netdev option in those ...
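For reference, the kind of option test being discussed can be sketched as follows. This is not systemd's fstab_test_option(), only an illustration of testing a comma-separated option field for a flag such as _netdev; `parse_mount_options` is a hypothetical helper, and the option string is shortened from the fstab entry in the report.

```python
def parse_mount_options(option_string):
    """Split a comma-separated fstab option field into {name: value-or-None}."""
    options = {}
    for item in option_string.split(","):
        if not item:
            continue
        name, sep, value = item.partition("=")
        # Flags like "_netdev" have no "=", so they map to None.
        options[name] = value if sep else None
    return options


opts = parse_mount_options(
    "rw,noatime,_netdev,x-systemd.mount-timeout=5s,noauto,"
    "x-systemd.automount,x-systemd.idle-timeout=1min"
)
print("_netdev" in opts)               # True
print(opts["x-systemd.idle-timeout"])  # 1min
```

Whether such a check should look at the options of the mount unit, of the automount unit, or of the containing mount is exactly the open question in this part of the thread.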

Even doing that check and making that dependency change would need a justification description based on what was done in the commit that introduced the problem describing why it doesn't break what that change was trying to achieve.

It's a bit difficult and a bit more involved really, than just changing the dependencies without that sort of analysis and description.

dbaarda commented 2 years ago

Just thought I'd add my 2c on this since I've been bitten by it in an even simpler form.

In my case /home is an nfs filesystem I wanted to automount. The slightly hairy thing is I have symlinks on my root partition pointing inside /home for things like /usr/local and /var/www (non-distro stuff). This triggers a system hang on boot.

I believe the problem is the home.automount mount is brought up early as part of local-fs.target, and then something in the boot process looks in /home and triggers the mount attempt before the networking is up enough to mount nfs filesystems. I can see that the home.mount does seem to have enough dependencies that it should wait until the networking and remote-fs-pre.target is up, but I'm not sure if the automount trigger honours all that. It's also possible that whatever is triggering the automount is a pre-dependency for the networking, so it's a circular dependency.

The problem with having the automounts for network filesystems enabled so early with local-fs.target is they imply that the mount is ready and working, when they won't actually work until networking is up. So after the automount of /home is up, ls /home will still hang and/or fail until remote-fs-pre.target is up and the nfs mount works.

The systemd-fstab-generator does have support for adding x-systemd.requires and x-systemd.required-by dependencies, but these apply to *.mount, not *.automount. Things like the _netdev and nofail mount options also change only the *.mount dependencies. There doesn't seem to be a mechanism for changing/adding dependencies for *.automount. So far my only fix has been to not use automount.

To me it seems logical that automounts of remote _netdev filesystems should themselves be treated as remote filesystems and be part of remote-fs.target, not local-fs.target. However, I can sort of understand that maybe having the automounts "up" earlier could be useful for the "automatically make things wait if the backing device is not yet available" case.

In all of these cases, having some mechanism to override/configure the dependencies of the automount would be useful.

BtbN commented 2 years ago

I don't really understand why you automount home in that scenario. Surely it won't ever be unused and thus unmounted, so it can just be a plain netdev mount?

dbaarda commented 2 years ago

I don't really understand why you automount home in that scenario. Surely it won't ever be unused and thus unmounted, so it can just be a plain netdev mount?

There have been various reasons why I was experimenting with automounting in this case. Mostly I wanted to minimize how much nfs was mounted. I was planning to automount the individual home/* subdirs at one point but started with mounting the whole lot to test and ended up having to emergency recover the normally headless and inaccessible server, which scared me off. Much later I was playing with UPS shutdown sequencing of my server and NAS and again felt the need to automate/minimize the nfs mounts and tried again, only to have to do the emergency-recovery dance to remind me why I didn't do that last time.

In my first fiddling with this I had problems with the bootup sequence starting http servers before /home/www was mounted, which was also tangled up with my automount experiment pain. I eventually found using the bg mount option on a plain default nfs mount worked and left it at that, but I never understood exactly why just using automount had such a catastrophic result. In my second attempt for the UPS shutdown tests I dug in a bit deeper and stumbled on this. It now looks like bg is actually delaying the nfs mount too much, and that my http server starts after mounting /home is a fortunate race-condition result (http server startup depends on remote-fs.target, but bg means remote-fs.target doesn't depend on home.mount). I suspect I can just remove bg now (perhaps something else in the boot sequence has changed/fixed?), but I'm a bit scared to change anything because a hang will mean another emergency recovery.

Mostly I was repeatedly shocked at how fragile automounts were, causing boot hangs that required emergency recoveries. I naively thought "automount will just mount it when/if I need it... cool", and never expected it to hang the whole bootup sequence. I feel like moving the automount of remote filesystems from local-fs.target to remote-fs.target would avoid this fragility with little/no cost. That just adding x-systemd.automount to an nfs mount moved it to earlier in the boot sequence was an unexpected and unpleasant surprise. That there are no fstab mount options to work around this (i.e., it would be nice if _netdev and maybe nofail applied to the automount, not just the mount) is also a bit frustrating.

systemd / systemd

automount units below network mounts result in dep cycles #17657