rincebrain opened 1 year ago
So, this happens without encryption too, it just happens much faster there.
It seems like the reason is that there's no equivalent of the psize > SPA_MINBLOCKSIZE
limiter from zio.c in the code path triggered by metaslab_force_ganging,
so it can simulate failure on allocations too small to gang; and then the gang allocation doesn't trigger, since there's a conditional around that, so the allocation just fails.
Simply adding that check would make the behavior better, but not quite right, I think: the actual goal would be to not trigger on allocations smaller than the smallest possible on any eligible vdev. For example, if we have an ashift=13 vdev and an ashift=12 special vdev, we probably don't want to try ganging below the ashift=12 size at least, and ideally the ashift=13 size, if we have enough information to know the ashift=12 device isn't eligible. In the zio.c case I don't think it's harmful, just wasted effort, unlike here, where it can cause spurious allocation failures. :/
(You could argue this counts as holding it wrong, but I don't think a tunable intended to trigger this codepath should be able to trigger failure to allocate at all in cases where the non-forced code never could...)
In any event, I'll probably cut a PR shortly to see if people object to at least limiting it to > SPA_MINBLOCKSIZE; the test suite currently only uses the tunable to force ganging on 16k-or-larger records, so it won't break any usage there...
Looking at this, would it be better to hold off on upgrading from 2.1.6 to 2.1.8?
This isn't new, it happens on 2.0.0 too, and is only for a debug tunable. So I wouldn't be afraid of this.
System information
Describe the problem you're observing
While trying to reproduce #14413, I tried setting this tunable in lieu of forcibly fragmenting the pool, and, sure enough, found that the pool almost immediately suspends after doing this with an encrypted dataset present.
I went back to 2.1.7 to confirm it was no longer an issue, per that issue's report of which commit to blame, and... it still suspends.
Uh-oh.
Describe how to reproduce the problem
("somefile" below is a 100G sparse file; works fine with actual disks too)
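The exact commands weren't captured above; a plausible reproduction sketch, where the pool name, file path, dataset name, and tunable value are all my assumptions:

```sh
# Back a pool with a 100G sparse file (real disks reproduce it too).
truncate -s 100G /var/tmp/somefile
zpool create tank /var/tmp/somefile

# Create an encrypted dataset (prompts for a passphrase).
zfs create -o encryption=on -o keyformat=passphrase tank/enc

# Force ganging on tiny allocations via the debug tunable; the point of
# the bug is that values at or below SPA_MINBLOCKSIZE (512) can cause
# spurious allocation failures instead of gangs.
echo 512 > /sys/module/zfs/parameters/metaslab_force_ganging

# Write to the encrypted dataset; the pool suspends almost immediately.
dd if=/dev/urandom of=/tank/enc/file bs=1M count=100
```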
And even if you unset the tunable and try zpool clear, it will not, in my experience, come back.
Include any warning/errors/backtraces from the system logs
dmesg now has a few lovely