System information
Describe the problem you're observing
While trying to benchmark some things on 32-bit OpenZFS, I discovered that a zfs recv of a 1M recordsize dataset on a 32-bit VM very quickly comes close to hanging (it makes about 100 MB of progress in a second or two, roughly once a minute), and the call stack looked like:
Which maps to line 1078 here: https://github.com/openzfs/zfs/blob/c70bb2f610523f9791796cedf6b0d5af1925131e/module/os/linux/spl/spl-kmem-cache.c#L1065-L1082
Being a 32-bit VM with the default split, we get 1 GB of kernel RAM, of which ZFS calculates it can use up to 372922368 bytes (half of the 745844736 it calculates allmem to be).
Meanwhile, arcstat reports using 5.6 MB of RAM with a target of 50 and 599M available.
I'm wildly guessing from the lines above that we're requesting 1M of contiguous RAM, finding it hard to come by, and then waiting permanently in the "emergency" allocation path, with the progress being whenever the emergency path timeout deigns to allow progress.
Describe how to reproduce the problem
recv a recordsize 1M dataset on 32-bit x86 OpenZFS, no flags needed
bang
Include any warning/errors/backtraces from the system logs
As above.
I hacked it up to grossly shorten the timeout on i386 just to make progress, but then it hangs semi-indefinitely when attempting to receive with zstd enabled, with /proc/foo/stack for the recv showing:
and complaints about receive_writer blocking forever in dmesg with:
So more work than just that hack job is required, I suppose.