How large is the pool (used/total)? How much RAM is the ARC configured to use, out of how much on the VM in total? How fast is the storage? OOMing that fast implies either almost no RAM or reasonably quick storage (or both), assuming it's the ARC filling that's exerting all that memory pressure. If you want to get the VM running again, the things I'd try would be:
- zpool scrub -p or -s, as desired.
- options zfs zfs_no_scrub_io=1 to make scrub not actually scrub (though if just the metadata crawl is enough to exhaust things, that won't save you).
- while true; do zpool scrub -p [thatpool] && break; done; in /etc/rc.local, though be sure to remove it before rebooting, as zpool scrub -p [pool] on a pool without a scrub running is also an error. (I just put the while true there because I'm not sure what guarantees there are about when rc.local gets run relative to the pool import; if you know better than I do that it gets run strictly later, you can drop it.) A rough sketch of where these would go follows below.
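For reference, a rough sketch of where those two settings would live, assuming the pool is named tank (illustrative, not exact):

# /etc/modprobe.d/zfs.conf -- illustrative; takes effect the next time the zfs module is loaded
options zfs zfs_no_scrub_io=1

# /etc/rc.local -- illustrative; pause the scrub as soon as the pool is imported at boot,
# and remove this line again once the pool is back under control
while true; do zpool scrub -p tank && break; done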
Hmmm... so I tried this with 2.0.4, and a similar thing happens:
[root@instance-20210526-1929 ~]# zpool import tank
[root@instance-20210526-1929 ~]# zpool status
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 15:01:04 2021
21.5G scanned at 1.79G/s, 620K issued at 51.7K/s, 118G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
errors: No known data errors
[root@instance-20210526-1929 ~]# zpool status
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 15:01:04 2021
26.3G scanned at 1.76G/s, 752K issued at 50.1K/s, 118G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
errors: No known data errors
[root@instance-20210526-1929 ~]# zpool status
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 15:01:04 2021
29.2G scanned at 1.83G/s, 752K issued at 47K/s, 118G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
errors: No known data errors
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 56831 40247 27 419 32774
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 60724 36354 27 419 28882
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 65738 31340 27 420 23867
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 69242 27836 27 420 20364
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 73679 23399 27 420 15926
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 78099 18979 27 420 11507
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
total used free shared buff/cache available
Mem: 97499 82966 14112 27 420 6639
Swap: 8191 0 8191
[root@instance-20210526-1929 ~]# free -m
client_loop: send disconnect: Broken pipe
Hey @rincebrain,
I was able to just zpool import tank && zpool scrub -s tank and then I recreated the tank. All good.
We're just running some tests on aarch64 so it's fine if the VM dies or locks up.
So, the pool looks like:
[root@instance-20210526-1929 ~]# zpool list -v
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
tank 5.81T 4.07G 5.81T - - 0% 0% 1.00x ONLINE -
scsi-360f39ea51229408cb368509d91495fb9 496G 351M 496G - - 0% 0.06% - ONLINE
scsi-3603528d43ade4b31b70186f9a041601e 496G 347M 496G - - 0% 0.06% - ONLINE
scsi-36007099c456f4ec780fdc03b14976f19 496G 354M 496G - - 0% 0.06% - ONLINE
scsi-360d5b4cb98a44fabbcc67b1a55808124 496G 353M 496G - - 0% 0.06% - ONLINE
scsi-3603ff370fa044673a5c09353568c6757 496G 349M 496G - - 0% 0.06% - ONLINE
scsi-360ba05ab3eab4897bcf042fdfc3da1eb 496G 347M 496G - - 0% 0.06% - ONLINE
scsi-360087adf642b4f6586326dada6c8eb41 496G 341M 496G - - 0% 0.06% - ONLINE
scsi-3603a47cd86dd484bba1b05bab36c1257 496G 339M 496G - - 0% 0.06% - ONLINE
scsi-3600bf0330c6e4139829ad72c816b8c06 496G 343M 496G - - 0% 0.06% - ONLINE
scsi-3605635d6a27b4c189c0af523ddc262de 496G 345M 496G - - 0% 0.06% - ONLINE
scsi-36013bd4eeaab4b4a9e88beb0474a2439 496G 346M 496G - - 0% 0.06% - ONLINE
scsi-360f024a7d8b64521b7e7d671d9397ab5 496G 349M 496G - - 0% 0.06% - ONLINE
It has an aggregate read bandwidth of 3.6GB/s. The machine has 96GB of RAM. I haven't done any tuning or modified any settings at all, just loaded the module.
I reran scrub on the newly created pool and it also ran out of memory, so I'll see what I can tune.
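For anyone else poking at this, the knobs I'm planning to experiment with live under /sys/module/zfs/parameters. A rough sketch with placeholder values, not recommendations (zfs_scan_mem_lim_fact is my reading of the module docs as the tunable that caps the scan queues at 1/N of RAM):

# Placeholder values; these take effect immediately but reset on reboot / module reload.
echo $((8 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max   # cap the ARC at 8 GiB
echo 1 > /sys/module/zfs/parameters/zfs_no_scrub_prefetch                   # skip prefetch during scrub
echo 40 > /sys/module/zfs/parameters/zfs_scan_mem_lim_fact                  # scan queues limited to 1/N of RAM (default 20)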
[ deleted a post because the issue wasn't solved, it just looked like it due to a smaller allocated size on the new pool ]
I'm pretty sure you shouldn't actually close this - IMO "scrub triggers the OOM killer even immediately at boot unless you tweak zfs_arc_max" sounds like a bug even if it's got a mitigation, as it's supposed to default to 50% of total system RAM on Linux, which should be well clear of that threshold on a system with 96 GiB.
@rincebrain yeah, I mean that's what I personally think. Maybe it's just the speed of the newer devices or maybe something isn't optimal on aarch64 or 🤷♂️.
I'll leave it open then.
I'm running this on Linux/aarch64 on my RPi4 with 8GiB of RAM and it's been up for a month, including a scrub (admittedly only one spinning disk, though, so if it's a race between pruning the ARC and filling it, I wouldn't be hitting it).
Is your VM being run on an aarch64 machine in turn, or some x86_64 or other arch? (I'm wondering about the feasibility of triggering this without having some beefy aarch64 hardware available, though I suppose at least one cloud provider will sell you aarch64 VMs...)
@rincebrain it's running on an Ampere Altra host machine, which you can test out for free:
https://www.servethehome.com/oracle-cloud-giving-away-ampere-arm-a1-instances-always-free/
The way to get the better quota (16 cores and 96GiB of RAM) is to sign up for the Arm Accelerator:
https://go.oracle.com/armaccelerator
Which is made for OSS developers etc.
Note that I'm running it with RHEL but Oracle Linux is equivalent and you can install the same kernel on there.
I just had it die with the zfs_arc_max set to 24GiB so let me paste that in a new reply.
Alright, so, it seems like this only happens if the allocated size on the zpool > available RAM, even when zfs_arc_max is set low and nothing else is running on the machine.
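A quick way to compare the two numbers, assuming the pool is named tank:

zpool list -Hp -o allocated tank      # allocated bytes in the pool
free -b | awk '/^Mem:/ {print $2}'    # total RAM in bytes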
[root@instance-20210526-1929 ~]# cat /sys/module/zfs/parameters/zfs_arc_max
25769803776
[root@instance-20210526-1929 ~]# zpool scrub tank
[root@instance-20210526-1929 ~]# zpool status 1
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
4.49G scanned at 4.49G/s, 312K issued at 312K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
8.08G scanned at 4.04G/s, 312K issued at 156K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
11.2G scanned at 3.72G/s, 312K issued at 104K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
14.1G scanned at 3.53G/s, 312K issued at 78K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
16.8G scanned at 3.36G/s, 312K issued at 62.4K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
19.0G scanned at 3.16G/s, 336K issued at 56K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
21.5G scanned at 2.69G/s, 336K issued at 42K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
24.3G scanned at 2.70G/s, 336K issued at 37.3K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
27.2G scanned at 2.72G/s, 336K issued at 33.6K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
30.0G scanned at 2.73G/s, 336K issued at 30.5K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
33.5G scanned at 2.79G/s, 336K issued at 28K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
36.9G scanned at 2.84G/s, 336K issued at 25.8K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
41.2G scanned at 2.94G/s, 336K issued at 24K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
45.1G scanned at 3.01G/s, 336K issued at 22.4K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
49.1G scanned at 3.07G/s, 336K issued at 21K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
52.6G scanned at 3.09G/s, 348K issued at 20.5K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
56.0G scanned at 3.11G/s, 348K issued at 19.3K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
59.8G scanned at 3.15G/s, 348K issued at 18.3K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
63.8G scanned at 3.19G/s, 348K issued at 17.4K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
67.3G scanned at 3.20G/s, 348K issued at 16.6K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
70.9G scanned at 3.22G/s, 372K issued at 16.9K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:12:47 2021
74.8G scanned at 3.25G/s, 372K issued at 16.2K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 85643 11618 14 236 4062
Swap: 8191 209 7982
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 89486 7772 14 240 218
Swap: 8191 209 7982
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 88614 8666 14 217 1100
Swap: 8191 225 7966
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 86123 11161 14 214 3594
Swap: 8191 225 7966
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 85272 12012 14 214 4445
Swap: 8191 225 7966
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 89053 8232 14 214 664
Swap: 8191 225 7966
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 88280 9019 14 199 1447
Swap: 8191 237 7954
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 87278 10014 14 205 2443
Swap: 8191 237 7954
root@instance-20210526-1929 ~/fio # free -m
total used free shared buff/cache available
Mem: 97499 90669 6623 14 206 637
Swap: 8191 237 7954
root@instance-20210526-1929 ~/fio # client_loop: send disconnect: Broken pipe
It looked like it was attempting to keep the memory usage under control at least but I think it's just too fast, or something.
What I did was use fio to create a file big enough to trigger the issue:
[root@instance-20210526-1929 fio]# cat seqread_64.fio
[global]
bs=64K
iodepth=64
direct=1
ioengine=libaio
group_reporting
time_based
runtime=120
numjobs=4
name=raw-read
rw=read
[job1]
filename=/tank/db/f.fio
size=128G
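For completeness, I believe the job file is run as plain fio against that path; the first run lays out the 128G file at /tank/db/f.fio and later runs just read it back (the exact invocation isn't shown above, so treat this as an assumption):

fio seqread_64.fio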
And the zpool/zfs parameters:
[root@instance-20210526-1929 ~]# zpool get all
NAME PROPERTY VALUE SOURCE
tank size 5.81T -
tank capacity 2% -
tank altroot - default
tank health ONLINE -
tank guid 7232894876644065540 -
tank version - default
tank bootfs - default
tank delegation on default
tank autoreplace off default
tank cachefile - default
tank failmode wait default
tank listsnapshots off default
tank autoexpand off default
tank dedupratio 1.00x -
tank free 5.68T -
tank allocated 136G -
tank readonly off -
tank ashift 12 local
tank comment - default
tank expandsize - -
tank freeing 0 -
tank fragmentation 0% -
tank leaked 0 -
tank multihost off default
tank checkpoint - -
tank load_guid 12888677109862668849 -
tank autotrim off default
tank feature@async_destroy enabled local
tank feature@empty_bpobj active local
tank feature@lz4_compress active local
tank feature@multi_vdev_crash_dump enabled local
tank feature@spacemap_histogram active local
tank feature@enabled_txg active local
tank feature@hole_birth active local
tank feature@extensible_dataset active local
tank feature@embedded_data active local
tank feature@bookmarks enabled local
tank feature@filesystem_limits enabled local
tank feature@large_blocks enabled local
tank feature@large_dnode enabled local
tank feature@sha512 enabled local
tank feature@skein enabled local
tank feature@edonr enabled local
tank feature@userobj_accounting active local
tank feature@encryption enabled local
tank feature@project_quota active local
tank feature@device_removal enabled local
tank feature@obsolete_counts enabled local
tank feature@zpool_checkpoint enabled local
tank feature@spacemap_v2 active local
tank feature@allocation_classes enabled local
tank feature@resilver_defer enabled local
tank feature@bookmark_v2 enabled local
tank feature@redaction_bookmarks enabled local
tank feature@redacted_datasets enabled local
tank feature@bookmark_written enabled local
tank feature@log_spacemap active local
tank feature@livelist enabled local
tank feature@device_rebuild enabled local
tank feature@zstd_compress enabled local
[root@instance-20210526-1929 ~]# zfs get all
NAME PROPERTY VALUE SOURCE
tank type filesystem -
tank creation Fri May 28 16:07 2021 -
tank used 136G -
tank available 5.50T -
tank referenced 96K -
tank compressratio 1.03x -
tank mounted yes -
tank quota none default
tank reservation none default
tank recordsize 128K default
tank mountpoint /tank default
tank sharenfs off default
tank checksum on default
tank compression off default
tank atime on default
tank devices on default
tank exec on default
tank setuid on default
tank readonly off default
tank zoned off default
tank snapdir hidden default
tank aclmode discard default
tank aclinherit restricted default
tank createtxg 1 -
tank canmount on default
tank xattr on default
tank copies 1 default
tank version 5 -
tank utf8only off -
tank normalization none -
tank casesensitivity sensitive -
tank vscan off default
tank nbmand off default
tank sharesmb off default
tank refquota none default
tank refreservation none default
tank guid 9019197745549167109 -
tank primarycache all default
tank secondarycache all default
tank usedbysnapshots 0B -
tank usedbydataset 96K -
tank usedbychildren 136G -
tank usedbyrefreservation 0B -
tank logbias latency default
tank objsetid 54 -
tank dedup off default
tank mlslabel none default
tank sync standard default
tank dnodesize legacy default
tank refcompressratio 1.00x -
tank written 96K -
tank logicalused 140G -
tank logicalreferenced 42K -
tank volmode default default
tank filesystem_limit none default
tank snapshot_limit none default
tank filesystem_count none default
tank snapshot_count none default
tank snapdev hidden default
tank acltype off default
tank context none default
tank fscontext none default
tank defcontext none default
tank rootcontext none default
tank relatime off default
tank redundant_metadata all default
tank overlay on default
tank encryption off default
tank keylocation none default
tank keyformat none default
tank pbkdf2iters 0 default
tank special_small_blocks 0 default
tank/db type filesystem -
tank/db creation Fri May 28 16:08 2021 -
tank/db used 136G -
tank/db available 5.50T -
tank/db referenced 136G -
tank/db compressratio 1.03x -
tank/db mounted yes -
tank/db quota none default
tank/db reservation none default
tank/db recordsize 8K local
tank/db mountpoint /tank/db default
tank/db sharenfs off default
tank/db checksum on default
tank/db compression lz4 local
tank/db atime off local
tank/db devices on default
tank/db exec on default
tank/db setuid on default
tank/db readonly off default
tank/db zoned off default
tank/db snapdir hidden default
tank/db aclmode discard default
tank/db aclinherit restricted default
tank/db createtxg 19 -
tank/db canmount on default
tank/db xattr sa local
tank/db copies 1 default
tank/db version 5 -
tank/db utf8only off -
tank/db normalization none -
tank/db casesensitivity sensitive -
tank/db vscan off default
tank/db nbmand off default
tank/db sharesmb off default
tank/db refquota none default
tank/db refreservation none default
tank/db guid 3231454792195716646 -
tank/db primarycache all default
tank/db secondarycache all default
tank/db usedbysnapshots 0B -
tank/db usedbydataset 136G -
tank/db usedbychildren 0B -
tank/db usedbyrefreservation 0B -
tank/db logbias throughput local
tank/db objsetid 899 -
tank/db dedup off default
tank/db mlslabel none default
tank/db sync standard default
tank/db dnodesize legacy default
tank/db refcompressratio 1.03x -
tank/db written 136G -
tank/db logicalused 140G -
tank/db logicalreferenced 140G -
tank/db volmode default default
tank/db filesystem_limit none default
tank/db snapshot_limit none default
tank/db filesystem_count none default
tank/db snapshot_count none default
tank/db snapdev hidden default
tank/db acltype off default
tank/db context none default
tank/db fscontext none default
tank/db defcontext none default
tank/db rootcontext none default
tank/db relatime off default
tank/db redundant_metadata all default
tank/db overlay on default
tank/db encryption off default
tank/db keylocation none default
tank/db keyformat none default
tank/db pbkdf2iters 0 default
tank/db special_small_blocks 0 default
I tried setting zfs_no_scrub_prefetch to 1, but it just slowed the scrub down to 2.28GB/s with the same issue.
The thing is... the 'used' column from 'free -m' tracks the 'scanned' figure in the zpool status output almost exactly. So just before it dies:
85.5G scanned at 2.25G/s, 320K issued at 8.42K/s, 136G total
And the last 3 calls to free on a while 1 / free -m / sleep 1 loop:
total used free shared buff/cache available
Mem: 97499 85499 11580 27 420 4108
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 97499 87691 9387 27 420 1915
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 97499 90111 6967 27 420 1080
Swap: 8191 0 8191
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
Could you check the contents of /proc/spl/kmem/slab? It should show how the memory is being used. It sounds like for some reason the scrub scan logic is not respecting its memory limits.
@behlendorf yup.
Okay, so...
zpool import tank && zpool status 1
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 18:24:44 2021
83.3G scanned at 3.33G/s, 156K issued at 6.24K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
...
errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
And I wrote a little script that does a diff -u on the kmem/slab output every second:
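(Roughly this loop, reconstructed to match the slab.last.txt / slab.new.txt names in the diffs that follow; not the verbatim script.)

cat /proc/spl/kmem/slab > slab.new.txt
while true; do
    mv slab.new.txt slab.last.txt
    sleep 1
    cat /proc/spl/kmem/slab > slab.new.txt
    diff -u slab.last.txt slab.new.txt
done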
--- slab.last.txt 2021-05-28 18:40:29.020378409 +0000
+++ slab.new.txt 2021-05-28 18:40:30.030382972 +0000
@@ -341,8 +341,8 @@
dmu_buf_impl_t 0x00080 587520 519168 3456 384 170 169 169 1360 1352 1352 0 0 0
zil_lwb_cache 0x00080 0 0 3392 376 0 0 0 0 0 0 0 0 0
zil_zcw_cache 0x00080 0 0 1600 152 0 0 0 0 0 0 0 0 0
-sio_cache_0 0x08080 1647101760 1217423040 1472 136 1118955 1118955 1118955 8951640 8951640 8951640 0 0 0
-sio_cache_1 0x00080 3558400 2703168 1600 152 2224 2223 2223 17792 17784 17784 0 0 0
+sio_cache_0 0x08080 1733533184 1281307136 1472 136 1177672 1177672 1177672 9421376 9421376 9421376 0 0 0
+sio_cache_1 0x00080 3742400 2843008 1600 152 2339 2338 2338 18712 18704 18704 0 0 0
sio_cache_2 0x00080 307584 237888 1728 168 178 177 177 1424 1416 1416 0 0 0
zfs_znode_cache 0x00100 - 5720 - 1144 - - - - 5 - - - -
zfs_znode_hold_cache 0x00080 2176 704 1088 88 2 1 1 16 8 8 0 0 0
--- slab.last.txt 2021-05-28 18:40:30.030382972 +0000
+++ slab.new.txt 2021-05-28 18:40:31.030387489 +0000
@@ -341,8 +341,8 @@
dmu_buf_impl_t 0x00080 587520 519168 3456 384 170 169 169 1360 1352 1352 0 0 0
zil_lwb_cache 0x00080 0 0 3392 376 0 0 0 0 0 0 0 0 0
zil_zcw_cache 0x00080 0 0 1600 152 0 0 0 0 0 0 0 0 0
-sio_cache_0 0x08080 1733698048 1281428992 1472 136 1177784 1177784 1177784 9422272 9422272 9422272 0 0 0
-sio_cache_1 0x00080 3742400 2843008 1600 152 2339 2338 2338 18712 18704 18704 0 0 0
+sio_cache_0 0x08080 1819677568 1344979072 1472 136 1236194 1236194 1236194 9889552 9889552 9889552 0 0 0
+sio_cache_1 0x00080 3924800 2981632 1600 152 2453 2452 2452 19624 19616 19616 0 0 0
sio_cache_2 0x00080 307584 237888 1728 168 178 177 177 1424 1416 1416 0 0 0
zfs_znode_cache 0x00100 - 5720 - 1144 - - - - 5 - - - -
zfs_znode_hold_cache 0x00080 2176 704 1088 88 2 1 1 16 8 8 0 0 0
--- slab.last.txt 2021-05-28 18:40:31.030387489 +0000
+++ slab.new.txt 2021-05-28 18:40:32.030392007 +0000
@@ -5,11 +5,11 @@
kcf_sreq_cache 0x00080 0 0 2112 160 0 0 0 0 0 0 0 0 0
kcf_areq_cache 0x00080 0 0 4672 464 0 0 0 0 0 0 0 0 0
kcf_context_cache 0x00080 0 0 2112 152 0 0 0 0 0 0 0 0 0
-zfs_btree_leaf_cache 0x00080 4939648 4849664 33152 4096 149 148 148 1192 1184 1184 0 0 0
+zfs_btree_leaf_cache 0x00080 5039104 4947968 33152 4096 152 151 151 1216 1208 1208 0 0 0
ddt_cache 0x00080 996160 795392 199232 24856 5 4 4 40 32 32 0 0 0
ddt_entry_cache 0x00080 0 0 3968 448 0 0 0 0 0 0 0 0 0
-zio_cache 0x00080 7319936 3000320 10624 1280 689 689 757 5512 2344 6056 0 0 0
-zio_link_cache 0x00080 521472 112128 768 48 679 679 772 5432 2336 6176 0 0 0
+zio_cache 0x00080 7171200 2836480 10624 1280 675 675 757 5400 2216 6056 0 0 0
+zio_link_cache 0x00080 513024 105984 768 48 668 668 772 5344 2208 6176 0 0 0
zio_buf_512 0x00082 147968 65536 8704 512 17 16 16 136 128 128 0 0 0
zio_data_buf_512 0x00082 0 0 8704 512 0 0 0 0 0 0 0 0 0
zio_buf_1024 0x00082 0 0 12800 1024 0 0 0 0 0 0 0 0 0
@@ -331,18 +331,18 @@
zio_buf_16777216 0x00082 0 0 16908288 16777216 0 0 0 0 0 0 0 0 0
zio_data_buf_16777216 0x00082 0 0 16908288 16777216 0 0 0 0 0 0 0 0 0
lz4_cache 0x00080 2234752 2097152 131456 16384 17 16 16 136 128 128 0 0 0
-abd_t 0x00080 3466176 2474880 1344 120 2579 2578 2578 20632 20624 20624 0 0 0
+abd_t 0x00080 3468864 2476800 1344 120 2581 2580 2580 20648 20640 20640 0 0 0
sa_cache 0x00080 5248 2240 2624 280 2 1 1 16 8 8 0 0 0
dnode_t 0x00080 1089792 1031232 8256 984 132 131 131 1056 1048 1048 0 0 0
arc_buf_hdr_t_full 0x00080 7577152 6607232 3008 328 2519 2518 2518 20152 20144 20144 0 0 0
arc_buf_hdr_t_full_crypt 0x00080 0 0 3520 392 0 0 0 0 0 0 0 0 0
arc_buf_hdr_t_l2only 0x00080 0 0 1152 96 0 0 0 0 0 0 0 0 0
arc_buf_t 0x00080 63488 39040 1024 80 62 61 61 496 488 488 0 0 0
-dmu_buf_impl_t 0x00080 587520 519168 3456 384 170 169 169 1360 1352 1352 0 0 0
+dmu_buf_impl_t 0x00080 590976 522240 3456 384 171 170 170 1368 1360 1360 0 0 0
zil_lwb_cache 0x00080 0 0 3392 376 0 0 0 0 0 0 0 0 0
zil_zcw_cache 0x00080 0 0 1600 152 0 0 0 0 0 0 0 0 0
-sio_cache_0 0x00080 1819843904 1345100928 1472 136 1236307 1236306 1236306 9890456 9890448 9890448 0 0 0
-sio_cache_1 0x00080 3926400 2982848 1600 152 2454 2453 2453 19632 19624 19624 0 0 0
+sio_cache_0 0x00080 1903981952 1407289920 1472 136 1293466 1293465 1293465 10347728 10347720 10347720 0 0 0
+sio_cache_1 0x00080 4105600 3119040 1600 152 2566 2565 2565 20528 20520 20520 0 0 0
sio_cache_2 0x00080 307584 237888 1728 168 178 177 177 1424 1416 1416 0 0 0
zfs_znode_cache 0x00100 - 5720 - 1144 - - - - 5 - - - -
zfs_znode_hold_cache 0x00080 2176 704 1088 88 2 1 1 16 8 8 0 0 0
--- slab.last.txt 2021-05-28 18:40:32.040392052 +0000
+++ slab.new.txt 2021-05-28 18:40:33.040396570 +0000
@@ -341,8 +341,8 @@
dmu_buf_impl_t 0x00080 590976 522240 3456 384 171 170 170 1368 1360 1360 0 0 0
zil_lwb_cache 0x00080 0 0 3392 376 0 0 0 0 0 0 0 0 0
zil_zcw_cache 0x00080 0 0 1600 152 0 0 0 0 0 0 0 0 0
-sio_cache_0 0x00080 1904341120 1407555392 1472 136 1293710 1293709 1293709 10349680 10349672 10349672 0 0 0
-sio_cache_1 0x00080 4105600 3119040 1600 152 2566 2565 2565 20528 20520 20520 0 0 0
+sio_cache_0 0x00080 1988549824 1469796608 1472 136 1350917 1350916 1350916 10807336 10807328 10807328 0 0 0
+sio_cache_1 0x00080 4284800 3255232 1600 152 2678 2677 2677 21424 21416 21416 0 0 0
sio_cache_2 0x00080 307584 237888 1728 168 178 177 177 1424 1416 1416 0 0 0
zfs_znode_cache 0x00100 - 5720 - 1144 - - - - 5 - - - -
zfs_znode_hold_cache 0x00080 2176 704 1088 88 2 1 1 16 8 8 0 0 0
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
I'm running the same test on an AWS r6g.4xlarge instance with 12x500GB gp2 EBS volumes just to make sure it's not some weird Ampere Altra thing. I'm pretty sure they're both Neoverse N1 based:
OCI A1 instance
processor : 15
BogoMIPS : 50.00
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
AWS r6g.4xlarge
processor : 15
BogoMIPS : 243.75
Features : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x3
CPU part : 0xd0c
CPU revision : 1
Then I'll try on x86_64.
Well, that seems about right. The sio_cache_* members are the caches used by the scrub scanning code, and they're limited to 5% of system memory, so 1.9G isn't unreasonable for a 96G system. I didn't see any other very large caches, so it's a bit unclear what exactly is using the memory. Trying x86_64 would be a good sanity check, since that certainly is a pretty common case. One other thing to check would be the system's page size; there may still be some issues lurking with non-4K page systems.
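For what it's worth, the page size is a one-liner to check; RHEL aarch64 kernels are, as far as I know, built with 64K pages, so that's worth confirming on these instances:

getconf PAGESIZE    # typically 4096 on x86_64, often 65536 on RHEL aarch64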
Before I move to x86_64, testing on the AWS Graviton 2 shows the same issue:
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 20:25:16 2021
103G scanned at 3.82G/s, 124K issued at 4.61K/s, 129G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
nvme1n1 ONLINE 0 0 0
nvme2n1 ONLINE 0 0 0
nvme3n1 ONLINE 0 0 0
nvme4n1 ONLINE 0 0 0
nvme5n1 ONLINE 0 0 0
nvme6n1 ONLINE 0 0 0
nvme7n1 ONLINE 0 0 0
nvme8n1 ONLINE 0 0 0
nvme9n1 ONLINE 0 0 0
nvme10n1 ONLINE 0 0 0
nvme11n1 ONLINE 0 0 0
nvme12n1 ONLINE 0 0 0
errors: No known data errors
Connection to 54.226.181.X closed by remote host.
Connection to 54.226.181.X closed.
--- slab.last.txt 2021-05-28 20:25:40.748680196 +0000
+++ slab.new.txt 2021-05-28 20:25:41.758681597 +0000
@@ -182,8 +182,8 @@
dmu_buf_impl_t 0x00080 44634240 7197696 3456 384 12915 12915 67012 103320 18744 536096 0 0 0
zil_lwb_cache 0x00080 6784 3008 3392 376 2 1 1 16 8 8 0 0 0
zil_zcw_cache 0x00080 3200 1216 1600 152 2 1 1 16 8 8 0 0 0
-sio_cache_0 0x00080 2277653568 1683481984 1472 136 1547319 1547318 1547318 12378552 12378544 12378544 0 0 0
-sio_cache_1 0x00080 4859200 3691776 1600 152 3037 3036 3036 24296 24288 24288 0 0 0
+sio_cache_0 0x08080 2375485632 1755793728 1472 136 1613781 1613781 1613781 12910248 12910248 12910248 0 0 0
+sio_cache_1 0x00080 5067200 3849856 1600 152 3167 3166 3166 25336 25328 25328 0 0 0
sio_cache_2 0x00080 119232 91392 1728 168 69 68 68 552 544 544 0 0 0
zfs_znode_cache 0x00100 - 6864 - 1144 - - - - 6 - - - -
zfs_znode_hold_cache 0x00080 5440 2816 1088 88 5 4 4 40 32 32 0 0 0
Connection to 54.226.181.X closed by remote host.
Connection to 54.226.181.X closed.
This instance type has 128GB of RAM instead of the 96 on OCI, but it runs out of memory the same way.
All I did on this new instance was boot it up, install zfs, create the pool and fs, run the fio script, then run scrub, using the official RH8.4 AMI:
RHEL-8.4.0_HVM-20210504-arm64-2-Hourly2-GP2
Happy to provide you guys with an Oracle or AWS arm64 instance to play around with if you'd like.
You can create a new ssh key pair and send me the pub key and I can set it up.
Alright, so on a r5.4xlarge instance with 16 of these:
model name : Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
And 128GB of RAM... the scrub completes successfully and ZFS never uses more than 3.5GB of RAM. I even imported the pool from the arm64 instance just to test with the exact same on-disk data.
pool: tank
state: ONLINE
scan: scrub repaired 0B in 00:43:59 with 0 errors on Fri May 28 21:09:15 2021
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol06c3bb1b1fcec5212 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0044be91588faf04d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0c352803eca664f2d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol04c4af5c8f2e08693 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol03fa95fc4af36924f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol01efe0ee629742e4d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol06c8eee0acee3193f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0c98855ea8d5de600 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol02f4eaa6644236712 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0dfd26938bc9e4e9c ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0dcfff35a07a4735f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0377673811f99d751 ONLINE 0 0 0
errors: No known data errors
[root@ip-172-30-0-87 ~]# while [ 1 ]; do free -m; sleep 1; done
total used free shared buff/cache available
Mem: 127189 3545 121378 16 2265 122544
Swap: 0 0 0
...
total used free shared buff/cache available
Mem: 127189 2912 122011 16 2265 123176
Swap: 0 0 0
...
total used free shared buff/cache available
Mem: 127189 1512 123411 16 2265 124576
Swap: 0 0 0
So... that's fun. :)
I'll test with rc6 on arm64 just in case of magic.
The other potential culprit is building from the ./configure script with no CFLAGS vs rebuilding the RPMs with the normal system optimisations. I'll play around with that as well.
Okay, so... this is with 2.1.0-rc6, on a new r6g.4xlarge instance with 16 Graviton 2 cores and 128GB of RAM. It died.
I installed all the deps:
dnf install vim-enhanced tmux wget fio gcc make kernel-devel libuuid-devel libattr-devel libaio-devel openssl-devel elfutils-libelf-devel libudev-devel libblkid-devel libtirpc-devel zlib-devel pam-devel
Ran configure with no flags:
mkdir src && cd src
wget https://github.com/openzfs/zfs/releases/download/zfs-2.1.0-rc6/zfs-2.1.0-rc6.tar.gz
tar xf zfs-2.1.0-rc6.tar.gz
cd zfs-2.1.0
./configure
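For completeness, the remaining build-and-load steps would presumably be the stock out-of-tree sequence (an assumption; only the make step is described next):

make -j"$(nproc)"
make install
depmod -a && modprobe zfs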
Ran make, and in another terminal ran ps to check which flags were getting passed to gcc:
root 165449 0.0 0.0 5824 1600 pts/0 S+ 21:32 0:00 gcc -Wp,-MD,/root/src/zfs-2.1.0/module/icp/api/.kcf_miscapi.o.d -nostdinc -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -I./arch/arm64/include -I./arch/arm64/include/generated -I./include/drm-backport -I./include -I./arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -DCC_HAVE_ASM_GOTO -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -mabi=lp64 -fno-dwarf2-cfi-asm -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -gdwarf-4 -pg -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned -std=gnu99 -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -include /root/src/zfs-2.1.0/zfs_config.h -I/root/src/zfs-2.1.0/include -I/root/src/zfs-2.1.0/include/os/linux/kernel -I/root/src/zfs-2.1.0/include/os/linux/spl -I/root/src/zfs-2.1.0/include/os/linux/zfs -I/root/src/zfs-2.1.0/include -D_KERNEL -UDEBUG -DNDEBUG -I/root/src/zfs-2.1.0/module/icp/include -DMODULE -DKBUILD_BASENAME="kcf_miscapi" -DKBUILD_MODNAME="icp" -c -o /root/src/zfs-2.1.0/module/icp/api/.tmp_kcf_miscapi.o /root/src/zfs-2.1.0/module/icp/api/kcf_miscapi.c
root 165450 0.0 0.0 67136 41024 pts/0 R+ 21:32 0:00 /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -nostdinc -I ./arch/arm64/include -I ./arch/arm64/include/generated -I ./include/drm-backport -I ./include -I ./arch/arm64/include/uapi -I ./arch/arm64/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/include/os/linux/kernel -I /root/src/zfs-2.1.0/include/os/linux/spl -I /root/src/zfs-2.1.0/include/os/linux/zfs -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/module/icp/include -D __KERNEL__ -D KASAN_SHADOW_SCALE_SHIFT=3 -D CC_HAVE_ASM_GOTO -D CONFIG_AS_LSE=1 -D KASAN_SHADOW_SCALE_SHIFT=3 -D _KERNEL -U DEBUG -D NDEBUG -D MODULE -D KBUILD_BASENAME="kcf_miscapi" -D KBUILD_MODNAME="icp" -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -include /root/src/zfs-2.1.0/zfs_config.h -MD /root/src/zfs-2.1.0/module/icp/api/.kcf_miscapi.o.d /root/src/zfs-2.1.0/module/icp/api/kcf_miscapi.c -quiet -dumpbase kcf_miscapi.c -mlittle-endian -mgeneral-regs-only -mabi=lp64 -auxbase-strip /root/src/zfs-2.1.0/module/icp/api/.tmp_kcf_miscapi.o -g -gdwarf-4 -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security -Wno-frame-address -Wformat-truncation=0 -Wformat-overflow=0 -Wno-int-in-bool-context -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wunused-const-variable=0 -Wno-pointer-sign -Wno-stringop-truncation -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -std=gnu90 -std=gnu99 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -fno-asynchronous-unwind-tables -fno-dwarf2-cfi-asm -fno-delete-null-pointer-checks -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-inline-functions-called-once -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fmacro-prefix-map=./= --param allow-store-data-races=0 -o /tmp/ccJSdz4H.s
It's just using the same flags the kernel was built with by RH:
[root@ip-172-30-0-248 zfs-2.1.0]# cd /usr/src/kernels/4.18.0-305.el8.aarch64/
[root@ip-172-30-0-248 4.18.0-305.el8.aarch64]# fgrep -r lp64 .
./arch/arm64/Makefile:KBUILD_CFLAGS += $(call cc-option,-mabi=lp64)
./arch/arm64/Makefile:KBUILD_AFLAGS += $(call cc-option,-mabi=lp64)
./arch/riscv/Makefile: KBUILD_CFLAGS += -mabi=lp64
./arch/riscv/Makefile: KBUILD_AFLAGS += -mabi=lp64
./scripts/mod/devicetable-offsets.s:// -mabi=lp64 -auxbase-strip scripts/mod/devicetable-offsets.s -g -gdwarf-4
./scripts/mod/devicetable-offsets.s: .ascii "eneral-regs-only -mabi=lp64 -g -gdwarf-4 -O2 -std=gnu90 -p -"
This time, instead of watching the slab info, I watched /proc/meminfo.
I ran zpool scrub tank && zpool status 1:
pool: tank
state: ONLINE
scan: scrub in progress since Fri May 28 21:37:44 2021
114G scanned at 3.91G/s, 118K issued at 4.09K/s, 129G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol06c3bb1b1fcec5212 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0044be91588faf04d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0c352803eca664f2d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol04c4af5c8f2e08693 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol03fa95fc4af36924f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol01efe0ee629742e4d ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol06c8eee0acee3193f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0c98855ea8d5de600 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol02f4eaa6644236712 ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0dfd26938bc9e4e9c ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0dcfff35a07a4735f ONLINE 0 0 0
nvme-Amazon_Elastic_Block_Store_vol0377673811f99d751 ONLINE 0 0 0
errors: No known data errors
Connection to 54.91.207.X closed by remote host.
Connection to 54.91.207.X closed.
And watching meminfo:
--- meminfo.last.txt 2021-05-28 21:38:03.886107192 +0000
+++ meminfo.new.txt 2021-05-28 21:38:04.886098667 +0000
@@ -1,28 +1,28 @@
MemTotal: 132164992 kB
-MemFree: 45023424 kB
-MemAvailable: 36895104 kB
+MemFree: 41001216 kB
+MemAvailable: 32872896 kB
Buffers: 8384 kB
Cached: 4217728 kB
SwapCached: 0 kB
Active: 2140416 kB
-Inactive: 2277952 kB
+Inactive: 2278336 kB
Active(anon): 8640 kB
-Inactive(anon): 210048 kB
+Inactive(anon): 210432 kB
Active(file): 2131776 kB
Inactive(file): 2067904 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
-Dirty: 512 kB
+Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 193088 kB
+AnonPages: 193152 kB
Mapped: 88512 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1122240 kB
+Slab: 1138112 kB
SReclaimable: 116928 kB
-SUnreclaim: 1005312 kB
+SUnreclaim: 1021184 kB
KernelStack: 29248 kB
PageTables: 9408 kB
NFS_Unstable: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:04.886098667 +0000
+++ meminfo.new.txt 2021-05-28 21:38:05.906089973 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 40994112 kB
-MemAvailable: 32865792 kB
+MemFree: 36908672 kB
+MemAvailable: 28780352 kB
Buffers: 8384 kB
Cached: 4217728 kB
SwapCached: 0 kB
-Active: 2140416 kB
-Inactive: 2278336 kB
-Active(anon): 8640 kB
-Inactive(anon): 210432 kB
+Active: 2140352 kB
+Inactive: 2277696 kB
+Active(anon): 8576 kB
+Inactive(anon): 209792 kB
Active(file): 2131776 kB
Inactive(file): 2067904 kB
Unevictable: 0 kB
@@ -16,15 +16,15 @@
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 193152 kB
-Mapped: 88512 kB
+AnonPages: 192576 kB
+Mapped: 88448 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1138112 kB
+Slab: 1152320 kB
SReclaimable: 116928 kB
-SUnreclaim: 1021184 kB
-KernelStack: 29248 kB
-PageTables: 9408 kB
+SUnreclaim: 1035392 kB
+KernelStack: 28992 kB
+PageTables: 9152 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:05.906089973 +0000
+++ meminfo.new.txt 2021-05-28 21:38:06.916081364 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 36901568 kB
-MemAvailable: 28773248 kB
+MemFree: 32887872 kB
+MemAvailable: 24759552 kB
Buffers: 8384 kB
Cached: 4217728 kB
SwapCached: 0 kB
-Active: 2140352 kB
-Inactive: 2277696 kB
-Active(anon): 8576 kB
-Inactive(anon): 209792 kB
+Active: 2140416 kB
+Inactive: 2278080 kB
+Active(anon): 8640 kB
+Inactive(anon): 210176 kB
Active(file): 2131776 kB
Inactive(file): 2067904 kB
Unevictable: 0 kB
@@ -16,20 +16,20 @@
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 192576 kB
-Mapped: 88448 kB
+AnonPages: 193152 kB
+Mapped: 88512 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1152320 kB
+Slab: 1177152 kB
SReclaimable: 116928 kB
-SUnreclaim: 1035392 kB
-KernelStack: 28992 kB
-PageTables: 9152 kB
+SUnreclaim: 1060224 kB
+KernelStack: 29056 kB
+PageTables: 9408 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 66082496 kB
-Committed_AS: 470976 kB
+Committed_AS: 503296 kB
VmallocTotal: 133009506240 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:06.916081364 +0000
+++ meminfo.new.txt 2021-05-28 21:38:07.916072841 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 32880768 kB
-MemAvailable: 24752448 kB
+MemFree: 28966848 kB
+MemAvailable: 20838528 kB
Buffers: 8384 kB
Cached: 4217728 kB
SwapCached: 0 kB
Active: 2140416 kB
-Inactive: 2278080 kB
+Inactive: 2277952 kB
Active(anon): 8640 kB
-Inactive(anon): 210176 kB
+Inactive(anon): 210048 kB
Active(file): 2131776 kB
Inactive(file): 2067904 kB
Unevictable: 0 kB
@@ -16,20 +16,20 @@
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 193152 kB
+AnonPages: 193088 kB
Mapped: 88512 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1177152 kB
+Slab: 1200192 kB
SReclaimable: 116928 kB
-SUnreclaim: 1060224 kB
-KernelStack: 29056 kB
+SUnreclaim: 1083264 kB
+KernelStack: 28992 kB
PageTables: 9408 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 66082496 kB
-Committed_AS: 503296 kB
+Committed_AS: 470656 kB
VmallocTotal: 133009506240 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:07.916072841 +0000
+++ meminfo.new.txt 2021-05-28 21:38:08.916064317 +0000
@@ -1,6 +1,6 @@
MemTotal: 132164992 kB
-MemFree: 28960704 kB
-MemAvailable: 20832384 kB
+MemFree: 25025664 kB
+MemAvailable: 16897152 kB
Buffers: 8384 kB
Cached: 4217728 kB
SwapCached: 0 kB
@@ -20,11 +20,11 @@
Mapped: 88512 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1200256 kB
+Slab: 1219392 kB
SReclaimable: 116928 kB
-SUnreclaim: 1083328 kB
+SUnreclaim: 1102464 kB
KernelStack: 28992 kB
-PageTables: 9408 kB
+PageTables: 9344 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:08.916064317 +0000
+++ meminfo.new.txt 2021-05-28 21:38:09.916055793 +0000
@@ -1,30 +1,30 @@
MemTotal: 132164992 kB
-MemFree: 25019904 kB
-MemAvailable: 16891584 kB
+MemFree: 20993280 kB
+MemAvailable: 12865024 kB
Buffers: 8384 kB
-Cached: 4217728 kB
+Cached: 4217856 kB
SwapCached: 0 kB
-Active: 2140416 kB
-Inactive: 2277952 kB
+Active: 2140480 kB
+Inactive: 2278336 kB
Active(anon): 8640 kB
-Inactive(anon): 210048 kB
-Active(file): 2131776 kB
-Inactive(file): 2067904 kB
+Inactive(anon): 210368 kB
+Active(file): 2131840 kB
+Inactive(file): 2067968 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 193088 kB
+AnonPages: 192704 kB
Mapped: 88512 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1219392 kB
+Slab: 1234688 kB
SReclaimable: 116928 kB
-SUnreclaim: 1102464 kB
-KernelStack: 28992 kB
-PageTables: 9344 kB
+SUnreclaim: 1117760 kB
+KernelStack: 28928 kB
+PageTables: 9152 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:09.926055708 +0000
+++ meminfo.new.txt 2021-05-28 21:38:10.926047184 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 20985600 kB
-MemAvailable: 12857344 kB
+MemFree: 16963264 kB
+MemAvailable: 8835008 kB
Buffers: 8384 kB
Cached: 4217856 kB
SwapCached: 0 kB
-Active: 2140480 kB
-Inactive: 2278016 kB
-Active(anon): 8640 kB
-Inactive(anon): 210048 kB
+Active: 2140416 kB
+Inactive: 2274240 kB
+Active(anon): 8576 kB
+Inactive(anon): 206272 kB
Active(file): 2131840 kB
Inactive(file): 2067968 kB
Unevictable: 0 kB
@@ -16,15 +16,15 @@
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 192128 kB
-Mapped: 88512 kB
+AnonPages: 188864 kB
+Mapped: 87296 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1234688 kB
+Slab: 1259072 kB
SReclaimable: 116928 kB
-SUnreclaim: 1117760 kB
-KernelStack: 28928 kB
-PageTables: 8640 kB
+SUnreclaim: 1142144 kB
+KernelStack: 29120 kB
+PageTables: 9152 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:10.926047184 +0000
+++ meminfo.new.txt 2021-05-28 21:38:11.926038661 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 16957120 kB
-MemAvailable: 8828864 kB
+MemFree: 12873280 kB
+MemAvailable: 4745024 kB
Buffers: 8384 kB
Cached: 4217856 kB
SwapCached: 0 kB
Active: 2140416 kB
-Inactive: 2274240 kB
+Inactive: 2274176 kB
Active(anon): 8576 kB
-Inactive(anon): 206272 kB
+Inactive(anon): 206208 kB
Active(file): 2131840 kB
Inactive(file): 2067968 kB
Unevictable: 0 kB
@@ -15,14 +15,14 @@
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
-Writeback: 0 kB
-AnonPages: 188864 kB
+Writeback: 128 kB
+AnonPages: 188928 kB
Mapped: 87296 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1259072 kB
+Slab: 1278912 kB
SReclaimable: 116928 kB
-SUnreclaim: 1142144 kB
+SUnreclaim: 1161984 kB
KernelStack: 29120 kB
PageTables: 9152 kB
NFS_Unstable: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:11.926038661 +0000
+++ meminfo.new.txt 2021-05-28 21:38:12.926030137 +0000
@@ -1,13 +1,13 @@
MemTotal: 132164992 kB
-MemFree: 12867136 kB
-MemAvailable: 4738880 kB
+MemFree: 8824384 kB
+MemAvailable: 696128 kB
Buffers: 8384 kB
Cached: 4217856 kB
SwapCached: 0 kB
Active: 2140416 kB
-Inactive: 2274176 kB
+Inactive: 2274304 kB
Active(anon): 8576 kB
-Inactive(anon): 206208 kB
+Inactive(anon): 206336 kB
Active(file): 2131840 kB
Inactive(file): 2067968 kB
Unevictable: 0 kB
@@ -15,16 +15,16 @@
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
-Writeback: 128 kB
-AnonPages: 188928 kB
+Writeback: 0 kB
+AnonPages: 188864 kB
Mapped: 87296 kB
Shmem: 26432 kB
KReclaimable: 116928 kB
-Slab: 1278976 kB
+Slab: 1293056 kB
SReclaimable: 116928 kB
-SUnreclaim: 1162048 kB
-KernelStack: 29120 kB
-PageTables: 9152 kB
+SUnreclaim: 1176128 kB
+KernelStack: 28928 kB
+PageTables: 9088 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
--- meminfo.last.txt 2021-05-28 21:38:12.926030137 +0000
+++ meminfo.new.txt 2021-05-28 21:38:13.946021443 +0000
@@ -1,30 +1,30 @@
MemTotal: 132164992 kB
-MemFree: 8817280 kB
-MemAvailable: 689024 kB
-Buffers: 8384 kB
-Cached: 4217856 kB
+MemFree: 9113600 kB
+MemAvailable: 0 kB
+Buffers: 128 kB
+Cached: 55488 kB
SwapCached: 0 kB
-Active: 2140416 kB
-Inactive: 2274304 kB
+Active: 35648 kB
+Inactive: 211456 kB
Active(anon): 8576 kB
-Inactive(anon): 206336 kB
-Active(file): 2131840 kB
-Inactive(file): 2067968 kB
+Inactive(anon): 210240 kB
+Active(file): 27072 kB
+Inactive(file): 1216 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 0 kB
Writeback: 0 kB
-AnonPages: 188864 kB
-Mapped: 87296 kB
+AnonPages: 193472 kB
+Mapped: 33920 kB
Shmem: 26432 kB
-KReclaimable: 116928 kB
-Slab: 1293056 kB
-SReclaimable: 116928 kB
-SUnreclaim: 1176128 kB
-KernelStack: 28928 kB
-PageTables: 9088 kB
+KReclaimable: 51840 kB
+Slab: 1178752 kB
+SReclaimable: 51840 kB
+SUnreclaim: 1126912 kB
+KernelStack: 28864 kB
+PageTables: 9920 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
Connection to 54.91.207.X closed by remote host.
Connection to 54.91.207.X closed.
Is there a "debugging ZFS on weird architectures" document somewhere? :)
I don't think AArch64 qualifies as that odd, personally.
FYI, you can use make V=1 [other args] to convince make to tell you what it's doing. (This will, necessarily, be a lot of text.)
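For example, from an already-configured tree (the path here is just the one from your logs):
cd /root/src/zfs-2.1.0
make V=1    # V=1 propagates to kbuild, so the full compiler command lines get printed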
I think OpenZFS specifies almost no per-arch flags; I believe it gets (nearly?) all of them from the compile flags in the kernel Makefile. So if you want to experiment with the flags the modules get built with, I think there's only so much you can do without nontrivial work. (If distros vary the flags used to build their kernels significantly, which I don't know, never having had occasion to look, you could try another distro and see if the behavior varies.)
Yeah, I mean, I just wanted to show I wasn't doing anything weird. I trust RH knows what they're doing since the rest of the system works.
I feel like most people wouldn't run anything apart from the official RH kernel on production workloads so... I'm not sure what the next step is here. I haven't done kernel development since like 2005 so I need to reactivate that part of my brain. lol.
I can try the Oracle 5.4.x kernel since that's pretty easy to test on RHEL. Might as well.
I didn't mean to suggest RH was doing something wrong; just that, since I didn't see anything obviously special-casing arm64 handling, I was wondering about flag-induced behavior.
@behlendorf above wondered about the system page size - I have never had to look at this before, so I just looked something up, but it looks like getconf PAGESIZE will answer that.
As an experiment, I'll try booting up an AArch64 VM and see if I can easily repro this...
No worries. :)
I just tested it on:
Linux instance-20210526-1929 5.4.17-2102.201.3.el8uek.aarch64 #2 SMP Fri Apr 23 09:42:46 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux
Which is the latest Oracle-for-RHEL kernel. It died the same way.
[root@instance-20210526-1929 ~]# /usr/local/sbin/zpool version
zfs-2.1.0-rc6
zfs-kmod-2.1.0-rc6
[root@instance-20210526-1929 ~]# /usr/local/sbin/zpool import tank && /usr/local/sbin/zpool status 1
pool: tank
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub in progress since Fri May 28 18:24:44 2021
22.2G scanned at 3.17G/s, 84K issued at 12K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
...
pool: tank
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub in progress since Fri May 28 18:24:44 2021
84.1G scanned at 3.23G/s, 168K issued at 6.46K/s, 136G total
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
scsi-360f39ea51229408cb368509d91495fb9 ONLINE 0 0 0
scsi-3603528d43ade4b31b70186f9a041601e ONLINE 0 0 0
scsi-36007099c456f4ec780fdc03b14976f19 ONLINE 0 0 0
scsi-360d5b4cb98a44fabbcc67b1a55808124 ONLINE 0 0 0
scsi-3603ff370fa044673a5c09353568c6757 ONLINE 0 0 0
scsi-360ba05ab3eab4897bcf042fdfc3da1eb ONLINE 0 0 0
scsi-360087adf642b4f6586326dada6c8eb41 ONLINE 0 0 0
scsi-3603a47cd86dd484bba1b05bab36c1257 ONLINE 0 0 0
scsi-3600bf0330c6e4139829ad72c816b8c06 ONLINE 0 0 0
scsi-3605635d6a27b4c189c0af523ddc262de ONLINE 0 0 0
scsi-36013bd4eeaab4b4a9e88beb0474a2439 ONLINE 0 0 0
scsi-360f024a7d8b64521b7e7d671d9397ab5 ONLINE 0 0 0
errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
[root@instance-20210526-1929 ~]# while [ 1 ]; do free -m; sleep 1; done
total used free shared buff/cache available
Mem: 96706 933 93280 26 2492 86533
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 933 93280 26 2492 86533
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 954 93259 26 2492 86512
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 2754 91454 26 2496 84710
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 6223 87985 26 2496 81241
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 9670 84539 26 2496 77794
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 13096 81112 26 2496 74368
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 16512 77696 26 2496 70952
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 19913 74296 26 2496 67551
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 23329 70879 26 2496 64135
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 26735 67474 26 2497 60729
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 30121 64087 26 2497 57343
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 33473 60735 26 2497 53991
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 36830 57378 26 2497 50633
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 40174 54034 26 2497 47290
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 43448 50761 26 2497 44016
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 46780 47428 26 2497 40684
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 50080 44128 26 2497 37384
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 53382 40826 26 2497 34082
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 56681 37528 26 2497 30783
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 59925 34283 26 2497 27539
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 63229 30979 26 2497 24235
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 66530 27678 26 2497 20934
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 69834 24374 26 2497 17630
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 73119 21089 26 2497 14345
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 76424 17784 26 2497 11039
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 79786 14422 26 2497 7677
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 83129 11079 26 2497 4335
Swap: 8191 0 8191
total used free shared buff/cache available
Mem: 96706 86468 7740 26 2497 995
Swap: 8191 0 8191
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
Oracle uses slightly different CFLAGS to build their kernels, but it doesn't seem to matter:
root 113811 0.0 0.0 217984 1664 pts/0 S+ 22:19 0:00 gcc -Wp,-MD,/root/src/zfs-2.1.0/module/zcommon/.zfs_fletcher_superscalar.o.d -nostdinc -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -I./arch/arm64/include -I./arch/arm64/include/generated -I./include -I./arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 -mgeneral-regs-only -DCONFIG_AS_LSE=1 -DCONFIG_CC_HAS_K_CONSTRAINT=1 -fno-asynchronous-unwind-tables -Wno-psabi -mabi=lp64 -mindirect-branch=thunk-extern -DRETPOLINE -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -O2 -gt --param=allow-store-data-races=0 -Werror=frame-larger-than=2048 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -g -pg -fno-inline-functions-called-once -ffunction-sections -fdata-sections -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=./= -fcf-protection=none -Wno-packed-not-aligned -std=gnu99 -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -include /root/src/zfs-2.1.0/zfs_config.h -I/root/src/zfs-2.1.0/include -I/root/src/zfs-2.1.0/include/os/linux/kernel -I/root/src/zfs-2.1.0/include/os/linux/spl -I/root/src/zfs-2.1.0/include/os/linux/zfs -I/root/src/zfs-2.1.0/include -D_KERNEL -UDEBUG -DNDEBUG -DMODULE -DKBUILD_BASENAME="zfs_fletcher_superscalar" -DKBUILD_MODNAME="zcommon" -c -o /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.o /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.c
root 113812 0.0 0.0 277632 32640 pts/0 R+ 22:19 0:00 /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -nostdinc -I ./arch/arm64/include -I ./arch/arm64/include/generated -I ./include -I ./arch/arm64/include/uapi -I ./arch/arm64/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/include/os/linux/kernel -I /root/src/zfs-2.1.0/include/os/linux/spl -I /root/src/zfs-2.1.0/include/os/linux/zfs -I /root/src/zfs-2.1.0/include -D __KERNEL__ -D KASAN_SHADOW_SCALE_SHIFT=3 -D CONFIG_AS_LSE=1 -D CONFIG_CC_HAS_K_CONSTRAINT=1 -D RETPOLINE -D KASAN_SHADOW_SCALE_SHIFT=3 -D _KERNEL -U DEBUG -D NDEBUG -D MODULE -D KBUILD_BASENAME="zfs_fletcher_superscalar" -D KBUILD_MODNAME="zcommon" -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -include /root/src/zfs-2.1.0/zfs_config.h -MD /root/src/zfs-2.1.0/module/zcommon/.zfs_fletcher_superscalar.o.d /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.c -quiet -dumpbase zfs_fletcher_superscalar.c -mlittle-endian -mgeneral-regs-only -mabi=lp64 -mindirect-branch=thunk-extern -auxbase-strip /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.o -gt -g -O2 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -Wno-psabi -Wno-frame-address -Wformat-truncation=0 -Wformat-overflow=0 -Werror=frame-larger-than=2048 -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wimplicit-fallthrough=3 -Wunused-const-variable=0 -Wvla -Wno-pointer-sign -Wno-stringop-truncation -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -std=gnu90 -std=gnu99 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fno-inline-functions-called-once -ffunction-sections -fdata-sections -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fmacro-prefix-map=./= -fcf-protection=none --param allow-store-data-races=0 -o /tmp/ccirKvo4.s
The pagesize on OCI A1:
[root@instance-20210526-1929 ~]# getconf PAGESIZE
65536
The pagesize on the AWS Graviton 2:
[root@ip-172-30-0-199 ec2-user]# getconf PAGESIZE
65536
The Neoverse N1 tech manual (https://documentation-service.arm.com/static/5f561d50235b3560a01e03b5?token=) says:
The instruction fetch unit includes:
• A 64KB, 4-way, set associative L1 instruction cache with 64-byte cache lines and parity protection.
• A fully associative L1 instruction TLB with native support for 4KB, 16KB, 64KB, 2MB, and 32MB page sizes.
• A dynamic branch predictor.
• Configurable support for instruction cache hardware coherency
https://www.kernel.org/doc/html/latest/arm64/memory.html
Has a good rundown of the page sizes on AArch64/Linux.
It seems like maybe Debian uses 4K page size on AArch64 and RHEL uses 64K page size.
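One quick way to confirm which page size a given kernel was built with (the config path may vary by distro, so treat this as a sketch):
getconf PAGESIZE
grep 'CONFIG_ARM64_.*_PAGES' /boot/config-$(uname -r)   # shows e.g. CONFIG_ARM64_64K_PAGES=y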
Found it. Issue #11574 describes this same issue with zpool scrub, except on ppc64le. The workaround there was to set spl_kmem_cache_slab_limit=16384, and the issue appears to be related to the page size (also 64k), which is what prompted me to ask. I haven't looked, but it sounds like an issue with our slab implementation at larger page sizes.
Thanks @behlendorf.
Now I know what to focus on I can take a look at the code.
For whatever reason when it locks up on AWS the instance becomes completely unresponsive and is unsalvageable. The only option is to terminate the entire instance.
On OCI it's at least rebootable. And with the RH 4.18.0-305 kernel it even reboots itself, which is nice.
We may want to fine-tune this a bit, but here's what I'm currently thinking would be a reasonable fix. Basically, if the page size was anything other than 4k we'd always fall back to using the SPL's kmem implementation, which requires page alignment and was causing the memory inflation. We were effectively wasting the majority of every page we allocated. If you can verify this resolves the issue I'll open a PR and we can go from there.
diff --git a/module/os/linux/spl/spl-kmem-cache.c b/module/os/linux/spl/spl-kme>
index 6b3d559ff..4b7867b7e 100644
--- a/module/os/linux/spl/spl-kmem-cache.c
+++ b/module/os/linux/spl/spl-kmem-cache.c
@@ -100,12 +100,13 @@ MODULE_PARM_DESC(spl_kmem_cache_max_size, "Maximum size o>
* For small objects the Linux slab allocator should be used to make the most
* efficient use of the memory. However, large objects are not supported by
* the Linux slab and therefore the SPL implementation is preferred. A cutoff
- * of 16K was determined to be optimal for architectures using 4K pages.
+ * of 16K was determined to be optimal for architectures using 4K pages. For
+ * larger page sizes set the cutoff at a single page.
*/
-#if PAGE_SIZE == 4096
+#if PAGE_SIZE <= 16384
unsigned int spl_kmem_cache_slab_limit = 16384;
#else
-unsigned int spl_kmem_cache_slab_limit = 0;
+unsigned int spl_kmem_cache_slab_limit = PAGE_SIZE;
#endif
module_param(spl_kmem_cache_slab_limit, uint, 0644);
MODULE_PARM_DESC(spl_kmem_cache_slab_limit,
I've opened PR #12152 with the patch above and an explanation of the issue. I haven't actually tested it, however, so it'd be great if you could confirm it really does resolve the problem.
@behlendorf I'm testing it now.
Okay, I did this by modifying the modprobe.d file for spl:
[root@instance-20210526-1929 ~]# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=25769803776
options spl spl_kmem_cache_slab_limit=65536
[root@instance-20210526-1929 ~]# cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit
65536
Just echo'ing into that sysfs file didn't work at first. I had to rmmod all the zfs modules and modprobe again.
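For anyone following along, the reload sequence was roughly this (a sketch; it assumes the pool is exported and nothing else is holding the modules):
zpool export tank
modprobe -r zfs      # or rmmod each zfs/spl module individually, which is what I did
modprobe zfs         # picks up the new options from /etc/modprobe.d/zfs.conf
cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit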
With that change it doesn't run out of RAM. I'll try a couple of other values just to make sure that's the best one. But at the very least... it no longer crashes. Awesome. :)
That's right. You just need to make sure it's set before importing the pool.
@behlendorf Cool.
So, I did some testing of various values of spl_kmem_cache_slab_limit.
16k
24k
32k
40k
48k
56k
64k
128k
256k
Every value finished in the same time / at the same speed:
pool: tank
state: ONLINE
scan: scrub in progress since Sat May 29 01:50:35 2021
136G scanned at 1.20G/s, 136G issued at 1.20G/s, 136G total
0B repaired, 99.80% done, 00:00:00 to go
...
pool: tank
state: ONLINE
scan: scrub repaired 0B in 00:01:54 with 0 errors on Sat May 29 01:52:29 2021
I ran 'vmstat 1' alongside each scrub and stopped it as soon as the scrub was complete. I wrote a little thing to aggregate the values across the run time for each limit I tested. I've put the output here:
https://gist.github.com/omarkilani/346fb6ac8406fc0a51d0c267c3a31fa3
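For reference, something like this awk one-liner over each captured vmstat log would do that aggregation (a stand-in, not my actual script; the filename is a placeholder):
awk '$1 ~ /^[0-9]+$/ { nf = NF; for (i = 1; i <= NF; i++) sum[i] += $i; n++ }
     END { for (i = 1; i <= nf; i++) printf "%.1f ", sum[i] / n; print "" }' vmstat-16k.txt
The columns come out in the same order vmstat prints them, averaged over the run.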
On the whole I don't think it makes any difference which value is chosen. 16k seems to have a lower system time but it's within a margin of error so I wouldn't put any stock in it.
I think the PR is good to go.
I ran some Postgres benchmarks at the various limit levels, with 64k on the 64k page size kernel providing the best performance:
16k: avg latency = 2.404 ms, avg tps = 13315.121336
latency average = 2.352 ms
tps = 13603.681323 (including connections establishing)
tps = 13604.826099 (excluding connections establishing)
latency average = 2.389 ms
tps = 13394.444262 (including connections establishing)
tps = 13395.613079 (excluding connections establishing)
latency average = 2.472 ms
tps = 12943.765913 (including connections establishing)
tps = 12944.924831 (excluding connections establishing)
---
64k: avg latency = 2.313 ms, avg tps = 13838.339199
latency average = 2.233 ms
tps = 14329.728653 (including connections establishing)
tps = 14332.726826 (excluding connections establishing)
latency average = 2.271 ms
tps = 14090.842201 (including connections establishing)
tps = 14092.230062 (excluding connections establishing)
latency average = 2.445 ms
tps = 13088.930065 (including connections establishing)
tps = 13090.060708 (excluding connections establishing)
---
128k: avg latency = 2.366 ms, avg tps = 13527.519451
latency average = 2.370 ms
tps = 13504.669294 (including connections establishing)
tps = 13505.974290 (excluding connections establishing)
latency average = 2.347 ms
tps = 13634.011648 (including connections establishing)
tps = 13635.310885 (excluding connections establishing)
latency average = 2.381 ms
tps = 13440.105691 (including connections establishing)
tps = 13441.273178 (excluding connections establishing)
---
256k: avg latency = 2.423 ms, avg tps = 13218.833702
latency average = 2.513 ms
tps = 12732.348960 (including connections establishing)
tps = 12733.493379 (excluding connections establishing)
latency average = 2.392 ms
tps = 13379.154862 (including connections establishing)
tps = 13380.268778 (excluding connections establishing)
latency average = 2.363 ms
tps = 13541.525764 (including connections establishing)
tps = 13542.738950 (excluding connections establishing)
One final test, fio run with the following config:
[global]
bs=64K
iodepth=64
direct=1
ioengine=libaio
group_reporting
time_based
runtime=60
numjobs=4
name=raw-read
rw=read
[job1]
filename=/tank/db/f.fio
size=128G
At 16k/64k/128k/256k.
Outputs here:
https://gist.github.com/omarkilani/dc8f6d167493e9b94fae7402de841ec4
64k and 16k look alright on the 64k page size kernel.
Thanks for all your help @rincebrain and @behlendorf . Glad there was a solution in the end. :)
Ran a pgbench stress test on zfs with spl_kmem_cache_slab_limit=65536 for 12 hours, and the machine survived. It also survived a scrub of the resulting on-disk data. 👍
starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 612557228
latency average = 2.257 ms
tps = 14179.551402 (including connections establishing)
tps = 14179.553286 (excluding connections establishing)
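For reference, the invocation behind those numbers is roughly the following (reconstructed from the parameters printed above; the database name is a placeholder):
pgbench -i -s 100 bench                          # scale factor 100
pgbench -M prepared -c 32 -j 32 -T 43200 bench   # 32 clients/threads, 12 hours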
System information
Describe the problem you're observing
I was doing a stress test on a zfs pool running Postgres. I left it running overnight and came back to a locked-up VM. Nothing on the console from the lock-up that I could see, but I suspect zfs was behind it.
When I rebooted the VM, I ran a scrub on the pool. The machine ran out of memory in about 5 seconds, the OOM killer kicked in, and eventually the machine rebooted.
If I import the pool again, the scrub kicks off again automatically and the machine runs out of memory again.
Will try 2.0.4 soon.
Describe how to reproduce the problem
Include any warning/errors/backtraces from the system logs