Closed glztmf closed 3 years ago
On December 7, 2019 12:57 pm, glztmf wrote:
System information
Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | 10 (buster)
Linux Kernel | 5.0.15-1-pve
Architecture | x86_64
ZFS Version | 0.8.1-pve1
SPL Version | 0.8.1-pve1
please upgrade to the current version of PVE and report back if you still experience the issue - there have been several important fixes in ZoL 0.8.2!
@Fabian-Gruenbichler, thanks, I will try ZoL 0.8.2 a little later. Right now I'm trying to find an exact way to reproduce the problem so I can run the same test on other versions.
Same hangs with pve 6.1-3 and zfs 0.8.2.
Created 2 VMs and ran the following fio tests in both simultaneously:
fio --loops=100 --size=500m --filename=/mnt/fiotest.tmp --stonewall --ioengine=libaio \
--name=1 --bs=8k --rw=randrw \
--name=2 --bs=512k --rw=randrw \
--name=3 --bs=512k --rw=randrw \
--name=4 --bs=512k --rw=randrw \
--name=5 --bs=512k --iodepth=32 --rw=randrw \
--name=6 --bs=512k --iodepth=32 --rw=randrw
After about an hour VMs and zfs hang with similar symptoms.
Processes with D state:
# ps -e -o pid,state,start,command | egrep ' D | D< | Dl |STAT'
506 D 15:13:13 [zvol]
7014 D 15:19:04 [txg_sync]
19278 D 16:26:38 [zvol]
19279 D 16:26:38 [zvol]
19280 D 16:26:38 [zvol]
19281 D 16:26:38 [zvol]
19282 D 16:26:38 [zvol]
19283 D 16:26:38 [zvol]
19284 D 16:26:38 [zvol]
19285 D 16:26:38 [zvol]
19286 D 16:26:38 [zvol]
19287 D 16:26:38 [zvol]
19288 D 16:26:38 [zvol]
19289 D 16:26:38 [zvol]
19291 D 16:26:38 [zvol]
19293 D 16:26:38 [zvol]
19294 D 16:26:38 [zvol]
19295 D 16:26:38 [zvol]
19296 D 16:26:38 [zvol]
19297 D 16:26:38 [zvol]
19298 D 16:26:38 [zvol]
19300 D 16:26:38 [zvol]
19301 D 16:26:38 [zvol]
19303 D 16:26:38 [zvol]
19304 D 16:26:38 [zvol]
19305 D 16:26:38 [zvol]
19322 D 16:26:38 [zvol]
19348 D 16:26:38 [zvol]
19349 D 16:26:38 [zvol]
19350 D 16:26:38 [zvol]
19351 D 16:26:38 [zvol]
19352 D 16:26:38 [zvol]
19353 D 16:26:38 [zvol]
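For anyone trying to reproduce this, a listing like the one above is more useful together with the kernel stacks of the blocked tasks. Below is a minimal sketch, not something used in this thread: the function names `d_state_filter` and `dump_d_state_stacks` are made up for illustration, and it assumes the kernel exposes `/proc/<pid>/stack` and that the script runs as root.

```shell
#!/bin/sh
# Filter "PID STATE COMMAND" lines read from stdin down to tasks in
# uninterruptible sleep. The state field starts with D (D, D<, Dl, ...).
d_state_filter() {
    awk '$2 ~ /^D/ { print $1, $3 }'
}

# Print the kernel stack of every D-state task.
# Assumption: /proc/<pid>/stack exists and is readable (usually root only).
dump_d_state_stacks() {
    ps -e -o pid=,state=,comm= | d_state_filter | while read -r pid name; do
        echo "== $pid $name"
        cat "/proc/$pid/stack" 2>/dev/null || echo "(stack not readable)"
    done
}
```

Calling `dump_d_state_stacks` while the hang is in progress should show where `[zvol]` and `[txg_sync]` are waiting, which is usually what developers ask for first.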
Similar messages appear in the VM terminals:
task sometask:xxx blocked for more than 120 seconds.
And fio IO drops to zero, but doesn't hang:
fio-3.7
Starting 6 processes
Jobs: 1 (f=1): [m(1),P(5)][36.6%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 24m:15s]
I also found that the zfs module option is not applied at startup:
# cat /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=10737418240
... because after startup this parameter still has its default value:
# cat /sys/module/zfs/parameters/zfs_arc_max
0
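A value of 0 here means the module is using its built-in default, so the option was not picked up at load time. One possible explanation, which is an assumption and not confirmed anywhere in this thread: when the zfs module is loaded from the initramfs (as on root-on-ZFS installs), changes under `/etc/modprobe.d` only take effect after the initramfs is regenerated. The sketch below shows one way to check the effective limit and set it at runtime; the helper names `gib_to_bytes` and `arc_c_max` are hypothetical.

```shell
#!/bin/sh
# Convert GiB to bytes, e.g. to build a value for zfs_arc_max.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Extract the effective ARC ceiling (c_max) from arcstats-formatted input
# (columns: name, type, data).
arc_c_max() {
    awk '$1 == "c_max" { print $3 }'
}

# Usage (as root), left commented out so the block has no side effects:
#   arc_c_max < /proc/spl/kstat/zfs/arcstats
#   gib_to_bytes 10 > /sys/module/zfs/parameters/zfs_arc_max
#   update-initramfs -u -k all   # assumption: module is loaded from initramfs
```

If `c_max` follows the value written at runtime but the parameter is back to 0 after a reboot, that would point at the boot-time configuration rather than at ZFS itself, and it is probably a separate issue from the hang.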
Am I missing something? Or should I create one more issue...
Same thing with 4k volblocksize-d virtual disks.
PID STAT STARTED COMMAND
510 D< 20:52:21 [zvol]
3298 S+ 12:57:01 grep -E D | D< | Dl |STAT
19814 D 20:52:42 [txg_sync]
25706 D< 12:50:13 [zvol]
25707 D< 12:50:13 [zvol]
25709 D< 12:50:13 [zvol]
25710 D< 12:50:13 [zvol]
25711 D< 12:50:13 [zvol]
25712 D< 12:50:13 [zvol]
25713 D< 12:50:13 [zvol]
25714 D< 12:50:13 [zvol]
25715 D< 12:50:13 [zvol]
25716 D< 12:50:13 [zvol]
25717 D< 12:50:13 [zvol]
25718 D< 12:50:13 [zvol]
25719 D< 12:50:13 [zvol]
25720 D< 12:50:13 [zvol]
25721 D< 12:50:13 [zvol]
25722 D< 12:50:13 [zvol]
25723 D< 12:50:13 [zvol]
25724 D< 12:50:13 [zvol]
25725 D< 12:50:13 [zvol]
25726 D< 12:50:13 [zvol]
25727 D< 12:50:13 [zvol]
25728 D< 12:50:13 [zvol]
25729 D< 12:50:13 [zvol]
25730 D< 12:50:13 [zvol]
25731 D< 12:50:13 [zvol]
25732 D< 12:50:13 [zvol]
25733 D< 12:50:13 [zvol]
25734 D< 12:50:13 [zvol]
25735 D< 12:50:13 [zvol]
25736 D< 12:50:13 [zvol]
25737 D< 12:50:13 [zvol]
Interestingly, there are always 31 newly frozen [zvol] processes plus 1 old [zvol] process.
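To check that count on other hosts, the blocked `[zvol]` threads can be grouped by start time. This is only a sketch with a made-up function name (`count_blocked_zvol`). The total of 32 may correspond to the `zvol_threads` module parameter, whose default is 32 as far as I know, but that link is a guess and has not been verified here.

```shell
#!/bin/sh
# Read "PID STAT STARTED COMMAND" lines from stdin and print
# "<count> <start time>" for every group of D-state [zvol] threads.
# Caveat: ps prints STARTED as "Mmm dd" (two fields) for tasks older
# than 24 hours, and such lines are not matched by this filter.
count_blocked_zvol() {
    awk '$2 ~ /^D/ && $4 == "[zvol]" { n[$3]++ }
         END { for (t in n) print n[t], t }' | sort -k2
}

# Usage:
#   ps -e -o pid,stat,start,command | count_blocked_zvol
#   cat /sys/module/zfs/parameters/zvol_threads
```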
i think this one is related: https://github.com/openzfs/zfs/issues/10095
just for my interest - why has this been closed without further comment/notice?
probably related: https://github.com/openzfs/zfs/issues/9172
probably a duplicate of #9172
The errors are similar, but the zfs versions differ too much, and in #9172 only zvols with compression on hang, whereas I have no compression and my hangs affect management commands on any zfs volume/dataset. Also, we moved to pve5.4-3 with zfs version 0.7.13, and such errors have never occurred on it.
just for my interest - why has this been closed without further comment/notice?
it was an accident...from my smartphone :)
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
i still think it sucks that tickets are getting closed automatically because of inactivity.
it's like sweeping dirt under the carpet
System information
Describe the problem you're observing
Some operations with specific zpool/zvols hang. Virtual machines freeze.
Describe how to reproduce the problem
Install PVE 6.0-4. Create additional mirror zpool. Create several VMs with vdisks on created zpool. Add load on VMs disks and wait for a few days.
Include any warning/errors/backtraces from the system logs
Inside VMs terminals:
Processes:
from dmesg:
zfs create command hangs:
creating snapshot:
zpool history command hangs:
zpool status works fine:
zdb commands work fine after
zpool set cachefile=/etc/zfs/zpool.cache zpl2