
zfs 0.6.4-139_g1cd7773: freezes under heavy load #3599

Closed: vozhyk- closed this issue 9 years ago

vozhyk- commented 9 years ago

The system can freeze under heavy disk I/O, high memory usage, and high load average (over 12 on a 2-core CPU). During a freeze the HDD and pendrive LEDs don't blink, the mouse pointer doesn't move, the clock on i3bar stops, and I can't switch VTs, but the NumLock and SysRq key combinations still work.

The last 2 times, what was running was git (emerge --sync), firefox, thunderbird and skype (the first time also had mocp, the second dropbox). The machine has 3GB of RAM and 6GB of swap on a separate partition, with zswap configured as "zswap.compressor=lz4 zswap.max_pool_percent=25". Similar freezes also happened with zram under similar settings, and on versions from git before 0.6.4 (not sure if those were the same as these, but the symptoms were). The high load average has probably been caused by zswap (the load average rises dramatically when the system swaps). The zpool consists of an 80GB internal HDD (~40MB/s sequential read) and a 16GB flash drive (no numbers, but very slow). The kernel is 3.16.7-aufs with the BFS and TuxOnIce patches. The distribution is Funtoo current x86_64, updated less than a week ago.
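For reference, the zswap side of that setup can be inspected at runtime; a small sketch (on kernels of that era zswap.enabled=1 must also be set on the command line, and the sysfs path below is the standard zswap parameter directory):

# Kernel command line, as quoted above (zswap.enabled=1 must also be present):
#   zswap.enabled=1 zswap.compressor=lz4 zswap.max_pool_percent=25

# Verify the running configuration:
grep -r . /sys/module/zswap/parameters/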

I tried using SysRq combinations to get some info, but I couldn't see any output, and I couldn't switch to the framebuffer console with either Ctrl-Alt-F* or SysRq combinations. That might be because I use the proprietary NVIDIA driver; nouveau is possible with this card, but it also froze when I last tried it (at least on Ubuntu 14.04 when that was released). Alt-SysRq-c triggered a kernel panic (the CapsLock LED was blinking) but didn't switch to the framebuffer console either.

Are there other ways to get information about the freeze besides switching to nouveau?
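(For anyone in the same situation, one console-independent option is to enable SysRq fully and mirror kernel messages to another machine over netconsole, so traces survive even when the local console is dead; a sketch with placeholder addresses:

# Enable all SysRq functions:
echo 1 > /proc/sys/kernel/sysrq

# Stream kernel messages to another host; syntax is
# netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# (omitting the target MAC falls back to broadcast):
modprobe netconsole netconsole=@/eth0,6666@192.168.0.2/

# On the receiving machine (exact netcat flags vary by variant):
nc -u -l 6666
)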

DorianGray commented 9 years ago

I am seeing the same thing on Ubuntu 15.04, kernel 4.1.2, zfs 0.6.4.

Also, using Docker with anything but vfs as the storage driver triggers a hard freeze.
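In case someone needs to work around that in the meantime, forcing the vfs driver looks roughly like this (a sketch; the daemon.json file applies to newer Docker releases, older ones take the flag directly when starting the daemon):

# Newer Docker: /etc/docker/daemon.json
#   { "storage-driver": "vfs" }

# Older Docker (roughly contemporary with this thread):
docker daemon --storage-driver=vfs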

kernelOfTruth commented 9 years ago

@vozhyk- try using a kernel without BFS; if that's not an option, compile NUMA support into your kernel.

I've had hard locks with BFS and heavy disk I/O (e.g. running a Btrfs scrub).

When running a kernel with NUMA support, it's perfectly stable.

Not sure if it also applies to that kernel version (3.16.7 with BFS), but that lockup under heavy load with BFS happened under 3.18, 3.19, 4.0 and 4.1 (see the comments on Con's website and the Arch Linux forum).

vozhyk- commented 9 years ago

I'll rebuild with these options and see if that fixes it.

/usr/src/linux # grep -i numa .config
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_ARCH_WANTS_PROT_NUMA_PROT_NONE=y
CONFIG_NUMA=y
# CONFIG_AMD_NUMA is not set
# CONFIG_X86_64_ACPI_NUMA is not set
# CONFIG_NUMA_EMU is not set
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
# CONFIG_ACPI_NUMA is not set
<...>
CONFIG_NODES_SHIFT=1
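Once the rebuilt kernel is booted, NUMA support can be double-checked at runtime; a quick sketch (numactl is a separate userspace package):

# Kernel messages about NUMA initialization (a single fake node is normal on this hardware):
dmesg | grep -i numa

# NUMA topology as the kernel reports it:
numactl --hardware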

vozhyk- commented 9 years ago

Also, after I posted the bug report, there were similar hangs (not sure if they are the same). First, disk I/O stopped working: the pendrive LED stopped blinking and the HDD LED blinked periodically a couple of times; switching desktops and VTs still worked, but logging in on the framebuffer console hung after entering the username. Then, after some time, the whole system hung. On these occasions the load wasn't anything big.

vozhyk- commented 9 years ago

Overnight I left emerge, firefox, dropbox, du piped to sort, thunderbird and pidgin running. It was slow (only 2 packages merged in 6 hours; kpathsea took 4.5 hours instead of 2 minutes), but it didn't lock up. So NUMA support seems to have fixed this.

vozhyk- commented 9 years ago

@kernelOfTruth Could you send me a link to those comments? I can't find them.

kernelOfTruth commented 9 years ago

@vozhyk- http://ck-hack.blogspot.co.at/2014/12/bfs-460-linux-318-ck1.html#comment-form http://ck-hack.blogspot.co.at/2015/04/bfs-462-linux-40-ck1.html#comment-form

https://bbs.archlinux.org/viewtopic.php?id=111715&p=97 + following

Search for "NUMA" and you'll get some matches.

vozhyk- commented 9 years ago

@kernelOfTruth Thanks. I see; they don't know exactly why this happens. [EDIT: everything after rebuilding the kernel with NUMA support happened with zfs 0.6.4-170_g76a98bb] So I guess only the performance problem remains: a dropbox sync (or dropbox plus emerge, but at least once it was dropbox alone) makes everything slow and unresponsive. The ticket can be closed. Do you think using a non-BFS kernel would improve performance in my case? (In any case, I need to at least recreate the pool without the slow pendrive; ZFS performance was never this low when the pool had only the HDD.) And thanks for your help.

kernelOfTruth commented 9 years ago

@vozhyk- BFS is tailored more toward low latency and multimedia use, CFS more toward throughput - so yes, a newer non-BFS kernel with autogroup enabled *could* perhaps help in your case.

Most of the time BFS worked better for me, but a heavily tweaked CFS can also achieve very low latency (perhaps even lower than BFS, though probably with lower throughput - I haven't tested this with benchmarks).

(For example, appending threadirqs to the kernel command line and raising the priority of the nvidia and other IRQ threads.)
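To make the IRQ part concrete: with threadirqs on the kernel command line, each interrupt handler runs as a kernel thread named irq/<number>-<name>, whose realtime priority can be raised with chrt; a sketch (the priority 85 is an arbitrary example):

# List the threaded IRQ handlers and their current realtime priorities:
ps -eo pid,rtprio,comm | grep 'irq/'

# Raise the nvidia IRQ thread to SCHED_FIFO priority 85 (example value):
chrt -f -p 85 $(pgrep 'irq/.*nvidia')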

ZFS appears to cause very high latency - at least as a non-native implementation with default settings, compared to illumos or FreeBSD, from what I've read.

If low latency is the focus, you can try the following settings, which I also posted on f.g.o. (https://forums.gentoo.org/viewtopic-p-7766056.html#7766056).

As written in the thread:

With the default settings I'm getting latencies of up to 3000 ms or more when running rsync or copying large amounts of data (measured with latencytop, using the CFS CPU scheduler).

With the following settings the latencies are down to 200-300 ms (so far):

echo 128 > /sys/module/zfs/parameters/zfs_vdev_max_active 
# default zfs_vdev_max_active = 1000 

echo 3 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active 
# default zfs_vdev_async_write_max_active = 10 
echo 5 > /sys/module/zfs/parameters/zfs_vdev_sync_write_max_active 
# default zfs_vdev_sync_write_max_active = 10 
echo 3 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active 
# default zfs_vdev_async_read_max_active = 3 
echo 10 > /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active 
# default zfs_vdev_sync_read_max_active = 10 

echo 5 > /sys/module/zfs/parameters/zfs_vdev_sync_write_min_active 
# default zfs_vdev_sync_write_min_active = 10 
echo 1 > /sys/module/zfs/parameters/zfs_vdev_async_read_min_active 
# default zfs_vdev_async_read_min_active = 1 
echo 5 > /sys/module/zfs/parameters/zfs_vdev_sync_read_min_active 
# default zfs_vdev_sync_read_min_active = 10 
echo 1 > /sys/module/zfs/parameters/zfs_vdev_async_write_min_active 
# default zfs_vdev_async_write_min_active = 1 

echo 2 > /sys/module/zfs/parameters/zfs_txg_timeout 
# default zfs_txg_timeout = 5
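
To make these settings persist across reboots (and module reloads), the usual place is a modprobe options file; a sketch with the same values (the file name is just the conventional choice):

# /etc/modprobe.d/zfs.conf
options zfs zfs_vdev_max_active=128
options zfs zfs_vdev_async_write_max_active=3
options zfs zfs_vdev_sync_write_max_active=5
options zfs zfs_vdev_async_read_max_active=3
options zfs zfs_vdev_sync_read_max_active=10
options zfs zfs_vdev_sync_write_min_active=5
options zfs zfs_vdev_async_read_min_active=1
options zfs zfs_vdev_sync_read_min_active=5
options zfs zfs_vdev_async_write_min_active=1
options zfs zfs_txg_timeout=2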

vozhyk- commented 9 years ago

@kernelOfTruth Thanks a lot, I'll try them. By the way, is it bad to keep discussing performance in a closed bug about a lockup?

kernelOfTruth commented 9 years ago

As a reference, it could be helpful for others who have a similarly "exotic" configuration and kernel (BFS).

Also, you wrote about high load, freezes and lockups, so it's somewhat related :smirk:

behlendorf commented 9 years ago

To continue the performance discussion a little more, I'd be curious to know what the proposed tunings do to throughput on the system. @kernelOfTruth have you noticed any significant impact?
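One way to quantify that, sketched with fio (the dataset path /tank/test is a placeholder; for cold-cache read numbers, export/import the pool or drop caches first):

# Sequential write throughput on the pool, syncing at the end:
fio --name=seqwrite --directory=/tank/test --rw=write --bs=1M --size=4G --end_fsync=1

# Sequential read throughput:
fio --name=seqread --directory=/tank/test --rw=read --bs=1M --size=4G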

kernelOfTruth commented 9 years ago

@behlendorf I'm not really in a position to talk about performance (at least on a bigger scale):

my usual "workload" consists of backing up around 2 TB incrementally, where most of the changes are minor (small files of around 2-10 MB, occasionally 600-900 MB recorded movies; the biggest chunks are 20-40 GB VMs that change every few days).

With those in mind, I haven't noticed a significant increase in the time it takes to back that data up, but I'll take a closer look during the next syncs and will post if I see it taking unusually long.
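For context, when such an incremental backup is done with ZFS itself, it is typically a snapshot-and-send loop; a minimal sketch (pool and dataset names are placeholders, and rsync-based setups differ):

# Snapshot the current state, then send only the delta since the last snapshot:
zfs snapshot tank/data@today
zfs send -i tank/data@yesterday tank/data@today | zfs receive backup/data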