mgiammarco closed this issue 13 years ago.
It looks like both processes are waiting on the open transaction, but it's not clear why. With what version are you able to reproduce this? Fixes for several deadlocks have been committed post-rc4. You can give them a try if you pull the latest master.
The version is on the first line of the log; it is the latest in the PPA.
I have disabled the L2ARC and the ZIL log and redone the rsync test. Unfortunately it fails again, but this time due to the out-of-memory killer.
How much memory is in your system? At the moment things get a little dicey with ZFS if you have less than about 2GB.
The server is on ESXi, so I gave it 5GB this time. I repeat the settings:
Result: ZFS eats all memory again, this time after a longer period. It is strange that for the first few hours memory consumption seems normal, then it goes crazy and consumes all memory.
I am using the latest version from the PPA.
I get a similar issue on a testing backup server when using rsync. Interestingly, the problem does not occur when the heavy I/O is generated by a non-rsync process (specifically, the OpenAFS backup server). The server in question has 7GB of RAM for a RAIDz pool of 4x 750GB drives. Compression is enabled on the pool but deduplication is disabled. Kernel is v2.6.36 checked out from kernel.org.
Please let me know if you need any more information.
Several deadlocks which would result in the symptoms you describe were recently fixed in the master branch. If you're able to consistently reproduce this issue I'd appreciate it if you could try the latest source from GitHub and let me know if the issue remains. Thanks!
a140dc5469399eaf2347ee7db503275f38f5dc12 - Fix mmap(2)/write(2)/read(2) deadlock
abd39a8289c3bc2766f281933689d5140445aa85 - Fix zio_execute() deadlock
cfc9a5c88f91f7b4d606fce89505e1f404691ea5 - Fix zpl_writepage() deadlock
Thanks, I will try when these patches reach the PPA repository. I am writing now to say that I have found a workaround by accident: the virtual machine that I use for tests had two virtual CPUs. Now that I have given it only one CPU it seems to work (when memory is full, memory is now reclaimed and no task killing occurs).
I suppose the deadlock fixes you have found will help with 2 CPUs.
Four rsync streams between this Linux 233TB pool and a Solaris 233TB ZFS pool grind the system to a halt. The workaround seems to be a taskset command that assigns rsync to 2 of the 4 physical processors of the system. After that, open ssh user sessions will randomly hang for short periods, then recover; evidently interactive processes are being scheduled erratically. Hardware and software: AMD Opteron dual core, 4 physical CPUs, 8GB memory; CentOS 6 with kernel update 2.6.32-71.29.1.el6.x86_64, spl and zfs 0.6.0-rc5. I realize none of this is very technical, but rsyncs are in progress and I'm home on a VPN. The rsync hang is an open bug for the slightly past-due rc6 and/or the 0.6.0 stable release. FYI, we do have reasons for not doing zfs send/receive duplications. The CentOS system can dual boot Solaris, i.e. its original OS.
The rsyncs are running, currently set to run on processor #3 via taskset. Interactive processes do recover after the hangs. I had 10 ssh sessions open today, with a date/sleep 1 loop running in all shells. Generally only one shell would hang at a time, for perhaps between 2 and 15 seconds.
I am still seeing this issue with the latest versions built from git. Setting the zfs_arc_max option to a relatively small number (1GB on a system with 6GB RAM in this case) helps to slow the failure. When running I/O-intensive loads that aren't rsync (the AFS backup system I'm running on this machine writes 250GB to a single large file), the ARC size as given by /proc/spl/kstat/zfs/arcstats quickly goes up to 1073845840 (almost exactly the value given to modprobe) and then holds at that size. When running an rsync, however, that size climbs steadily past the specified maximum (even if the rsync is just verifying that files have not changed) and keeps increasing until the system starts swapping, at which point it begins to lag badly. A workaround we have in place for the moment is to rsync a portion at a time and completely unload ZFS between the rsyncs, which frees the memory.
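For reference, a 1GB cap like the one described above is a module option set before the zfs module loads; a minimal sketch, assuming the usual /etc/modprobe.d/zfs.conf location and an illustrative value of 1073741824 bytes:

# /etc/modprobe.d/zfs.conf (sketch; the value is illustrative, not a recommendation)
options zfs zfs_arc_max=1073741824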
I agree. Setting a very small zfs_arc_max slows down the hanging of the system. It seems that something during rsync triggers a memory leak.
Not in a position to unload ZFS between rsyncs; other user processes need/use the pool. Just reporting what I, as an "ordinary" interactive user, would see, and reinforcing that the bug does exist and is noted for a fix in rc6. When fixed, it seems that rc6 and 0.6.0 will be stable enough for us to replace the Solaris OS. We are looking to get ZFS and FUSE/LTFS (Linear Tape File System) on the same OS platform, and to offload "deep"/old archived projects onto LTFS. Again, we have reasons not to do zfs send/receive to/from tape, just as we have reasons not to do zfs send/receive between standalone systems/pools.
Anyway, I have run a tar that reads files and writes to /dev/null, and the system hangs, so the problem is not due to disk writes.
Can you please post a simple test case which can recreate this issue? While this is tagged to be fixed for 0.6.0-rc6, I haven't seriously investigated it yet. A test case would help a lot.
Happy to try and help. I'm a sysadmin, not a programmer in the technical sense. Do you need my grouping of rsync commands, my multi-shell interactive test loop, or more technical detail? For any debug and/or trace output, a CLI command string would be helpful.
tstudios
Sam Allgood Sr. Production Engineer Turner Studios Turner Broadcasting System Atlanta, GA 30318
If you can provide me enough information to recreate the issue locally that would be ideal.
While not on this system, now dual booting Solaris and Linux, another identical system (still on Solaris) will be sharing zfs file systems within the pool via Samba. It would be interesting to know if a large single file read via Samba will also hang ZFS. Don't want to clutter the Bug Comments too much here. Seems there are two cases that can hang.
tstudios Sam Allgood Turner Studios
It's late here EDT. Tomorrow, I'll give you my exact commands from a daily running script (real values and not $variables) and the rsync reports of how much data has moved, bytes sent/received, etc. That plus dmesg, kernel version.
I'm running taskset -c rsync
I do see recent comments about tar reads from zfs pools to /dev/null also hanging.
tstudios
Brian,
Below is a synopsis of yesterday's rsyncs; hopefully enough information to roughly duplicate my user setup. Hopefully the bad line wraps are not too annoying; this was pasted in from a simple text editor.
tstudios - Sam Allgood Turner Studios
Kernel is stock from the latest yum update:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# uname -rv
2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011
Rsync version:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# rsync --version
rsync version 3.0.6 protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes
Running the following in 10 or so ssh login shells. I can see one or more of these hang randomly for random lengths:
while true; do date; sleep 1; done
The task is to sync a Solaris ZFS pool with 4 file systems (besides the root pool) to this Linux ZFS pool with 4 file systems (besides the root pool). This process has been running daily for a couple of years now. The Linux box now dual boots Linux/Solaris. The sync commands originate on the Linux system over a 10G link with a 9000 MTU. There is about a minute offset between each command. These did execute with a taskset -c 3; if I let the system schedule itself, I soon lose it.
The zfs list below comes from the Linux system and is indicative of the pool and file systems on both Linux and Solaris. It just so happens that yesterday my largest file system had the least data churn; we spent 4 hours parsing that file system. All file systems are a mixture of files, from small text files to nearly 1TB archives from the compositing application Autodesk Discreet Flame. The parse times from my log for that least-churn, largest file system:
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# head -2 log.0
SYSTEM:tsdpl-A-10G-fx_archive
START:2011-08-02-12-02-02
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# tail -1 log.0
END:2011-08-02-15-59-19
Hmmnnn... "...first error in dmesg?" Presuming dmesg, it looks like a fairly standard boot. Pool import, legacy mount, NFS and Samba are managed from a custom script; only import and legacy mount for now.
Okay, now the zfs list, the sync arguments, the sync summary, and an indication of files added/deleted. Note that one file system is a legacy mount, and that the pool "root" is not rsynced. For the --excludes: SNAPSHOT is a symbolic link to .zfs, and I don't/can't pull those across.
NAME                  USED   AVAIL  REFER  MOUNTPOINT
zdev001               206T   26.3T    22K  /tsdpl
zdev001/backups      52.6T   26.3T  52.6T  /tsdpl/backups
zdev001/fx_archive    142T   26.3T   142T  /tsdpl/fx_archive
zdev001/gfx01        10.1T   26.3T  10.1T  legacy
zdev001/gfx_archive  1.72T   26.3T  1.72T  /tsdpl/gfx_archive
taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/tsdpl/backups/ /tsdpl/backups
sent 13798995 bytes  received 190547787390 bytes  9751136.57 bytes/sec
total size is 57435529244802  speedup is 301.40
[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep deleting | wc -l
991
[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep -v deleting | wc -l
66126
taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude .zfs tsdpl::root/tsdpl/fx_archive/ /tsdpl/fx_archive
receiving incremental file list
system/cron_archive_log.txt
system/cron_catalog_log.txt
sent 276728 bytes  received 268401036 bytes  18871.13 bytes/sec
total size is 155228895891957  speedup is 577751.18
taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/gfx01/ /gfx01
sent 1662192 bytes  received 83044609836 bytes  10524842.79 bytes/sec
total size is 11025515851641  speedup is 132.76
[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep deleting | wc -l
289446
[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep -v deleting | wc -l
85258
taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/tsdpl/gfx_archive/ /tsdpl/gfx_archive
sent 84485 bytes  received 1411890863 bytes  1125528.38 bytes/sec
total size is 1875081239999  speedup is 1327.98
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep deleting | wc -l
0
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep -v deleting | wc -l
372
rsyncing 1.5TB of data in a few large files (20 files total, ranging in size from 17GB to 140GB compressed; 18GB to 232GB uncompressed) is an example of an rsync that runs fine with no memory issues.
rsyncing 60GB of data in ~2 million files (mail storage backups) will result in the ARC ballooning beyond the defined maximum. The call stack when the system starts swapping is:
[134432.390032] Pid: 25143, comm: kworker/u:1 Tainted: P 2.6.36 #7
[134432.390036] Call Trace:
[134432.390050] [
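A minimal reproduction along the lines of the many-small-files case above might look like the following sketch; the paths, the pool name /tank, and the file counts are illustrative, not the exact workload from this report:

# populate a non-ZFS source with roughly 2 million small files
mkdir -p /src/manyfiles
for d in $(seq 1 2000); do
    mkdir -p "/src/manyfiles/$d"
    for f in $(seq 1 1000); do echo data > "/src/manyfiles/$d/$f"; done
done
# rsync them onto the pool and watch the ARC grow past its configured limit
rsync -a /src/manyfiles/ /tank/manyfiles/ &
watch -n 10 "grep -E '^(size|arc_meta_used|arc_meta_limit) ' /proc/spl/kstat/zfs/arcstats"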
@Alphalead: It certainly sounds like the metadata limits aren't being enforced properly for the ARC. Have you checked /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab when it's in this state? I'd be interested to see what they report, and where the memory is being used.
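For what it's worth, both files can be captured in one go while the machine is in that state, for example (the output path is just an example):

cat /proc/spl/kstat/zfs/arcstats /proc/spl/kmem/slab > /tmp/zfs-memstate.$(date +%s)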
I seem to be hitting this problem too. Just default values on everything (I'll implement the suggestions above) except for zpool being created with ashift=12.
Here are /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab with the machine effectively useless. rsync was copying from local ext4 to local ZFS. Load average was about 18.5, but there was no I/O to speak of. There was 6900M or so of RAM allocated to this VM (a Xen dom0), with htop reporting ~5900M in use (but not by the applications; they can barely use 2GB). Killing rsync didn't help, and when I forced the machine to reboot I saw an rsync process segfault right before the actual system reboot (perhaps the open files got pulled out from under it?).
3 1 0x01 61 2928 146583166654 245155398080153
name type data
hits 4 38630741
misses 4 11918394
demand_data_hits 4 1051214
demand_data_misses 4 210641
demand_metadata_hits 4 32812696
demand_metadata_misses 4 10711823
prefetch_data_hits 4 44925
prefetch_data_misses 4 112506
prefetch_metadata_hits 4 4721906
prefetch_metadata_misses 4 883424
mru_hits 4 2313878
mru_ghost_hits 4 1762080
mfu_hits 4 31550033
mfu_ghost_hits 4 9489768
deleted 4 12505789
recycle_miss 4 47548198
mutex_miss 4 77617
evict_skip 4 5325335645
evict_l2_cached 4 0
evict_l2_eligible 4 894196381184
evict_l2_ineligible 4 9847328256
hash_elements 4 576798
hash_elements_max 4 576799
hash_collisions 4 7084752
hash_chains 4 157562
hash_chain_max 4 8
p 4 360042944
c 4 2180250624
c_min 4 2180250624
c_max 4 17442004992
size 4 2819764592
hdr_size 4 184016480
data_size 4 662532096
other_size 4 1973216016
l2_hits 4 0
l2_misses 4 0
l2_feeds 4 0
l2_rw_clash 4 0
l2_read_bytes 4 0
l2_write_bytes 4 0
l2_writes_sent 4 0
l2_writes_done 4 0
l2_writes_error 4 0
l2_writes_hdr_miss 4 0
l2_evict_lock_retry 4 0
l2_evict_reading 4 0
l2_free_on_write 4 0
l2_abort_lowmem 4 0
l2_cksum_bad 4 0
l2_io_error 4 0
l2_size 4 0
l2_hdr_size 4 0
memory_throttle_count 4 9491
memory_direct_count 4 59
memory_indirect_count 4 321540
arc_no_grow 4 1
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_meta_used 4 2819764592
arc_meta_limit 4 4360501248
arc_meta_max 4 3733333912
--------------------- cache ------------------------------------------------------- ----- slab ------ ---- object -----
name flags size alloc slabsize objsize total alloc max total alloc max
spl_vn_cache 0x00020 0 0 4096 88 0 0 7 0 0 128
spl_vn_file_cache 0x00020 0 0 4096 88 0 0 0 0 0 0
spl_zlib_workspace_cache 0x00040 0 0 8388608 268104 0 0 0 0 0 0
zio_cache 0x00040 360448 333792 16384 912 22 22 1956 374 366 33252
zio_link_cache 0x00020 36864 14304 4096 48 9 9 749 405 298 33664
zio_buf_512 0x00042 1372160000 643706368 32768 512 41875 41875 50625 1298125 1257239 1569352
zio_data_buf_512 0x00042 0 0 32768 512 0 0 843 0 0 26110
zio_buf_1024 0x00042 131072 65536 32768 1024 4 4 448 84 64 9407
zio_data_buf_1024 0x00042 0 0 32768 1024 0 0 965 0 0 20263
zio_buf_1536 0x00042 196608 98304 65536 1536 3 3 164 93 64 5084
zio_data_buf_1536 0x00042 0 0 65536 1536 0 0 1386 0 0 42947
zio_buf_2048 0x00042 196608 149504 65536 2048 3 3 175 75 73 4359
zio_data_buf_2048 0x00042 0 0 65536 2048 0 0 3525 0 0 88125
zio_buf_2560 0x00042 262144 163840 65536 2560 4 4 254 84 64 5324
zio_data_buf_2560 0x00042 0 0 65536 2560 0 0 6054 0 0 127120
zio_buf_3072 0x00042 0 0 65536 3072 0 0 356 0 0 6400
zio_data_buf_3072 0x00042 0 0 65536 3072 0 0 3637 0 0 65464
zio_buf_3584 0x00042 0 0 131072 3584 0 0 46 0 0 1408
zio_data_buf_3584 0x00042 0 0 131072 3584 0 0 466 0 0 14435
zio_buf_4096 0x00042 1048576 262144 262144 4096 4 4 355 124 64 11005
zio_data_buf_4096 0x00042 0 0 262144 4096 0 0 551 0 0 17073
zio_buf_5120 0x00042 262144 163840 131072 5120 2 2 80 42 32 1661
zio_data_buf_5120 0x00042 0 0 131072 5120 0 0 1458 0 0 30598
zio_buf_6144 0x00042 262144 190464 131072 6144 2 2 57 36 31 1023
zio_data_buf_6144 0x00042 0 0 131072 6144 0 0 819 0 0 14738
zio_buf_7168 0x00042 786432 458752 262144 7168 3 3 25 93 64 767
zio_data_buf_7168 0x00042 0 0 262144 7168 0 0 225 0 0 6955
zio_buf_8192 0x00042 786432 425984 262144 8192 3 3 154 63 52 3218
zio_data_buf_8192 0x00042 1048576 524288 262144 8192 4 4 447 84 64 9378
zio_buf_10240 0x00042 0 0 262144 10240 0 0 39 0 0 804
zio_data_buf_10240 0x00042 0 0 262144 10240 0 0 181 0 0 3797
zio_buf_12288 0x00042 1048576 454656 524288 12288 2 2 24 62 37 722
zio_data_buf_12288 0x00042 0 0 524288 12288 0 0 79 0 0 2424
zio_buf_14336 0x00042 0 0 524288 14336 0 0 10 0 0 305
zio_data_buf_14336 0x00042 0 0 524288 14336 0 0 51 0 0 1555
zio_buf_16384 0x00042 1329070080 663404544 524288 16384 2535 2535 6746 63375 40491 168650
zio_data_buf_16384 0x00042 0 0 524288 16384 0 0 83 0 0 2071
zio_buf_20480 0x00042 2097152 1310720 524288 20480 4 4 40 84 64 829
zio_data_buf_20480 0x00042 0 0 524288 20480 0 0 190 0 0 3981
zio_buf_24576 0x00042 1572864 1327104 524288 24576 3 3 31 54 54 548
zio_data_buf_24576 0x00042 0 0 524288 24576 0 0 104 0 0 1864
zio_buf_28672 0x00042 0 0 1048576 28672 0 0 12 0 0 366
zio_data_buf_28672 0x00042 0 0 1048576 28672 0 0 66 0 0 2033
zio_buf_32768 0x00042 3145728 2097152 1048576 32768 3 3 12 84 64 320
zio_data_buf_32768 0x00042 0 0 1048576 32768 0 0 81 0 0 2244
zio_buf_36864 0x00042 2097152 1179648 1048576 36864 2 2 13 50 32 323
zio_data_buf_36864 0x00042 0 0 1048576 36864 0 0 104 0 0 2599
zio_buf_40960 0x00042 1048576 0 1048576 40960 1 0 15 23 0 338
zio_data_buf_40960 0x00042 0 0 1048576 40960 0 0 30 0 0 689
zio_buf_45056 0x00042 0 0 1048576 45056 0 0 11 0 0 231
zio_data_buf_45056 0x00042 0 0 1048576 45056 0 0 32 0 0 652
zio_buf_49152 0x00042 4194304 3145728 1048576 49152 4 4 10 76 64 173
zio_data_buf_49152 0x00042 0 0 1048576 49152 0 0 38 0 0 705
zio_buf_53248 0x00042 0 0 1048576 53248 0 0 23 0 0 403
zio_data_buf_53248 0x00042 0 0 1048576 53248 0 0 37 0 0 664
zio_buf_57344 0x00042 0 0 1048576 57344 0 0 19 0 0 308
zio_data_buf_57344 0x00042 0 0 1048576 57344 0 0 38 0 0 642
zio_buf_61440 0x00042 0 0 2097152 61440 0 0 7 0 0 187
zio_data_buf_61440 0x00042 0 0 2097152 61440 0 0 18 0 0 532
zio_buf_65536 0x00042 39845888 31653888 2097152 65536 19 19 22 570 483 641
zio_data_buf_65536 0x00042 0 0 2097152 65536 0 0 14 0 0 420
zio_buf_69632 0x00042 0 0 2097152 69632 0 0 12 0 0 320
zio_data_buf_69632 0x00042 0 0 2097152 69632 0 0 14 0 0 384
zio_buf_73728 0x00042 0 0 2097152 73728 0 0 11 0 0 286
zio_data_buf_73728 0x00042 0 0 2097152 73728 0 0 9 0 0 232
zio_buf_77824 0x00042 0 0 2097152 77824 0 0 12 0 0 288
zio_data_buf_77824 0x00042 0 0 2097152 77824 0 0 8 0 0 200
zio_buf_81920 0x00042 0 0 2097152 81920 0 0 7 0 0 161
zio_data_buf_81920 0x00042 0 0 2097152 81920 0 0 11 0 0 245
zio_buf_86016 0x00042 0 0 2097152 86016 0 0 6 0 0 133
zio_data_buf_86016 0x00042 0 0 2097152 86016 0 0 9 0 0 206
zio_buf_90112 0x00042 0 0 2097152 90112 0 0 6 0 0 128
zio_data_buf_90112 0x00042 0 0 2097152 90112 0 0 9 0 0 188
zio_buf_94208 0x00042 0 0 2097152 94208 0 0 8 0 0 155
zio_data_buf_94208 0x00042 0 0 2097152 94208 0 0 13 0 0 263
zio_buf_98304 0x00042 0 0 2097152 98304 0 0 17 0 0 324
zio_data_buf_98304 0x00042 0 0 2097152 98304 0 0 13 0 0 244
zio_buf_102400 0x00042 0 0 2097152 102400 0 0 12 0 0 223
zio_data_buf_102400 0x00042 0 0 2097152 102400 0 0 10 0 0 186
zio_buf_106496 0x00042 0 0 2097152 106496 0 0 16 0 0 288
zio_data_buf_106496 0x00042 0 0 2097152 106496 0 0 12 0 0 212
zio_buf_110592 0x00042 0 0 2097152 110592 0 0 9 0 0 161
zio_data_buf_110592 0x00042 0 0 2097152 110592 0 0 12 0 0 211
zio_buf_114688 0x00042 0 0 2097152 114688 0 0 12 0 0 192
zio_data_buf_114688 0x00042 0 0 2097152 114688 0 0 10 0 0 156
zio_buf_118784 0x00042 0 0 2097152 118784 0 0 6 0 0 97
zio_data_buf_118784 0x00042 0 0 2097152 118784 0 0 12 0 0 191
zio_buf_122880 0x00042 0 0 2097152 122880 0 0 7 0 0 97
zio_data_buf_122880 0x00042 0 0 2097152 122880 0 0 16 0 0 256
zio_buf_126976 0x00042 0 0 4194304 126976 0 0 4 0 0 98
zio_data_buf_126976 0x00042 0 0 4194304 126976 0 0 6 0 0 162
zio_buf_131072 0x00042 0 0 4194304 131072 0 0 20 0 0 593
zio_data_buf_131072 0x00042 0 0 4194304 131072 0 0 816 0 0 25268
sa_cache 0x00020 50860032 28278800 4096 80 12417 12417 19334 409761 353485 637997
dnode_t 0x00040 1327349760 1186980880 16384 944 81015 81015 96886 1296240 1257395 1550176
dmu_buf_impl_t 0x00020 485531648 384151168 8192 296 59269 59269 68442 1422456 1297808 1642587
arc_buf_hdr_t 0x00020 196968448 166191840 8192 288 24044 24044 24044 577056 577055 577055
arc_buf_t 0x00020 25071616 4558288 4096 112 6121 6121 14313 159146 40699 372126
zil_lwb_cache 0x00020 65536 49536 4096 192 16 16 192 272 258 3253
zfs_znode_cache 0x00020 411664384 342173480 16384 968 25126 25126 39873 402016 353485 637968
Interesting. So it looks like the bulk of the memory is being consumed by metadata in the ARC. It also looks like memory is getting fragmented, but it's not horrible yet. What's unclear is why all of those dnodes, inodes, and metadata buffers aren't being released to the system. They should be unless they are still considered to be in use. Have you tried forcing the caches to be dropped with drop_caches? This will cause everything which is unused to be reclaimed, which should be happening anyway. Also, have you seen this issue with any other apps besides rsync? Say a find in the directory, or something else which accesses all of these files.
echo 1 >/proc/sys/vm/drop_caches
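To see whether the drop actually reclaims anything, one could compare the ARC counters before and after, for example:

grep -E '^(size|arc_meta_used) ' /proc/spl/kstat/zfs/arcstats
echo 1 > /proc/sys/vm/drop_caches
sleep 5
grep -E '^(size|arc_meta_used) ' /proc/spl/kstat/zfs/arcstats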
I had the same problem with a tar that reads from zfs and writes to /dev/null.
OK, so I did:
find /backup -ls > /dev/null
It ate about 4GB of RAM; I killed it when there was about a gig left.
I have, in /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=107374182 zfs_arc_meta_limit=107374182 zfs_arc_max=107374182
The filesystem options are:
# zfs get all backup/snapshots| grep -v default
NAME PROPERTY VALUE SOURCE
backup/snapshots type filesystem -
backup/snapshots creation Sun Aug 14 17:51 2011 -
backup/snapshots used 597G -
backup/snapshots available 1.20T -
backup/snapshots referenced 597G -
backup/snapshots compressratio 1.13x -
backup/snapshots mounted yes -
backup/snapshots compression on inherited from backup
backup/snapshots version 5 -
backup/snapshots utf8only off -
backup/snapshots normalization none -
backup/snapshots casesensitivity sensitive -
backup/snapshots primarycache metadata local
backup/snapshots usedbysnapshots 0 -
backup/snapshots usedbydataset 597G -
backup/snapshots usedbychildren 0 -
backup/snapshots usedbyrefreservation 0 -
echo 1 >/proc/sys/vm/drop_caches
didn't seem to do anything.
lsof | grep find
returns nothing
zpool export backup
lets me watch memory usage fall by about 100MB/s until it's back down to where it started and the zpool unmounts.
I grabbed the stats as above again, I'll post them if they'd have value (don't want to overly clutter).
ls -R
will even do it, though to a lesser extent.
I'm not sure if this is the same thing, but here is what I get when I rsync many files to ZFS:
Aug 21 20:49:00 anterovipunen kernel: [ 1320.307559] kswapd0 D ffff88007788c2c0 0 21 2 0x00000000
Aug 21 20:49:00 anterovipunen kernel: [ 1320.307567] ffff88007aa49a40 0000000000000046 ffff88007aa49970 ffffffff81049bcd
Aug 21 20:49:00 anterovipunen kernel: [ 1320.309635] ffff88007b4314d0 ffff88007aa49fd8 ffff88007aa49fd8 ffff88007aa49fd8
Aug 21 20:49:00 anterovipunen kernel: [ 1320.311666] ffff88007395b090 ffff88007b4314d0 ffff88007f492540 0000000000000001
Aug 21 20:49:00 anterovipunen kernel: [ 1320.313632] Call Trace:
Aug 21 20:49:00 anterovipunen kernel: [ 1320.315639] [
It works fine for a while and then just dies. I was using the latest repo code on Arch Linux, kernel 3.0.3, with the recommended options from the FAQ.
Hi. I'm not sure what's going on here, as I don't know much about the programming/debugging of ZFS, but I seem to be experiencing this issue, with or without rsync. I never had to read many files on my system, as I use ZFS for backup storage; however, I just had to restore things from the ZFS pool and it crashed after a while... rebooted... crashed after a while again.
Here is the error I found in dmesg/syslog: http://pastebin.com/jMTCNEFy
If there is anything I can do to help, as far as testings, don't hesitate to let me know.
Thanks,
Edit: using git from 2011-08-22 on Debian Squeeze.
I think I started seeing this when I updated to rc5. I was running rc4 with the ashift patch previously, but I also updated from 2.6.32 to 2.6.40 kernel, so both things changed. I also changed the disks from 1.5TB to 2TB.
The reporter's messages show v0.6.0.16, but mine shows only v0.6.0.
As a workaround, I'm making a script that performs each step of the backup and does a zpool export/import between steps.
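A rough sketch of that kind of script, assuming a pool named backup and a purely hypothetical per-step command, might look like:

#!/bin/sh
# run each backup step, then export and re-import the pool so ZFS releases its memory
for step in mail home databases; do
    /usr/local/bin/backup-step "$step"   # hypothetical per-step backup command
    zpool export backup
    zpool import backup
done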
I found another workaround for this issue. Running the following on the system during long rsyncs keeps the memory usage down to the specified ARC sizes:
while true; do echo 1 >/proc/sys/vm/drop_caches; sleep 60; done
I'm not sure why this works while just running a single echo 1 >/proc/sys/vm/drop_caches doesn't free larger amounts of RAM when they are in use.
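One way to apply this is to run the loop in the background only for the duration of the rsync and stop it afterwards, something like (paths are illustrative):

( while true; do echo 1 > /proc/sys/vm/drop_caches; sleep 60; done ) &
dropper=$!
rsync -a /source/ /tank/dest/    # the long rsync in question
kill "$dropper"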
@Alphalead: I think your trick allowed my rsync session to last longer, but after a while it crashed again, unfortunately.
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.981661] Oops: 0002 [#1] SMP
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.981673] last sysfs file: /sys/module/mbcache/initstate
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982056] Stack:
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982133] Call Trace:
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982206] Code: 49 8b 14 04 48 c1 e2 03 e8 83 88 ff ff 85 c0 75 10 48 8d 54 24 70 48 89 de 44 89 ef e8 5b f3 ff ff 48 8b 54 24 50 be d0 00 00 00 <48> c7 02 00 00 00 00 48 8b 54 24 48 48 8b 7c 24 70 e8 7d f6 ff
Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982346] CR2: 0000000000000000
Edit: the version used is the latest commit (2708f716c0).
I have not disappeared. I have tried every version from the PPA repository.
With the latest version there are several improvements.
Unfortunately I get this:
[ 2125.140009] kworker/0:1: page allocation failure. order:0, mode:0x20
[ 2125.140023] Pid: 22, comm: kworker/0:1 Tainted: P
2.6.38-11-server #50-Ubuntu
[ 2125.140025] Call Trace:
[ 2125.140149] [
Platform
[ 2132.945544] RIP: 0010:[
I should add that after several hours the system hangs for good with this (partial) message:
SLUB: Unable to allocate memory on node -1 (gfp=0x20)
2011/10/1 Mario Giammarco mgiammarco@gmail.com
I am not disappearead. I try all version from ppa repository. With the last version there are several improvements. Unfortunately I get this: [ 2125.140009] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.140023] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.140025] Call Trace: [ 2125.140149] [
<48> c7 02 00 00 00 00 48 8b 95 40 ff ff ff 48 8b 7d 80 e8 a3 ce [ 2132.950800] RIP [] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.140256] [ ] ? default_spin_lock_flags+0x9/0x10 [ 2125.140261] [ slab_alloc+0x262/0x390 [ 2125.140431] [] ? alloc_pages_current+0xa5/0x110 [ 2125.140314] [ ] ? new_slab+0x282/0x290 [ 2125.140317] [ ] ? ] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.140435] [ ] ? kmem_cache_alloc+0xff/0x110 [ 2125.140437] [ ] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.140439] [ ] ? scsi_host_alloc_command.clone.7+0x33/0xa0 [ 2125.140442] [ ] ? scsi_get_command+0x2e/0xc0 [ 2125.140444] [ ] ? scsi_get_command+0x48/0xc0 [ 2125.140449] [ generic_unplug_device+0x37/0x40 [ 2125.140546] [] ? scsi_setup_fs_cmnd+0x8d/0xe0 [ 2125.140457] [ ] ? sd_prep_fn+0xaf/0xba0 [ 2125.140465] [ ] ? __wake_up+0x53/0x70 [ 2125.140534] [ ] ? blk_peek_request+0xce/0x220 [ 2125.140537] [ ] ? scsi_request_fn+0x66/0x4d0 [ 2125.140540] [ ] ? blk_unplug_work+0x0/0x70 [ 2125.140543] [ ] ? ] ? generic_unplug_device+0x30/0x50 [ 2125.140554] [ ] ? blk_unplug_work+0x32/0x70 [ 2125.140572] [ ] ? process_one_work+0x11d/0x420 [ 2125.140576] [ ] ? worker_thread+0x169/0x360 [ 2125.140578] [ ] ? worker_thread+0x0/0x360 [ 2125.140581] [ ] ? kthread+0x96/0xa0 [ 2125.140587] [ ] ? kernel_thread_helper+0x4/0x10 [ 2125.140589] [ ] ? kthread+0x0/0xa0 [ 2125.140591] [ ] ? kernel_thread_helper+0x0/0x10 [ 2125.140593] Mem-Info: [ 2125.140595] Node 0 DMA per-cpu: [ 2125.140658] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.140660] Node 0 DMA32 per-cpu: [ 2125.140662] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.140663] Node 0 Normal per-cpu: [ 2125.140665] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.140668] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.140669] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.140670] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.140671] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.140671] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.140673] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.140685] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.140688] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.140696] lowmem_reserve[]: 0 0 2020 2020 [ 2125.140698] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 2125.140706] lowmem_reserve[]: 0 0 0 0 [ 2125.140708] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.140716] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.140723] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.140733] 5578 total pagecache pages [ 2125.140734] 0 pages in swap cache [ 2125.140736] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.140737] Free swap = 0kB [ 2125.140738] Total swap = 0kB [ 2125.191753] 1310704 pages RAM [ 2125.191756] 37445 pages reserved [ 2125.191757] 17402 pages shared [ 2125.191758] 1181257 pages non-shared [ 2125.191762] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.191764] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.191767] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.191885] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.191888] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.191890] Call Trace: [ 2125.191892] [ ] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.192098] [ ] ? spin_unlock+0xe/0x20 [ 2125.192148] [ alloc_skb+0x4f/0x170 [ 2125.192371] [] ? tcp_v4_rcv+0x659/0x900 [ 2125.192158] [ ] ? alloc_pages_current+0xa5/0x110 [ 2125.192161] [ ] ? new_slab+0x282/0x290 [ 2125.192219] [ ] ? unmap_single+0x34/0x70 [ 2125.192222] [ ] ? __slab_alloc+0x262/0x390 [ 2125.192368] [ ] ? ] ? kmem_cache_alloc_node+0x69/0x120 [ 2125.192376] [ ] ? enqueue_task_fair+0x61/0xa0 [ 2125.192378] [ ] ? alloc_skb+0x4f/0x170 [ 2125.192380] [ ] ? netdev_alloc_skb+0x24/0x50 [ 2125.192524] [ generic_unplug_device+0x37/0x40 [ 2125.192585] [] ? e1000_clean_rx_irq+0x2ad/0x4b0 [e1000] [ 2125.192529] [ rcu_process_callbacks+0x99/0x180 [ 2125.192543] [] ? e1000_clean+0x51/0xc0 [e1000] [ 2125.192532] [ ] ? net_rx_action+0x128/0x270 [ 2125.192540] [ ] ? ] ? __do_softirq+0xa8/0x1c0 [ 2125.192551] [ ] ? tick_program_event+0x1f/0x30 [ 2125.192555] [ ] ? call_softirq+0x1c/0x30 [ 2125.192557] [ ] ? do_softirq+0x65/0xa0 [ 2125.192559] [ ] ? irq_exit+0x85/0x90 [ 2125.192567] [ ] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.192570] [ ] ? apic_timer_interrupt+0x13/0x20 [ 2125.192572] [ ] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.192579] [ ] ? blk_unplug_work+0x0/0x70 [ 2125.192582] [ ] ? ] ? generic_unplug_device+0x30/0x50 [ 2125.192587] [ ] ? blk_unplug_work+0x32/0x70 [ 2125.192591] [ ] ? process_one_work+0x11d/0x420 [ 2125.192594] [ ] ? worker_thread+0x169/0x360 [ 2125.192597] [ ] ? worker_thread+0x0/0x360 [ 2125.192599] [ ] ? kthread+0x96/0xa0 [ 2125.192602] [ ] ? kernel_thread_helper+0x4/0x10 [ 2125.192604] [ ] ? kthread+0x0/0xa0 [ 2125.192607] [ ] ? 
kernel_thread_helper+0x0/0x10 [ 2125.192608] Mem-Info: [ 2125.192610] Node 0 DMA per-cpu: [ 2125.192612] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.192613] Node 0 DMA32 per-cpu: [ 2125.192615] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.192616] Node 0 Normal per-cpu: [ 2125.192618] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.192622] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.192622] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.192623] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.192624] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.192625] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.192626] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192634] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.192637] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192644] lowmem_reserve[]: 0 0 2020 2020 [ 2125.192647] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192659] lowmem_reserve[]: 0 0 0 0 [ 2125.192661] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.192669] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.192675] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.192682] 5578 total pagecache pages [ 2125.192683] 0 pages in swap cache [ 2125.192685] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.192686] Free swap = 0kB [ 2125.192687] Total swap = 0kB [ 2125.212609] 1310704 pages RAM [ 2125.212612] 37445 pages reserved [ 2125.212613] 17402 pages shared [ 2125.212614] 1181257 pages non-shared [ 2125.212618] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.212620] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.212623] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.212684] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.212688] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.212690] Call Trace: [ 2125.212692] [ ] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.212705] [ ] ? try_to_wake_up+0x244/0x3e0 [ 2125.212709] [ slab_alloc+0x262/0x390 [ 2125.212754] [] ? alloc_pages_current+0xa5/0x110 [ 2125.212712] [ ] ? new_slab+0x282/0x290 [ 2125.212743] [ ] ? lapic_next_event+0x1d/0x30 [ 2125.212748] [ ] ? 
clockevents_program_event+0x67/0xa0 [ 2125.212750] [ ] ? ] ? skb_clone+0x154/0x170 [ 2125.212757] [ ] ? tick_program_event+0x1f/0x30 [ 2125.212759] [ ] ? kmem_cache_alloc+0xff/0x110 [ 2125.212762] [ ] ? skb_clone+0x154/0x170 [ 2125.212936] [ ] ? packet_rcv_spkt+0x7d/0x1a0 [ 2125.212941] [ ] ? netif_receive_skb+0x2e4/0x580 [ 2125.212944] [ ] ? netif_receive_skb+0x58/0x80 [ 2125.212947] [ rcu_process_callbacks+0x99/0x180 [ 2125.212980] [] ? napi_skb_finish+0x50/0x70 [ 2125.212949] [ ] ? napi_gro_receive+0xb5/0xc0 [ 2125.212963] [ ] ? e1000_receive_skb+0x5b/0x90 [e1000] [ 2125.212967] [ ] ? e1000_clean_rx_irq+0x25e/0x4b0 [e1000] [ 2125.212971] [ ] ? e1000_clean+0x51/0xc0 [e1000] [ 2125.212974] [ ] ? net_rx_action+0x128/0x270 [ 2125.212977] [ ] ? ] ? do_softirq+0xa8/0x1c0 [ 2125.212984] [ ] ? tick_program_event+0x1f/0x30 [ 2125.212987] [ slab_alloc+0x262/0x390 [ 2125.234608] [] ? call_softirq+0x1c/0x30 [ 2125.212989] [ ] ? do_softirq+0x65/0xa0 [ 2125.212991] [ ] ? irq_exit+0x85/0x90 [ 2125.212994] [ ] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.212997] [ ] ? apic_timer_interrupt+0x13/0x20 [ 2125.212998] [ ] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.213007] [ ] ? blk_unplug_work+0x0/0x70 [ 2125.213010] [ ] ? generic_unplug_device+0x37/0x40 [ 2125.213012] [ ] ? generic_unplug_device+0x30/0x50 [ 2125.213015] [ alloc_pages_nodemask+0x5f4/0x830 [ 2125.234596] [] ? blk_unplug_work+0x32/0x70 [ 2125.213018] [ ] ? process_one_work+0x11d/0x420 [ 2125.213021] [ ] ? worker_thread+0x169/0x360 [ 2125.213024] [ ] ? worker_thread+0x0/0x360 [ 2125.213026] [ ] ? kthread+0x96/0xa0 [ 2125.213028] [ ] ? kernel_thread_helper+0x4/0x10 [ 2125.213040] [ ] ? kthread+0x0/0xa0 [ 2125.213042] [ ] ? kernel_thread_helper+0x0/0x10 [ 2125.213044] Mem-Info: [ 2125.213045] Node 0 DMA per-cpu: [ 2125.213048] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.213049] Node 0 DMA32 per-cpu: [ 2125.213051] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.213052] Node 0 Normal per-cpu: [ 2125.213054] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.213057] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.213058] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.213059] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.213059] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.213060] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.213062] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.213070] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.213073] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 2125.213080] lowmem_reserve[]: 0 0 2020 2020 [ 2125.213082] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.213090] lowmem_reserve[]: 0 0 0 0 [ 2125.213093] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.213100] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.213107] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.213113] 5578 total pagecache pages [ 2125.213116] 0 pages in swap cache [ 2125.213117] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.213118] Free swap = 0kB [ 2125.213119] Total swap = 0kB [ 2125.234376] 1310704 pages RAM [ 2125.234379] 37445 pages reserved [ 2125.234380] 17402 pages shared [ 2125.234381] 1181257 pages non-shared [ 2125.234385] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.234388] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.234390] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.234579] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.234582] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.234584] Call Trace: [ 2125.234586] [ ] ? ] ? alloc_pages_current+0xa5/0x110 [ 2125.234599] [ ] ? alloc_pages_current+0xa5/0x110 [ 2125.234602] [ ] ? new_slab+0x282/0x290 [ 2125.234605] [ ] ? ] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.234665] [ ] ? kmem_cache_alloc+0xff/0x110 [ 2125.234669] [ ] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.234671] [ ] ? scsi_host_alloc_command.clone.7+0x33/0xa0 [ 2125.234692] [ ] ? scsi_get_command+0x2e/0xc0 [ 2125.234695] [ ] ? scsi_get_command+0x48/0xc0 [ 2125.234698] [ blk_run_queue+0x8e/0x170 [ 2125.234768] [] ? scsi_setup_fs_cmnd+0x8d/0xe0 [ 2125.234701] [ ] ? sd_prep_fn+0xaf/0xba0 [ 2125.234709] [ ] ? mptspi_qcmd+0x97/0x100 [mptspi] [ 2125.234714] [ ] ? blk_peek_request+0xce/0x220 [ 2125.234763] [ ] ? scsi_request_fn+0x66/0x4d0 [ 2125.234766] [ ] ? ] ? blk_run_queue+0x32/0x50 [ 2125.234770] [ ] ? scsi_run_queue+0xc8/0x2a0 [ 2125.234772] [ ] ? scsi_put_command+0x67/0xa0 [ 2125.234775] [ ] ? scsi_next_command+0x42/0x60 [ 2125.234777] [ do_softirq+0xa8/0x1c0 [ 2125.234792] [] ? scsi_io_completion+0x252/0x590 [ 2125.234780] [ ] ? scsi_finish_command+0xc5/0x130 [ 2125.234782] [ ] ? scsi_softirq_done+0x147/0x170 [ 2125.234785] [ ] ? blk_done_softirq+0x85/0xa0 [ 2125.234788] [ ] ? ] ? tick_program_event+0x1f/0x30 [ 2125.234795] [ ] ? call_softirq+0x1c/0x30 [ 2125.234797] [ ] ? do_softirq+0x65/0xa0 [ 2125.234799] [ ] ? irq_exit+0x85/0x90 [ 2125.234803] [ ] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.234805] [ ] ? apic_timer_interrupt+0x13/0x20 [ 2125.234807] [ ] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.234811] [ ] ? blk_unplug_work+0x0/0x70 [ 2125.234814] [ ] ? __generic_unplug_device+0x37/0x40 [ 2125.234817] [ ] ? generic_unplug_device+0x30/0x50 [ 2125.234819] [ ] ? blk_unplug_work+0x32/0x70 [ 2125.234822] [ ] ? process_one_work+0x11d/0x420 [ 2125.234825] [ ] ? worker_thread+0x169/0x360 [ 2125.234828] [ ] ? 
worker_thread+0x0/0x360
[ 2125.234830] [ ] ? kthread+0x96/0xa0
[ 2125.234833] [ ] ? kernel_thread_helper+0x4/0x10
[ 2125.234835] [ ] ? kthread+0x0/0xa0
[ 2125.234837] [ ] ? kernel_thread_helper+0x0/0x10
[ 2125.234838] Mem-Info:
[ 2125.234840] Node 0 DMA per-cpu:
[ 2125.234842] CPU 0: hi: 0, btch: 1 usd: 0
[ 2125.234843] Node 0 DMA32 per-cpu:
[ 2125.234845] CPU 0: hi: 186, btch: 31 usd: 140
[ 2125.234847] Node 0 Normal per-cpu:
[ 2125.234848] CPU 0: hi: 186, btch: 31 usd: 48
[ 2125.234852] active_anon:39107 inactive_anon:120 isolated_anon:0
[ 2125.234853] active_file:3002 inactive_file:2430 isolated_file:0
[ 2125.234853] unevictable:0 dirty:0 writeback:0 unstable:0
[ 2125.234854] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421
[ 2125.234855] mapped:2900 shmem:146 pagetables:1273 bounce:0
[ 2125.234856] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.234864] lowmem_reserve[]: 0 3000 5020 5020
[ 2125.234867] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.234874] lowmem_reserve[]: 0 0 2020 2020
[ 2125.234884] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 2125.234891] lowmem_reserve[]: 0 0 0 0
[ 2125.234894] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15872kB
[ 2125.234901] Node 0 DMA32: 27*4kB 84*8kB 103*16kB 24*32kB 3*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 1*4096kB = 9532kB
[ 2125.234908] Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB
[ 2125.234914] 5578 total pagecache pages
[ 2125.234916] 0 pages in swap cache
[ 2125.234917] Swap cache stats: add 0, delete 0, find 0/0
[ 2125.234918] Free swap = 0kB
[ 2125.234919] Total swap = 0kB
[ 2125.255600] 1310704 pages RAM
[ 2125.255603] 37445 pages reserved
[ 2125.255604] 17402 pages shared
[ 2125.255605] 1181257 pages non-shared
[ 2125.255609] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[ 2125.255612] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0
[ 2125.255614] node 0: slabs: 124, objs: 1984, free: 0
[ 2132.944060] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2132.944284] IP: [ ] balance_pgdat+0x516/0x6e0
[ 2132.944513] PGD 13307d067 PUD 155da4067 PMD 0
[ 2132.944584] Oops: 0002 [#1] SMP
[ 2132.944633] last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/host2/target2:0:2/2:0:2:0/block/sdc/queue/scheduler
[ 2132.944781] CPU 0
[ 2132.944809] Modules linked in: zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate ppdev nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc psmouse vmw_balloon serio_raw parport_pc i2c_piix4 shpchp lp parport mptspi floppy e1000 mptscsih mptbase
[ 2132.945198]
[ 2132.945220] Pid: 23, comm: kswapd0 Tainted: P 2.6.38-11-server #50-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
[ 2132.945544] RIP: 0010:[ ] [ ] balance_pgdat+0x516/0x6e0
[ 2132.945664] RSP: 0018:ffff880174707d90 EFLAGS: 00010282
[ 2132.945737] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000bf6
[ 2132.945837] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000ae0
[ 2132.945935] RBP: ffff880174707e80 R08: 0000000000000000 R09: 0000000000000100
[ 2132.946190] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000
[ 2132.946383] R13: ffff88017fffb000 R14: ffff88017fffb000 R15: 0000000000000002
[ 2132.946606] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000
[ 2132.946894] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2132.947061] CR2: 0000000000000000 CR3: 0000000155da5000 CR4: 00000000000006f0
[ 2132.947284] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 2132.947490] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 2132.947677] Process kswapd0 (pid: 23, threadinfo ffff880174706000, task ffff88017470adc0)
[ 2132.947964] Stack:
[ 2132.948079] ffff880174707da0 ffff88017470adc0 ffff880174707dc0 ffff880174707e34
[ 2132.948474] ffff880174707eac 0000000000000000 0000000000001454 0000000000000000
[ 2132.948759] 0000000100000000 ffff880174707e40 0000000000000000 000000007fffb000
[ 2132.949048] Call Trace:
[ 2132.949173] [ ] kswapd+0xea/0x1d0
[ 2132.949329] [ ] ? kswapd+0x0/0x1d0
[ 2132.949486] [ ] kthread+0x96/0xa0
[ 2132.949642] [ ] kernel_thread_helper+0x4/0x10
[ 2132.949812] [ ] ? kthread+0x0/0xa0
[ 2132.949969] [ ] ? kernel_thread_helper+0x0/0x10
[ 2132.950157] Code: ea ff ff e9 82 fc ff ff 0f 1f 44 00 00 8b bd 74 ff ff ff 48 8d 55 80 4c 89 f6 e8 06 ee ff ff 48 8b 95 38 ff ff ff be d0 00 00 00 ] balance_pgdat+0x516/0x6e0
[ 2132.950972] RSP
[ 2132.951107] CR2: 0000000000000000
[ 2132.956029] ---[ end trace 674593f315e64846 ]---
With swap enabled (swapon) the system hangs; with swap disabled (swapoff) it seems to keep working.
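If anyone wants to reproduce the swap observation, here is a minimal sketch of running the test without swap; these are the standard util-linux commands, and swap should be re-enabled once the rsync test is finished:
# disable every swap device/file for the current boot only
sudo swapoff -a
# confirm nothing is left enabled
cat /proc/swaps
free -m
# re-enable swap after the test
sudo swapon -a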
This is very likely caused by issue #287 which we are testing a fix for now.
Excellent! Looking forward to both issues falling off the bug list.
Closing issue, this should be fixed by the following two commits for #287.
zfsonlinux/zfs@6a95d0b74c2951f0dc82361ea279f64a7349f060 zfsonlinux/spl@b8b6e4c453929596b630fa1cca1ee26a532a2ab4
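For anyone building from source, one way to check whether a local checkout already includes these fixes is to ask git which branches contain the commits. A minimal sketch, where ~/src/zfs and ~/src/spl are placeholder paths for wherever the repositories were cloned:
# in the local zfs clone
cd ~/src/zfs
git fetch origin
git branch --contains 6a95d0b74c2951f0dc82361ea279f64a7349f060
# in the local spl clone
cd ~/src/spl
git fetch origin
git branch --contains b8b6e4c453929596b630fa1cca1ee26a532a2ab4
# if the branch you built from appears in the output, that build already has the fix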
I seem to be having this issue. I was doing a moderately large rsync and hit the timeouts described above.
Feb 14 03:00:53 hinoki kernel: INFO: task kswapd0:26 blocked for more than 120 seconds.
Feb 14 03:00:53 hinoki kernel: Tainted: P O 3.12.9-2-ARCH #1
Feb 14 03:00:54 hinoki kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 14 03:00:54 hinoki kernel: kswapd0 D 0000000000000246 0 26 2 0x00000000
Feb 14 03:00:54 hinoki kernel: ffff880223e47a90 0000000000000046 00000000000144c0 ffff880223e47fd8
Feb 14 03:00:55 hinoki kernel: ffff880223e47fd8 00000000000144c0 ffff8802241622c0 ffff880223e479f8
Feb 14 03:00:55 hinoki kernel: ffffffffa0304896 ffff8801185c8830 ffff8802082d5a00 ffff880053f7aa80
Feb 14 03:00:55 hinoki kernel: Call Trace:
Feb 14 03:00:56 hinoki kernel: [<ffffffffa0304896>] ? dnode_setdirty+0x126/0x170 [zfs]
Feb 14 03:00:56 hinoki kernel: [<ffffffffa02ed7ed>] ? dbuf_dirty+0x44d/0x920 [zfs]
Feb 14 03:00:56 hinoki kernel: [<ffffffff814f26a9>] schedule+0x29/0x70
Feb 14 03:00:56 hinoki kernel: [<ffffffff814f2ae3>] schedule_preempt_disabled+0x23/0x30
Feb 14 03:00:57 hinoki kernel: [<ffffffff814f1018>] __mutex_lock_slowpath+0x158/0x3d0
Feb 14 03:00:57 hinoki kernel: [<ffffffff814f12a2>] mutex_lock+0x12/0x30
Feb 14 03:00:57 hinoki kernel: [<ffffffffa037fded>] zfs_zinactive+0x6d/0x150 [zfs]
Feb 14 03:00:58 hinoki kernel: [<ffffffffa0378cd4>] zfs_inactive+0x64/0x210 [zfs]
Feb 14 03:00:58 hinoki kernel: [<ffffffff811452f9>] ? truncate_pagecache+0x59/0x60
Feb 14 03:00:58 hinoki kernel: [<ffffffffa038e924>] zpl_evict_inode+0x24/0x30 [zfs]
Feb 14 03:00:58 hinoki kernel: [<ffffffff811befe0>] evict+0xb0/0x1b0
Feb 14 03:00:58 hinoki kernel: [<ffffffff811bf119>] dispose_list+0x39/0x50
Feb 14 03:00:58 hinoki kernel: [<ffffffff811c00d7>] prune_icache_sb+0x47/0x60
Feb 14 03:00:58 hinoki kernel: [<ffffffff811a839c>] super_cache_scan+0xfc/0x170
Feb 14 03:00:58 hinoki kernel: [<ffffffff811478ef>] shrink_slab+0x1df/0x3e0
Feb 14 03:00:58 hinoki kernel: [<ffffffff8114b5a6>] balance_pgdat+0x3d6/0x5f0
Feb 14 03:00:58 hinoki kernel: [<ffffffff8114b929>] kswapd+0x169/0x480
Feb 14 03:00:58 hinoki kernel: [<ffffffff81085ce0>] ? wake_up_atomic_t+0x30/0x30
Feb 14 03:00:58 hinoki kernel: [<ffffffff8114b7c0>] ? balance_pgdat+0x5f0/0x5f0
Feb 14 03:00:58 hinoki kernel: [<ffffffff81084ec0>] kthread+0xc0/0xd0
Feb 14 03:00:58 hinoki kernel: [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120
Feb 14 03:00:58 hinoki kernel: [<ffffffff814fbb3c>] ret_from_fork+0x7c/0xb0
Feb 14 03:00:58 hinoki kernel: [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120
Is this relevant, or is there another bug I should be checking?
Thanks, Samuel
@ioquatix This looks different. If you're seeing this in the master code go ahead and open a new issue.
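Before opening a new issue it is worth recording exactly which module build is loaded; a minimal sketch (the /sys/module version files are populated when the spl/zfs modules are loaded, and the same information appears at load time in the kernel log, as in the log below):
# report the version of the loaded modules
cat /sys/module/spl/version
cat /sys/module/zfs/version
# the load banner is also in the kernel log
dmesg | grep "ZFS: Loaded module"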
Hello, during a big rsync between two ZFS filesystems my system hangs. I have tried setting primarycache=metadata, but it only hangs later.
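For reference, the primarycache tuning mentioned above is applied per dataset; a minimal sketch, where tank/backup is a hypothetical dataset name standing in for the real one:
# cache only metadata, not file data, in the ARC for this dataset
sudo zfs set primarycache=metadata tank/backup
# verify the current value
zfs get primarycache tank/backup
# restore the default after testing
sudo zfs set primarycache=all tank/backup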
Here is a debug log:
Jun 2 14:41:27 backup1 kernel: [ 26.798015] ZFS: Loaded module v0.6.0.16, ZFS pool version 28, ZFS filesystem version 5
Jun 2 14:50:28 backup1 kernel: [ 594.852215] hrtimer: interrupt took 79628485 ns
Jun 2 14:52:33 backup1 kernel: [ 719.889678] INFO: task txg_sync:1709 blocked for more than 120 seconds.
Jun 2 14:52:33 backup1 kernel: [ 719.889795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:52:33 backup1 kernel: [ 719.889896] txg_sync D 0000000000000000 0 1709 2 0x00000000
Jun 2 14:52:33 backup1 kernel: [ 719.889901] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000
Jun 2 14:52:33 backup1 kernel: [ 719.889904] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00
Jun 2 14:52:33 backup1 kernel: [ 719.889908] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68
Jun 2 14:52:33 backup1 kernel: [ 719.889911] Call Trace:
Jun 2 14:52:33 backup1 kernel: [ 719.889955] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.889973] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:52:33 backup1 kernel: [ 719.889980] [] cv_wait+0x13/0x20 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890090] [] zio_wait+0xc3/0x170 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890116] [] dsl_pool_sync+0xd9/0x4a0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890136] [] spa_sync+0x3ab/0x9b0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890139] [] ? autoremove_wake_function+0x16/0x40
Jun 2 14:52:33 backup1 kernel: [ 719.890144] [] ? wake_up+0x53/0x70
Jun 2 14:52:33 backup1 kernel: [ 719.890167] [] txg_sync_thread+0x241/0x3c0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890189] [] ? txg_sync_thread+0x0/0x3c0 [zfs]
Jun 2 14:52:33 backup1 kernel: [ 719.890195] [] thread_generic_wrapper+0x78/0x90 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890200] [] ? thread_generic_wrapper+0x0/0x90 [spl]
Jun 2 14:52:33 backup1 kernel: [ 719.890203] [] kthread+0x96/0xa0
Jun 2 14:52:33 backup1 kernel: [ 719.890210] [] kernel_thread_helper+0x4/0x10
Jun 2 14:52:33 backup1 kernel: [ 719.890213] [] ? kthread+0x0/0xa0
Jun 2 14:52:33 backup1 kernel: [ 719.890216] [] ? kernel_thread_helper+0x0/0x10
Jun 2 14:54:33 backup1 kernel: [ 839.634103] INFO: task txg_sync:1709 blocked for more than 120 seconds.
Jun 2 14:54:33 backup1 kernel: [ 839.634187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:54:33 backup1 kernel: [ 839.634286] txg_sync D 0000000000000000 0 1709 2 0x00000000
Jun 2 14:54:33 backup1 kernel: [ 839.634291] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000
Jun 2 14:54:33 backup1 kernel: [ 839.634294] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00
Jun 2 14:54:33 backup1 kernel: [ 839.634298] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68
Jun 2 14:54:33 backup1 kernel: [ 839.634301] Call Trace:
Jun 2 14:54:33 backup1 kernel: [ 839.634317] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634323] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:54:33 backup1 kernel: [ 839.634328] [] cv_wait+0x13/0x20 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634365] [] zio_wait+0xc3/0x170 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634384] [] dsl_pool_sync+0xd9/0x4a0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634404] [] spa_sync+0x3ab/0x9b0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634406] [] ? autoremove_wake_function+0x16/0x40
Jun 2 14:54:33 backup1 kernel: [ 839.634410] [] ? wake_up+0x53/0x70
Jun 2 14:54:33 backup1 kernel: [ 839.634431] [] txg_sync_thread+0x241/0x3c0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634451] [] ? txg_sync_thread+0x0/0x3c0 [zfs]
Jun 2 14:54:33 backup1 kernel: [ 839.634456] [] thread_generic_wrapper+0x78/0x90 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634461] [] ? thread_generic_wrapper+0x0/0x90 [spl]
Jun 2 14:54:33 backup1 kernel: [ 839.634463] [] kthread+0x96/0xa0
Jun 2 14:54:33 backup1 kernel: [ 839.634467] [] kernel_thread_helper+0x4/0x10
Jun 2 14:54:33 backup1 kernel: [ 839.634469] [] ? kthread+0x0/0xa0
Jun 2 14:54:33 backup1 kernel: [ 839.634471] [] ? kernel_thread_helper+0x0/0x10
Jun 2 14:56:33 backup1 kernel: [ 959.380537] INFO: task rsync:3180 blocked for more than 120 seconds.
Jun 2 14:56:33 backup1 kernel: [ 959.380637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jun 2 14:56:33 backup1 kernel: [ 959.380765] rsync D 0000000000000000 0 3180 3179 0x00000000
Jun 2 14:56:33 backup1 kernel: [ 959.380771] ffff88012c685b88 0000000000000082 ffff88012c685fd8 ffff88012c684000
Jun 2 14:56:33 backup1 kernel: [ 959.380776] 0000000000013d00 ffff8801263cc858 ffff88012c685fd8 0000000000013d00
Jun 2 14:56:33 backup1 kernel: [ 959.380780] ffff880125d8c4a0 ffff8801263cc4a0 ffff88012c685ba8 ffff88012cf4c330
Jun 2 14:56:33 backup1 kernel: [ 959.380784] Call Trace:
Jun 2 14:56:33 backup1 kernel: [ 959.380804] [] cv_wait_common+0x77/0xd0 [spl]
Jun 2 14:56:33 backup1 kernel: [ 959.380811] [] ? autoremove_wake_function+0x0/0x40
Jun 2 14:56:33 backup1 kernel: [ 959.380818] [] __cv_wait+0x13/0x20 [spl]
Jun 2 14:56:33 backup1 kernel: [ 959.380866] [] txg_wait_open+0x73/0xb0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380886] [] dmu_tx_wait+0xed/0xf0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380914] [] zfs_write+0x363/0xbd0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380929] [] ? do_sync_read+0xd2/0x110
Jun 2 14:56:33 backup1 kernel: [ 959.380956] [] zpl_write_common+0x52/0x80 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380981] [] zpl_write+0x68/0xa0 [zfs]
Jun 2 14:56:33 backup1 kernel: [ 959.380984] [] vfs_write+0xc6/0x180
Jun 2 14:56:33 backup1 kernel: [ 959.380987] [] sys_write+0x51/0x90
Jun 2 14:56:33 backup1 kernel: [ 959.380991] [] system_call_fastpath+0x16/0x1b
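If the hang can be reproduced again, dumping the stacks of every blocked task at the moment it happens (rather than waiting for the 120-second hung-task watchdog) gives a more complete picture to attach here. A minimal sketch using the standard SysRq interface, assuming root and a kernel built with CONFIG_MAGIC_SYSRQ:
# allow SysRq commands
echo 1 > /proc/sys/kernel/sysrq
# dump the stack of every task stuck in uninterruptible (D) state into the kernel log
echo w > /proc/sysrq-trigger
# save the kernel log to attach to the issue
dmesg > blocked-tasks.log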