openzfs / zfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Hang during rsync #261

Closed mgiammarco closed 13 years ago

mgiammarco commented 13 years ago

Hello, during a big rsync between two ZFS filesystems my system hangs. I have tried setting primarycache=metadata, but it only hangs later.

Here is a debug log:

Jun 2 14:41:27 backup1 kernel: [ 26.798015] ZFS: Loaded module v0.6.0.16, ZFS pool version 28, ZFS filesystem version 5 Jun 2 14:50:28 backup1 kernel: [ 594.852215] hrtimer: interrupt took 79628485 ns Jun 2 14:52:33 backup1 kernel: [ 719.889678] INFO: task txg_sync:1709 blocked for more than 120 seconds. Jun 2 14:52:33 backup1 kernel: [ 719.889795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 2 14:52:33 backup1 kernel: [ 719.889896] txg_sync D 0000000000000000 0 1709 2 0x00000000 Jun 2 14:52:33 backup1 kernel: [ 719.889901] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000 Jun 2 14:52:33 backup1 kernel: [ 719.889904] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00 Jun 2 14:52:33 backup1 kernel: [ 719.889908] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68 Jun 2 14:52:33 backup1 kernel: [ 719.889911] Call Trace: Jun 2 14:52:33 backup1 kernel: [ 719.889955] [] cv_wait_common+0x77/0xd0 [spl] Jun 2 14:52:33 backup1 kernel: [ 719.889973] [] ? autoremove_wake_function+0x0/0x40 Jun 2 14:52:33 backup1 kernel: [ 719.889980] [] cv_wait+0x13/0x20 [spl] Jun 2 14:52:33 backup1 kernel: [ 719.890090] [] zio_wait+0xc3/0x170 [zfs] Jun 2 14:52:33 backup1 kernel: [ 719.890116] [] dsl_pool_sync+0xd9/0x4a0 [zfs] Jun 2 14:52:33 backup1 kernel: [ 719.890136] [] spa_sync+0x3ab/0x9b0 [zfs] Jun 2 14:52:33 backup1 kernel: [ 719.890139] [] ? autoremove_wake_function+0x16/0x40 Jun 2 14:52:33 backup1 kernel: [ 719.890144] [] ? wake_up+0x53/0x70 Jun 2 14:52:33 backup1 kernel: [ 719.890167] [] txg_sync_thread+0x241/0x3c0 [zfs] Jun 2 14:52:33 backup1 kernel: [ 719.890189] [] ? txg_sync_thread+0x0/0x3c0 [zfs] Jun 2 14:52:33 backup1 kernel: [ 719.890195] [] thread_generic_wrapper+0x78/0x90 [spl] Jun 2 14:52:33 backup1 kernel: [ 719.890200] [] ? thread_generic_wrapper+0x0/0x90 [spl] Jun 2 14:52:33 backup1 kernel: [ 719.890203] [] kthread+0x96/0xa0 Jun 2 14:52:33 backup1 kernel: [ 719.890210] [] kernel_thread_helper+0x4/0x10 Jun 2 14:52:33 backup1 kernel: [ 719.890213] [] ? kthread+0x0/0xa0 Jun 2 14:52:33 backup1 kernel: [ 719.890216] [] ? kernel_thread_helper+0x0/0x10 Jun 2 14:54:33 backup1 kernel: [ 839.634103] INFO: task txg_sync:1709 blocked for more than 120 seconds. Jun 2 14:54:33 backup1 kernel: [ 839.634187] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 2 14:54:33 backup1 kernel: [ 839.634286] txg_sync D 0000000000000000 0 1709 2 0x00000000 Jun 2 14:54:33 backup1 kernel: [ 839.634291] ffff88012788bbd0 0000000000000046 ffff88012788bfd8 ffff88012788a000 Jun 2 14:54:33 backup1 kernel: [ 839.634294] 0000000000013d00 ffff8801368c83b8 ffff88012788bfd8 0000000000013d00 Jun 2 14:54:33 backup1 kernel: [ 839.634298] ffff880135890000 ffff8801368c8000 ffff88012788bbd0 ffffc90015602c68 Jun 2 14:54:33 backup1 kernel: [ 839.634301] Call Trace: Jun 2 14:54:33 backup1 kernel: [ 839.634317] [] cv_wait_common+0x77/0xd0 [spl] Jun 2 14:54:33 backup1 kernel: [ 839.634323] [] ? autoremove_wake_function+0x0/0x40 Jun 2 14:54:33 backup1 kernel: [ 839.634328] [] cv_wait+0x13/0x20 [spl] Jun 2 14:54:33 backup1 kernel: [ 839.634365] [] zio_wait+0xc3/0x170 [zfs] Jun 2 14:54:33 backup1 kernel: [ 839.634384] [] dsl_pool_sync+0xd9/0x4a0 [zfs] Jun 2 14:54:33 backup1 kernel: [ 839.634404] [] spa_sync+0x3ab/0x9b0 [zfs] Jun 2 14:54:33 backup1 kernel: [ 839.634406] [] ? autoremove_wake_function+0x16/0x40 Jun 2 14:54:33 backup1 kernel: [ 839.634410] [] ? 
wake_up+0x53/0x70 Jun 2 14:54:33 backup1 kernel: [ 839.634431] [] txg_sync_thread+0x241/0x3c0 [zfs] Jun 2 14:54:33 backup1 kernel: [ 839.634451] [] ? txg_sync_thread+0x0/0x3c0 [zfs] Jun 2 14:54:33 backup1 kernel: [ 839.634456] [] thread_generic_wrapper+0x78/0x90 [spl] Jun 2 14:54:33 backup1 kernel: [ 839.634461] [] ? thread_generic_wrapper+0x0/0x90 [spl] Jun 2 14:54:33 backup1 kernel: [ 839.634463] [] kthread+0x96/0xa0 Jun 2 14:54:33 backup1 kernel: [ 839.634467] [] kernel_thread_helper+0x4/0x10 Jun 2 14:54:33 backup1 kernel: [ 839.634469] [] ? kthread+0x0/0xa0 Jun 2 14:54:33 backup1 kernel: [ 839.634471] [] ? kernel_thread_helper+0x0/0x10 Jun 2 14:56:33 backup1 kernel: [ 959.380537] INFO: task rsync:3180 blocked for more than 120 seconds. Jun 2 14:56:33 backup1 kernel: [ 959.380637] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jun 2 14:56:33 backup1 kernel: [ 959.380765] rsync D 0000000000000000 0 3180 3179 0x00000000 Jun 2 14:56:33 backup1 kernel: [ 959.380771] ffff88012c685b88 0000000000000082 ffff88012c685fd8 ffff88012c684000 Jun 2 14:56:33 backup1 kernel: [ 959.380776] 0000000000013d00 ffff8801263cc858 ffff88012c685fd8 0000000000013d00 Jun 2 14:56:33 backup1 kernel: [ 959.380780] ffff880125d8c4a0 ffff8801263cc4a0 ffff88012c685ba8 ffff88012cf4c330 Jun 2 14:56:33 backup1 kernel: [ 959.380784] Call Trace: Jun 2 14:56:33 backup1 kernel: [ 959.380804] [] cv_wait_common+0x77/0xd0 [spl] Jun 2 14:56:33 backup1 kernel: [ 959.380811] [] ? autoremove_wake_function+0x0/0x40 Jun 2 14:56:33 backup1 kernel: [ 959.380818] [] __cv_wait+0x13/0x20 [spl] Jun 2 14:56:33 backup1 kernel: [ 959.380866] [] txg_wait_open+0x73/0xb0 [zfs] Jun 2 14:56:33 backup1 kernel: [ 959.380886] [] dmu_tx_wait+0xed/0xf0 [zfs] Jun 2 14:56:33 backup1 kernel: [ 959.380914] [] zfs_write+0x363/0xbd0 [zfs] Jun 2 14:56:33 backup1 kernel: [ 959.380929] [] ? do_sync_read+0xd2/0x110 Jun 2 14:56:33 backup1 kernel: [ 959.380956] [] zpl_write_common+0x52/0x80 [zfs] Jun 2 14:56:33 backup1 kernel: [ 959.380981] [] zpl_write+0x68/0xa0 [zfs] Jun 2 14:56:33 backup1 kernel: [ 959.380984] [] vfs_write+0xc6/0x180 Jun 2 14:56:33 backup1 kernel: [ 959.380987] [] sys_write+0x51/0x90 Jun 2 14:56:33 backup1 kernel: [ 959.380991] [] system_call_fastpath+0x16/0x1b

behlendorf commented 13 years ago

It looks like both processes are waiting on the open transaction group, but it's not clear why. With what version are you able to reproduce this? Fixes for several deadlocks have been committed post -rc4. You can give them a try if you pull the latest master.

mgiammarco commented 13 years ago

The version is on the first line of the log; it is the latest in the PPA.

mgiammarco commented 13 years ago

I have disabled the L2ARC and the ZIL log device and redone the rsync test. Unfortunately it fails again, but this time it fails because of the out-of-memory killer.

behlendorf commented 13 years ago

How much memory is in your system? At the moment things get a little dicey with ZFS if you have less than about 2GB.

mgiammarco commented 13 years ago

The server is on ESXi, so I gave it 5GB this time. I repeat the settings:

Result: ZFS eats all the memory again, this time after a longer run. It is strange that for the first few hours memory consumption seems normal, then it goes crazy and consumes all the memory.

I am using the latest version from the PPA.

ahamilto commented 13 years ago

I get a similar issue on a testing backup server when using rsync. Interestingly, the problem does not occur when the heavy I/O is generated by a non-rsync process (specifically, the OpenAFS backup server). The server in question has 7GB of RAM for a RAIDz pool of 4x 750GB drives. Compression is enabled on the pool but deduplication is disabled. Kernel is v2.6.36 checked out from kernel.org.

Please let me know if you need any more information.

behlendorf commented 13 years ago

Several deadlocks which would result in the symptoms you describe were recently fixed in the master branch. If you're able to consistently reproduce this issue, I'd appreciate it if you could try the latest source from GitHub and let me know if the issue remains. Thanks!

a140dc5469399eaf2347ee7db503275f38f5dc12 - Fix mmap(2)/write(2)/read(2) deadlock
abd39a8289c3bc2766f281933689d5140445aa85 - Fix zio_execute() deadlock
cfc9a5c88f91f7b4d606fce89505e1f404691ea5 - Fix zpl_writepage() deadlock
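
For anyone who wants to test those commits before they reach a packaged release, a rough sketch of building the master branches from source, assuming the autogen/configure flow the spl and zfs repositories use (install locations left at their defaults):

# build and install SPL first; the ZFS configure step expects it to be installed
git clone https://github.com/zfsonlinux/spl.git
cd spl && ./autogen.sh && ./configure && make && sudo make install && cd ..
# then build and install ZFS itself
git clone https://github.com/zfsonlinux/zfs.git
cd zfs && ./autogen.sh && ./configure && make && sudo make install && cd ..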

mgiammarco commented 13 years ago

Thanks, I will try them when these patches reach the PPA repository. I am writing now to say that I have found (by accident) a workaround: the virtual machine I use for tests had two virtual CPUs. Now that I have given it only one CPU it seems to work (when memory fills up it is now reclaimed and no task killing occurs).

I suppose the deadlock fixes you found will help with 2 CPUs.

tstudios commented 13 years ago

4 rsync streams between this Linux 233TB pool and one Solaris 233TB ZFS pool grind the system to a halt. The workaround seems to be a taskset command that assigns rsync to 2 physical processors of a 4-physical-processor system. After that, open ssh user sessions will randomly hang for short periods, then recover; obviously interactive processes are being scheduled erratically. AMD Opteron dual core, 4 physical CPUs, 8GB memory; CentOS 6 kernel update 2.6.32-71.29.1.el6.x86_64, spl and zfs 0.6.0-rc5. I realize none of this is very technical, but the rsyncs are in progress and I'm at home on a VPN. The rsync hang is an open bug for the slightly past-due rc6 and/or 0.6.0 stable release. FYI, we do have reasons for doing zfs send/receive duplications. The CentOS system can dual-boot Solaris, i.e. its original OS.

tstudios commented 13 years ago

Typo in the last comment: it should read "We do have reasons for not doing zfs send/receive duplications." The rsyncs are running, currently set to run on processor #3 via taskset. Interactive processes do recover after the hangs. I had 10 ssh sessions open today with a date/sleep-1 loop running in all shells. Generally only one shell would hang at a time, for perhaps between 2 and 15 seconds.

ahamilto commented 13 years ago

I am still seeing this issue with the latest versions built from git. Setting the zfs_arc_max option to a relatively small number (1GB on a system with 6GB RAM in this case) helps to slow the failure. When running I/O intensive loads that aren't an rsync (the AFS backup system that I'm running on the system is writing 250GB to a single large file), the arc size as given by /proc/spl/kstat/zfs/arcstats quickly goes up to 1073845840 (almost exactly the value given to modprobe) and then holds at that size. When running an rsync however, that size climbs steadily through the maximum value specified (even if the rsync is just verifying that files have not changed) and keeps increasing until the system starts swapping at which point it begins to lag badly. A workaround we have in place for the moment is to rsync a portion at a time and completely unload ZFS between the rsyncs which frees the memory.
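
To make the zfs_arc_max setting mentioned above concrete, a minimal sketch of how such a cap is normally applied via the module options file (the 1 GiB figure mirrors the value used here and is only an example; it takes effect when the zfs module is loaded):

# /etc/modprobe.d/zfs.conf -- cap the ARC at roughly 1 GiB (example value only)
options zfs zfs_arc_max=1073741824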

mgiammarco commented 13 years ago

I agree. Setting a very small zfs_arc_max slows down the hanging of the system. It seems that something during rsync triggers a memory leak.

tstudios commented 13 years ago

I am not in a position to unload ZFS between rsyncs; other user processes need and use the pool. I am just reporting what I, as an "ordinary" interactive user, would see, and reinforcing that the bug does exist and is noted for a fix in rc6. When fixed, it seems that rc6 and 0.6.0 will be stable enough for us to replace the Solaris OS. We are looking to get ZFS and FUSE/LTFS (Linear Tape File System) on the same OS platform and to offload "deep/old" archived projects onto LTFS. Again, we have reasons not to do zfs send/receive to/from tape, just as we have reasons not to do zfs send/receive between standalone systems/pools.

mgiammarco commented 13 years ago

Anyway, I have run a tar that reads files and writes to /dev/null and the system still hangs, so the problem is not due to disk writes.
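
As a concrete sketch of that read-only test (the dataset path below is only a placeholder):

# stream an entire ZFS dataset through tar and discard it; nothing is written to the pool
tar cf /dev/null /tank/somedataset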

behlendorf commented 13 years ago

Can you please post a simple test case which can recreate this issue? While this is tagged to be fixed for 0.6.0-rc6, I haven't seriously investigated it yet. A test case would help a lot.
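
A sketch of the kind of reproducer described in this thread, i.e. an rsync of a large number of small files into a ZFS dataset while the ARC is watched; the pool name, device, and file counts below are placeholders only:

# hypothetical reproducer: create ~1M small files, rsync them into ZFS, watch the ARC
zpool create testpool /dev/sdb
zfs create testpool/rsynctest
mkdir -p /tmp/manyfiles
for d in $(seq 1 1000); do
    mkdir -p /tmp/manyfiles/$d
    for f in $(seq 1 1000); do echo data > /tmp/manyfiles/$d/$f; done
done
rsync -a /tmp/manyfiles/ /testpool/rsynctest/ &
watch -n 10 'grep -E "^(size|arc_meta_used) " /proc/spl/kstat/zfs/arcstats'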

tstudios commented 13 years ago

Happy to try and help. I'm a sysadmin and not a programmer in the technical sense. Do you need my grouping of rsync commands, my multiple-open-shell interactive test loop, or more technical detail? For any debug and/or trace output, a CLI command string would be helpful.

tstudios
Sam Allgood, Sr. Production Engineer, Turner Studios, Turner Broadcasting System, Atlanta, GA 30318

behlendorf commented 13 years ago

If you can provide me enough information to recreate the issue locally that would be ideal.

tstudios commented 13 years ago

While not on this system (now dual-booting Solaris and Linux), another identical system (still on Solaris) will be sharing ZFS file systems within the pool via Samba. It would be interesting to know whether a large single-file read via Samba will also hang ZFS. I don't want to clutter the bug comments too much here; it seems there are two cases that can hang.

tstudios commented 13 years ago

It's late here EDT. Tomorrow, I'll give you my exact commands from a daily running script (real values and not $variables) and the rsync reports of how much data has moved, bytes sent/received, etc. That plus dmesg, kernel version.

I'm running rsync under taskset -c 3. This is allowing recovery. Should I drop the taskset and let the system grind to a halt?

I do see recent comments about tar reads from zfs pools to /dev/null also hanging.

tstudios commented 13 years ago

Brian,

Below is a synopsis of yesterday's rsyncs; hopefully enough information to roughly duplicate my setup, and hopefully the bad line wraps are not too annoying. This was pasted in from a simple text editor.

Kernel is stock from the latest yum update:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat /etc/redhat-release
CentOS Linux release 6.0 (Final)
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# uname -rv
2.6.32-71.29.1.el6.x86_64 #1 SMP Mon Jun 27 19:49:27 BST 2011

Rsync version:
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# rsync --version
rsync version 3.0.6 protocol version 30
Copyright (C) 1996-2009 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities: 64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints, socketpairs, hardlinks, symlinks, IPv6, batchfiles, inplace, append, ACLs, xattrs, iconv, symtimes

I am running the following in 10 or so ssh login shells; I can see one or more of these hang randomly for random lengths of time:
while true; do date; sleep 1; done

The task is to sync a Solaris ZFS pool with 4 file systems (besides the root pool) to this Linux ZFS pool with 4 file systems (besides the root pool). This process has been running daily for a couple of years now. The Linux box now dual-boots Linux/Solaris. The sync commands originate on the Linux system over a 10G link with a 9000 MTU, with about a minute's offset between each command. These did execute with taskset -c 3; if I let the system schedule itself, I soon lose it.

The zfs list below comes from the Linux system and is indicative of the pool and file systems on both Linux and Solaris. It just so happens that yesterday my largest file system had the least data churn; rsync spent 4 hours parsing that file system. All file systems are a mixture of files, from small text files to nearly 1TB archives from the compositing application Autodesk Discreet Flame. The parse times from my log for that least-churn/largest file system:
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# head -2 log.0
SYSTEM:tsdpl-A-10G-fx_archive
START:2011-08-02-12-02-02
[root@tsdpl-bu tsdpl-A-10G-fx_archive]# tail -1 log.0
END:2011-08-02-15-59-19

Hmmnnn... "first error in dmesg"? Presuming dmesg, it looks like a fairly standard boot. I am managing pool import, legacy mount, NFS and Samba from a custom script; only the import and legacy mount for now.

Okay, now the zfs list, sync arguments, sync summary, and an indication of files added/deleted. Note one file system is a legacy mount, and the pool "root" is not rsynced. For the --excludes, SNAPSHOT is a symbolic link to .zfs, and I don't/can't pull those across.

NAME                  USED  AVAIL  REFER  MOUNTPOINT
zdev001               206T  26.3T    22K  /tsdpl
zdev001/backups      52.6T  26.3T  52.6T  /tsdpl/backups
zdev001/fx_archive    142T  26.3T   142T  /tsdpl/fx_archive
zdev001/gfx01        10.1T  26.3T  10.1T  legacy
zdev001/gfx_archive  1.72T  26.3T  1.72T  /tsdpl/gfx_archive


taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/tsdpl/backups/ /tsdpl/backups

sent 13798995 bytes  received 190547787390 bytes  9751136.57 bytes/sec
total size is 57435529244802  speedup is 301.40

[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep deleting | wc -l
991
[root@tsdpl-bu tsdpl-A-10G-backups]# cat log.0 | grep -v deleting | wc -l
66126


taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude .zfs tsdpl::root/tsdpl/fx_archive/ /tsdpl/fx_archive

receiving incremental file list
system/cron_archive_log.txt
system/cron_catalog_log.txt

sent 276728 bytes  received 268401036 bytes  18871.13 bytes/sec
total size is 155228895891957  speedup is 577751.18


taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/gfx01/ /gfx01

sent 1662192 bytes  received 83044609836 bytes  10524842.79 bytes/sec
total size is 11025515851641  speedup is 132.76

[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep deleting | wc -l
289446
[root@tsdpl-bu tsdpl-A-10G-gfx01]# cat log.0 | grep -v deleting | wc -l
85258


taskset -c 3 rsync -av --delete-during --sockopts=SO_SNDBUF=65536,SO_RCVBUF=65536,TCP_NODELAY,IPTOS_LOWDELAY --block-size=65536 --exclude SNAPSHOT --exclude .zfs tsdpl::root/tsdpl/gfx_archive/ /tsdpl/gfx_archive

sent 84485 bytes  received 1411890863 bytes  1125528.38 bytes/sec
total size is 1875081239999  speedup is 1327.98

[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep deleting | wc -l
0
[root@tsdpl-bu tsdpl-A-10G-gfx_archive]# cat log.0 | grep -v deleting | wc -l
372

ahamilto commented 13 years ago

Rsyncing 1.5TB of data in a few large files (20 files total, ranging in size from 17GB to 140GB compressed; 18GB to 232GB uncompressed) is an example of an rsync that runs fine with no memory issues.

Rsyncing 60GB of data in ~2 million files (mail storage backups) will result in the ARC ballooning beyond the defined maximum. The call stack when the system starts swapping is:

[134432.390032] Pid: 25143, comm: kworker/u:1 Tainted: P 2.6.36 #7 [134432.390036] Call Trace: [134432.390050] [] alloc_pages_nodemask+0x5bb/0x647 [134432.390058] [] ? autoremove_wake_function+0x0/0x34 [134432.390066] [] tcp_sendmsg+0x3fe/0x898 [134432.390073] [] inet_sendmsg+0x8f/0x98 [134432.390081] [] ? selinux_socket_sendmsg+0x1e/0x20 [134432.390088] [] sock_sendmsg+0x66/0x73 [134432.390093] [] sock_sendmsg+0xa3/0xbc [134432.390106] [] ? iscsi_tcp_segment_done+0x123/0x2aa [libiscsi_tcp] [134432.390112] [] kernel_sendmsg+0x32/0x3e [134432.390119] [] iscsi_sw_tcp_pdu_xmit+0xdf/0x236 [iscsi_tcp] [134432.390126] [] iscsi_tcp_task_xmit+0x34/0x236 [libiscsi_tcp] [134432.390134] [] ? dequeue_entity+0x2e/0x33 [134432.226918] 1330890 pages non-shared [134432.390026] kworker/u:1: page allocation failure. order:0, mode:0x20 [134432.390032] Pid: 25143, comm: kworker/u:1 Tainted: P 2.6.36 #7 [134432.390036] Call Trace: [134432.390050] [] alloc_pages_nodemask+0x5bb/0x647 [134432.390058] [] ? autoremove_wake_function+0x0/0x34 [134432.390066] [] tcp_sendmsg+0x3fe/0x898 [134432.390073] [] inet_sendmsg+0x8f/0x98 [134432.390081] [] ? selinux_socket_sendmsg+0x1e/0x20 [134432.390088] [] sock_sendmsg+0x66/0x73 [134432.390093] [] sock_sendmsg+0xa3/0xbc [134432.390106] [] ? iscsi_tcp_segment_done+0x123/0x2aa [libiscsi_tcp] [134432.390112] [] kernel_sendmsg+0x32/0x3e [134432.390119] [] iscsi_sw_tcp_pdu_xmit+0xdf/0x236 [iscsi_tcp] [134432.390126] [] iscsi_tcp_task_xmit+0x34/0x236 [libiscsi_tcp] [134432.390134] [] ? __dequeue_entity+0x2e/0x33 [134432.390139] [] ? set_next_entity+0x23/0x73 [134432.390146] [] ? switch_to+0xea/0x212 [134432.390157] [] iscsi_xmit_task+0x52/0x94 [libiscsi] [134432.390165] [] iscsi_xmitworker+0x1c2/0x2b9 [libiscsi] [134432.390173] [] ? bond_3ad_state_machine_handler+0x865/0x8b0 [134432.390181] [] ? iscsi_xmitworker+0x0/0x2b9 [libiscsi] [134432.390188] [] process_one_work+0x1d9/0x2e7 [134432.390194] [] worker_thread+0x133/0x246 [134432.390199] [] ? worker_thread+0x0/0x246 [134432.390204] [] kthread+0x7d/0x85 [134432.390211] [] kernel_thread_helper+0x4/0x10 [134432.390216] [] ? kthread+0x0/0x85 [134432.390221] [] ? 
kernel_thread_helper+0x0/0x10 [134432.390224] Mem-Info: [134432.390227] DMA per-cpu: [134432.390230] CPU 0: hi: 0, btch: 1 usd: 0 [134432.390234] CPU 1: hi: 0, btch: 1 usd: 0 [134432.390237] CPU 2: hi: 0, btch: 1 usd: 0 [134432.390241] CPU 3: hi: 0, btch: 1 usd: 0 [134432.390244] DMA32 per-cpu: [134432.390247] CPU 0: hi: 186, btch: 31 usd: 0 [134432.390251] CPU 1: hi: 186, btch: 31 usd: 0 [134432.390254] CPU 2: hi: 186, btch: 31 usd: 0 [134432.390257] CPU 3: hi: 186, btch: 31 usd: 0 [134432.390260] Normal per-cpu: [134432.390263] CPU 0: hi: 186, btch: 31 usd: 0 [134432.390267] CPU 1: hi: 186, btch: 31 usd: 6 [134432.390270] CPU 2: hi: 186, btch: 31 usd: 0 [134432.390274] CPU 3: hi: 186, btch: 31 usd: 1 [134432.390282] active_anon:1244 inactive_anon:381 isolated_anon:0 [134432.390283] active_file:225 inactive_file:536 isolated_file:0 [134432.390285] unevictable:850 dirty:0 writeback:0 unstable:0 [134432.390286] free:6756 slab_reclaimable:982 slab_unreclaimable:187430 [134432.390288] mapped:764 shmem:6 pagetables:1126 bounce:0 [134432.390300] DMA free:15852kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:64kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes [134432.390308] lowmem_reserve[]: 0 3512 5974 5974 [134432.390322] DMA32 free:10632kB min:5804kB low:7252kB high:8704kB active_anon:0kB inactive_anon:96kB active_file:36kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596460kB mlocked:0kB dirty:0kB writeback:0kB mapped:60kB shmem:0kB slab_reclaimable:72kB slab_unreclaimable:449936kB kernel_stack:288kB pagetables:1136kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [134432.390330] lowmem_reserve[]: 0 0 2461 2461 [134432.390343] Normal free:540kB min:4068kB low:5084kB high:6100kB active_anon:4976kB inactive_anon:1428kB active_file:864kB inactive_file:2120kB unevictable:3400kB isolated(anon):0kB isolated(file):0kB present:2520956kB mlocked:3400kB dirty:0kB writeback:0kB mapped:2996kB shmem:24kB slab_reclaimable:3856kB slab_unreclaimable:299720kB kernel_stack:2448kB pagetables:3368kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:67 all_unreclaimable? no [134432.390356] lowmem_reserve[]: 0 0 0 0 [134432.390362] DMA: 7_4kB 6_8kB 6_16kB 6_32kB 4_64kB 5_128kB 5_256kB 2_512kB 2_1024kB 1_2048kB 2_4096kB = 15852kB [134432.390377] DMA32: 76_4kB 245_8kB 81_16kB 85_32kB 38_64kB 11_128kB 2_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 10632kB [134432.390391] Normal: 113_4kB 11_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0*4096kB = 540kB [134432.390406] 1760 total pagecache pages [134432.390409] 527 pages in swap cache [134432.390413] Swap cache stats: add 1779199, delete 1778672, find 414116/728800 [134432.390416] Free swap = 1004328kB [134432.390419] Total swap = 1048572kB [134432.426523] 1556478 pages RAM [134432.426527] 44442 pages reserved [134432.426529] 5987 pages shared [134432.426532] 1330890 pages non-shared [134432.713636] kworker/u:1: page allocation failure. order:0, mode:0x20

behlendorf commented 13 years ago

@Alphalead: It certainly sounds like the metadata limits aren't being enforced properly for the ARC. Have you checked /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab when it's in this state? I'd be interested to see what they report and where the memory is being used.
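
For anyone wanting to capture the same state when the machine wedges, a small sketch (the output locations are arbitrary):

# snapshot ARC and SPL slab statistics for later comparison
cat /proc/spl/kstat/zfs/arcstats > /tmp/arcstats.$(date +%s)
cat /proc/spl/kmem/slab > /tmp/spl-slab.$(date +%s)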

bill-mcgonigle commented 13 years ago

I seem to be hitting this problem too. Just default values on everything (I'll implement the suggestions above) except for zpool being created with ashift=12.

Here are /proc/spl/kstat/zfs/arcstats and /proc/spl/kmem/slab with the machine effectively useless. rsync was copying from local ext4 to local ZFS. The load average was about 18.5, but there was no I/O to speak of. There was 6900M or so of RAM allocated to this VM (a Xen dom0), with htop reporting ~5900M in use (but not by the applications; they can barely use 2GB). Killing rsync didn't help, and when I forced the machine to reboot I saw an rsync process segfault right before the actual system reboot (perhaps the open files got pulled out from under it?).

3 1 0x01 61 2928 146583166654 245155398080153
name                            type data
hits                            4    38630741
misses                          4    11918394
demand_data_hits                4    1051214
demand_data_misses              4    210641
demand_metadata_hits            4    32812696
demand_metadata_misses          4    10711823
prefetch_data_hits              4    44925
prefetch_data_misses            4    112506
prefetch_metadata_hits          4    4721906
prefetch_metadata_misses        4    883424
mru_hits                        4    2313878
mru_ghost_hits                  4    1762080
mfu_hits                        4    31550033
mfu_ghost_hits                  4    9489768
deleted                         4    12505789
recycle_miss                    4    47548198
mutex_miss                      4    77617
evict_skip                      4    5325335645
evict_l2_cached                 4    0
evict_l2_eligible               4    894196381184
evict_l2_ineligible             4    9847328256
hash_elements                   4    576798
hash_elements_max               4    576799
hash_collisions                 4    7084752
hash_chains                     4    157562
hash_chain_max                  4    8
p                               4    360042944
c                               4    2180250624
c_min                           4    2180250624
c_max                           4    17442004992
size                            4    2819764592
hdr_size                        4    184016480
data_size                       4    662532096
other_size                      4    1973216016
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_hdr_miss              4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_hdr_size                     4    0
memory_throttle_count           4    9491
memory_direct_count             4    59
memory_indirect_count           4    321540
arc_no_grow                     4    1
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_meta_used                   4    2819764592
arc_meta_limit                  4    4360501248
arc_meta_max                    4    3733333912
--------------------- cache -------------------------------------------------------  ----- slab ------  ---- object -----
name                                    flags      size     alloc slabsize  objsize  total alloc   max  total alloc   max
spl_vn_cache                          0x00020         0         0     4096       88      0     0     7      0     0   128
spl_vn_file_cache                     0x00020         0         0     4096       88      0     0     0      0     0     0
spl_zlib_workspace_cache              0x00040         0         0  8388608   268104      0     0     0      0     0     0
zio_cache                             0x00040    360448    333792    16384      912     22    22  1956    374   366 33252
zio_link_cache                        0x00020     36864     14304     4096       48      9     9   749    405   298 33664
zio_buf_512                           0x00042 1372160000 643706368    32768      512  41875 41875 50625  1298125 1257239 1569352
zio_data_buf_512                      0x00042         0         0    32768      512      0     0   843      0     0 26110
zio_buf_1024                          0x00042    131072     65536    32768     1024      4     4   448     84    64  9407
zio_data_buf_1024                     0x00042         0         0    32768     1024      0     0   965      0     0 20263
zio_buf_1536                          0x00042    196608     98304    65536     1536      3     3   164     93    64  5084
zio_data_buf_1536                     0x00042         0         0    65536     1536      0     0  1386      0     0 42947
zio_buf_2048                          0x00042    196608    149504    65536     2048      3     3   175     75    73  4359
zio_data_buf_2048                     0x00042         0         0    65536     2048      0     0  3525      0     0 88125
zio_buf_2560                          0x00042    262144    163840    65536     2560      4     4   254     84    64  5324
zio_data_buf_2560                     0x00042         0         0    65536     2560      0     0  6054      0     0 127120
zio_buf_3072                          0x00042         0         0    65536     3072      0     0   356      0     0  6400
zio_data_buf_3072                     0x00042         0         0    65536     3072      0     0  3637      0     0 65464
zio_buf_3584                          0x00042         0         0   131072     3584      0     0    46      0     0  1408
zio_data_buf_3584                     0x00042         0         0   131072     3584      0     0   466      0     0 14435
zio_buf_4096                          0x00042   1048576    262144   262144     4096      4     4   355    124    64 11005
zio_data_buf_4096                     0x00042         0         0   262144     4096      0     0   551      0     0 17073
zio_buf_5120                          0x00042    262144    163840   131072     5120      2     2    80     42    32  1661
zio_data_buf_5120                     0x00042         0         0   131072     5120      0     0  1458      0     0 30598
zio_buf_6144                          0x00042    262144    190464   131072     6144      2     2    57     36    31  1023
zio_data_buf_6144                     0x00042         0         0   131072     6144      0     0   819      0     0 14738
zio_buf_7168                          0x00042    786432    458752   262144     7168      3     3    25     93    64   767
zio_data_buf_7168                     0x00042         0         0   262144     7168      0     0   225      0     0  6955
zio_buf_8192                          0x00042    786432    425984   262144     8192      3     3   154     63    52  3218
zio_data_buf_8192                     0x00042   1048576    524288   262144     8192      4     4   447     84    64  9378
zio_buf_10240                         0x00042         0         0   262144    10240      0     0    39      0     0   804
zio_data_buf_10240                    0x00042         0         0   262144    10240      0     0   181      0     0  3797
zio_buf_12288                         0x00042   1048576    454656   524288    12288      2     2    24     62    37   722
zio_data_buf_12288                    0x00042         0         0   524288    12288      0     0    79      0     0  2424
zio_buf_14336                         0x00042         0         0   524288    14336      0     0    10      0     0   305
zio_data_buf_14336                    0x00042         0         0   524288    14336      0     0    51      0     0  1555
zio_buf_16384                         0x00042 1329070080 663404544   524288    16384   2535  2535  6746  63375 40491 168650
zio_data_buf_16384                    0x00042         0         0   524288    16384      0     0    83      0     0  2071
zio_buf_20480                         0x00042   2097152   1310720   524288    20480      4     4    40     84    64   829
zio_data_buf_20480                    0x00042         0         0   524288    20480      0     0   190      0     0  3981
zio_buf_24576                         0x00042   1572864   1327104   524288    24576      3     3    31     54    54   548
zio_data_buf_24576                    0x00042         0         0   524288    24576      0     0   104      0     0  1864
zio_buf_28672                         0x00042         0         0  1048576    28672      0     0    12      0     0   366
zio_data_buf_28672                    0x00042         0         0  1048576    28672      0     0    66      0     0  2033
zio_buf_32768                         0x00042   3145728   2097152  1048576    32768      3     3    12     84    64   320
zio_data_buf_32768                    0x00042         0         0  1048576    32768      0     0    81      0     0  2244
zio_buf_36864                         0x00042   2097152   1179648  1048576    36864      2     2    13     50    32   323
zio_data_buf_36864                    0x00042         0         0  1048576    36864      0     0   104      0     0  2599
zio_buf_40960                         0x00042   1048576         0  1048576    40960      1     0    15     23     0   338
zio_data_buf_40960                    0x00042         0         0  1048576    40960      0     0    30      0     0   689
zio_buf_45056                         0x00042         0         0  1048576    45056      0     0    11      0     0   231
zio_data_buf_45056                    0x00042         0         0  1048576    45056      0     0    32      0     0   652
zio_buf_49152                         0x00042   4194304   3145728  1048576    49152      4     4    10     76    64   173
zio_data_buf_49152                    0x00042         0         0  1048576    49152      0     0    38      0     0   705
zio_buf_53248                         0x00042         0         0  1048576    53248      0     0    23      0     0   403
zio_data_buf_53248                    0x00042         0         0  1048576    53248      0     0    37      0     0   664
zio_buf_57344                         0x00042         0         0  1048576    57344      0     0    19      0     0   308
zio_data_buf_57344                    0x00042         0         0  1048576    57344      0     0    38      0     0   642
zio_buf_61440                         0x00042         0         0  2097152    61440      0     0     7      0     0   187
zio_data_buf_61440                    0x00042         0         0  2097152    61440      0     0    18      0     0   532
zio_buf_65536                         0x00042  39845888  31653888  2097152    65536     19    19    22    570   483   641
zio_data_buf_65536                    0x00042         0         0  2097152    65536      0     0    14      0     0   420
zio_buf_69632                         0x00042         0         0  2097152    69632      0     0    12      0     0   320
zio_data_buf_69632                    0x00042         0         0  2097152    69632      0     0    14      0     0   384
zio_buf_73728                         0x00042         0         0  2097152    73728      0     0    11      0     0   286
zio_data_buf_73728                    0x00042         0         0  2097152    73728      0     0     9      0     0   232
zio_buf_77824                         0x00042         0         0  2097152    77824      0     0    12      0     0   288
zio_data_buf_77824                    0x00042         0         0  2097152    77824      0     0     8      0     0   200
zio_buf_81920                         0x00042         0         0  2097152    81920      0     0     7      0     0   161
zio_data_buf_81920                    0x00042         0         0  2097152    81920      0     0    11      0     0   245
zio_buf_86016                         0x00042         0         0  2097152    86016      0     0     6      0     0   133
zio_data_buf_86016                    0x00042         0         0  2097152    86016      0     0     9      0     0   206
zio_buf_90112                         0x00042         0         0  2097152    90112      0     0     6      0     0   128
zio_data_buf_90112                    0x00042         0         0  2097152    90112      0     0     9      0     0   188
zio_buf_94208                         0x00042         0         0  2097152    94208      0     0     8      0     0   155
zio_data_buf_94208                    0x00042         0         0  2097152    94208      0     0    13      0     0   263
zio_buf_98304                         0x00042         0         0  2097152    98304      0     0    17      0     0   324
zio_data_buf_98304                    0x00042         0         0  2097152    98304      0     0    13      0     0   244
zio_buf_102400                        0x00042         0         0  2097152   102400      0     0    12      0     0   223
zio_data_buf_102400                   0x00042         0         0  2097152   102400      0     0    10      0     0   186
zio_buf_106496                        0x00042         0         0  2097152   106496      0     0    16      0     0   288
zio_data_buf_106496                   0x00042         0         0  2097152   106496      0     0    12      0     0   212
zio_buf_110592                        0x00042         0         0  2097152   110592      0     0     9      0     0   161
zio_data_buf_110592                   0x00042         0         0  2097152   110592      0     0    12      0     0   211
zio_buf_114688                        0x00042         0         0  2097152   114688      0     0    12      0     0   192
zio_data_buf_114688                   0x00042         0         0  2097152   114688      0     0    10      0     0   156
zio_buf_118784                        0x00042         0         0  2097152   118784      0     0     6      0     0    97
zio_data_buf_118784                   0x00042         0         0  2097152   118784      0     0    12      0     0   191
zio_buf_122880                        0x00042         0         0  2097152   122880      0     0     7      0     0    97
zio_data_buf_122880                   0x00042         0         0  2097152   122880      0     0    16      0     0   256
zio_buf_126976                        0x00042         0         0  4194304   126976      0     0     4      0     0    98
zio_data_buf_126976                   0x00042         0         0  4194304   126976      0     0     6      0     0   162
zio_buf_131072                        0x00042         0         0  4194304   131072      0     0    20      0     0   593
zio_data_buf_131072                   0x00042         0         0  4194304   131072      0     0   816      0     0 25268
sa_cache                              0x00020  50860032  28278800     4096       80  12417 12417 19334  409761 353485 637997
dnode_t                               0x00040 1327349760 1186980880    16384      944  81015 81015 96886  1296240 1257395 1550176
dmu_buf_impl_t                        0x00020 485531648 384151168     8192      296  59269 59269 68442  1422456 1297808 1642587
arc_buf_hdr_t                         0x00020 196968448 166191840     8192      288  24044 24044 24044  577056 577055 577055
arc_buf_t                             0x00020  25071616   4558288     4096      112   6121  6121 14313  159146 40699 372126
zil_lwb_cache                         0x00020     65536     49536     4096      192     16    16   192    272   258  3253
zfs_znode_cache                       0x00020 411664384 342173480    16384      968  25126 25126 39873  402016 353485 637968

behlendorf commented 13 years ago

Interesting. So it looks like the bulk of the memory is being consumed by metadata in the ARC. It also looks like memory is getting fragmented, but it's not horrible yet. What's unclear is why all of those dnodes, inodes, and metadata buffers aren't being released to the system; they should be unless they are still considered to be in use. Have you tried forcing the caches to be dropped with drop_caches? This will cause everything which is unused to be reclaimed, which should be happening anyway. Also, have you seen this issue with any other apps besides rsync? Say a find in the directory, or something else which accesses all of these files.

echo 1 >/proc/sys/vm/drop_caches

mgiammarco commented 13 years ago

I had the same problem with a tar that reads from ZFS and writes to /dev/null.

bill-mcgonigle commented 13 years ago

OK, so I did:

find /backup -ls > /dev/null

It ate about 4GB of RAM, I killed it when there was about a gig left.

I have, in /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=107374182 zfs_arc_meta_limit=107374182 zfs_arc_max=107374182

The filesystem options are:

# zfs get all backup/snapshots| grep -v default
NAME              PROPERTY              VALUE                  SOURCE
backup/snapshots  type                  filesystem             -
backup/snapshots  creation              Sun Aug 14 17:51 2011  -
backup/snapshots  used                  597G                   -
backup/snapshots  available             1.20T                  -
backup/snapshots  referenced            597G                   -
backup/snapshots  compressratio         1.13x                  -
backup/snapshots  mounted               yes                    -
backup/snapshots  compression           on                     inherited from backup
backup/snapshots  version               5                      -
backup/snapshots  utf8only              off                    -
backup/snapshots  normalization         none                   -
backup/snapshots  casesensitivity       sensitive              -
backup/snapshots  primarycache          metadata               local
backup/snapshots  usedbysnapshots       0                      -
backup/snapshots  usedbydataset         597G                   -
backup/snapshots  usedbychildren        0                      -
backup/snapshots  usedbyrefreservation  0                      -

echo 1 >/proc/sys/vm/drop_caches

didn't seem to do anything.

lsof | grep find

returns nothing

zfs export backup

lets me watch memory usage fall by about 100MB/s until it's back down to where it started and the zpool unmounts.

I grabbed the stats as above again, I'll post them if they'd have value (don't want to overly clutter).

bill-mcgonigle commented 13 years ago

ls -R will even do it, though to a lesser extent.

awalland commented 13 years ago

I'm not sure if this is the same thing, but here is what I get when I rsync many files to ZFS:

Aug 21 20:49:00 anterovipunen kernel: [ 1320.307559] kswapd0 D ffff88007788c2c0 0 21 2 0x00000000 Aug 21 20:49:00 anterovipunen kernel: [ 1320.307567] ffff88007aa49a40 0000000000000046 ffff88007aa49970 ffffffff81049bcd Aug 21 20:49:00 anterovipunen kernel: [ 1320.309635] ffff88007b4314d0 ffff88007aa49fd8 ffff88007aa49fd8 ffff88007aa49fd8 Aug 21 20:49:00 anterovipunen kernel: [ 1320.311666] ffff88007395b090 ffff88007b4314d0 ffff88007f492540 0000000000000001 Aug 21 20:49:00 anterovipunen kernel: [ 1320.313632] Call Trace: Aug 21 20:49:00 anterovipunen kernel: [ 1320.315639] [] ? ttwu_do_wakeup+0x3d/0x110 Aug 21 20:49:00 anterovipunen kernel: [ 1320.317629] [] ? ttwu_do_activate.constprop.166+0x66/0x70 Aug 21 20:49:00 anterovipunen kernel: [ 1320.319657] [] ? try_to_wake_up+0xce/0x280 Aug 21 20:49:00 anterovipunen kernel: [ 1320.321656] [] ? default_wake_function+0x12/0x20 Aug 21 20:49:00 anterovipunen kernel: [ 1320.323668] [] ? prepare_to_wait_exclusive+0x60/0x90 Aug 21 20:49:00 anterovipunen kernel: [ 1320.325679] [] cv_wait_common+0x77/0xd0 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.327773] [] ? abort_exclusive_wait+0xb0/0xb0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.329781] [] cv_wait+0x13/0x20 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.331784] [] txg_wait_open+0x73/0xa0 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.333698] [] dmu_tx_wait+0xed/0xf0 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.335572] [] dmu_tx_assign+0x66/0x420 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.337404] [] zfs_inactive+0xcf/0x1a0 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.339151] [] ? truncate_pagecache+0x59/0x70 Aug 21 20:49:00 anterovipunen kernel: [ 1320.340879] [] zpl_evict_inode+0x28/0x30 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.342550] [] evict+0x85/0x150 Aug 21 20:49:00 anterovipunen kernel: [ 1320.344202] [] dispose_list+0x3e/0x50 Aug 21 20:49:00 anterovipunen kernel: [ 1320.345911] [] shrink_icache_memory+0x17a/0x320 Aug 21 20:49:00 anterovipunen kernel: [ 1320.347573] [] shrink_slab+0x11d/0x190 Aug 21 20:49:00 anterovipunen kernel: [ 1320.349210] [] balance_pgdat+0x51a/0x6e0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.350951] [] kswapd+0x10a/0x3e0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.352601] [] ? abort_exclusive_wait+0xb0/0xb0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.354264] [] ? balance_pgdat+0x6e0/0x6e0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.355909] [] kthread+0x8c/0xa0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.357557] [] kernel_thread_helper+0x4/0x10 Aug 21 20:49:00 anterovipunen kernel: [ 1320.359175] [] ? kthread_worker_fn+0x190/0x190 Aug 21 20:49:00 anterovipunen kernel: [ 1320.360727] [] ? gs_change+0x13/0x13 Aug 21 20:49:00 anterovipunen kernel: [ 1320.365422] txg_sync D ffff88007788c440 0 1294 2 0x00000000 Aug 21 20:49:00 anterovipunen kernel: [ 1320.365428] ffff880071629c00 0000000000000046 ffff880071629b00 ffffffff8104c396 Aug 21 20:49:00 anterovipunen kernel: [ 1320.367054] ffff88007395b780 ffff880071629fd8 ffff880071629fd8 ffff880071629fd8 Aug 21 20:49:00 anterovipunen kernel: [ 1320.368679] ffff8800743f4c50 ffff88007395b780 0000000000000010 ffff8800765e4800 Aug 21 20:49:00 anterovipunen kernel: [ 1320.370320] Call Trace: Aug 21 20:49:00 anterovipunen kernel: [ 1320.371934] [] ? ttwu_do_activate.constprop.166+0x66/0x70 Aug 21 20:49:00 anterovipunen kernel: [ 1320.373630] [] ? default_wake_function+0x12/0x20 Aug 21 20:49:00 anterovipunen kernel: [ 1320.375292] [] ? 
__wake_up_common+0x58/0x90 Aug 21 20:49:00 anterovipunen kernel: [ 1320.376960] [] ? prepare_to_wait_exclusive+0x60/0x90 Aug 21 20:49:00 anterovipunen kernel: [ 1320.378627] [] cv_wait_common+0x77/0xd0 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.380316] [] ? abort_exclusive_wait+0xb0/0xb0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.382002] [] cv_wait+0x13/0x20 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.383720] [] zio_wait+0xfb/0x170 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.385436] [] dsl_pool_sync+0xca/0x450 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.387182] [] spa_sync+0x38e/0xa00 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.388898] [] ? default_wake_function+0x12/0x20 Aug 21 20:49:00 anterovipunen kernel: [ 1320.390644] [] ? autoremove_wake_function+0x16/0x40 Aug 21 20:49:00 anterovipunen kernel: [ 1320.392390] [] ? wake_up+0x53/0x70 Aug 21 20:49:00 anterovipunen kernel: [ 1320.394162] [] txg_sync_thread+0x216/0x370 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.395941] [] ? txg_quiesce_thread+0x240/0x240 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.397786] [] ? txg_quiesce_thread+0x240/0x240 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.399576] [] thread_generic_wrapper+0x78/0x90 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.401380] [] ? __thread_create+0x160/0x160 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.403186] [] kthread+0x8c/0xa0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.404978] [] kernel_thread_helper+0x4/0x10 Aug 21 20:49:00 anterovipunen kernel: [ 1320.406761] [] ? kthread_worker_fn+0x190/0x190 Aug 21 20:49:00 anterovipunen kernel: [ 1320.408517] [] ? gs_change+0x13/0x13 Aug 21 20:49:00 anterovipunen kernel: [ 1320.413568] rsync D 0000000100041566 0 1508 1507 0x00000000 Aug 21 20:49:00 anterovipunen kernel: [ 1320.413574] ffff8800474fbb28 0000000000000082 0000000100041566 ffffffff81011f33 Aug 21 20:49:00 anterovipunen kernel: [ 1320.415194] ffff8800765d3e70 ffff8800474fbfd8 ffff8800474fbfd8 ffff8800474fbfd8 Aug 21 20:49:00 anterovipunen kernel: [ 1320.416773] ffff8800765d14d0 ffff8800765d3e70 ffff88007f492f40 0000000000000086 Aug 21 20:49:00 anterovipunen kernel: [ 1320.418346] Call Trace: Aug 21 20:49:00 anterovipunen kernel: [ 1320.419858] [] ? native_sched_clock+0x13/0x60 Aug 21 20:49:00 anterovipunen kernel: [ 1320.421410] [] ? try_to_wake_up+0xce/0x280 Aug 21 20:49:00 anterovipunen kernel: [ 1320.422947] [] ? default_wake_function+0x12/0x20 Aug 21 20:49:00 anterovipunen kernel: [ 1320.424479] [] cv_wait_common+0x77/0xd0 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.425990] [] ? abort_exclusive_wait+0xb0/0xb0 Aug 21 20:49:00 anterovipunen kernel: [ 1320.427520] [] cv_wait+0x13/0x20 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.429043] [] txg_wait_open+0x73/0xa0 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.430584] [] dmu_tx_wait+0xed/0xf0 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.432098] [] zfs_setattr+0x15ac/0x1810 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.433602] [] ? do_lookup+0x50/0x310 Aug 21 20:49:00 anterovipunen kernel: [ 1320.435090] [] ? do_lookup+0x50/0x310 Aug 21 20:49:00 anterovipunen kernel: [ 1320.436572] [] ? __kmalloc+0x11c/0x160 Aug 21 20:49:00 anterovipunen kernel: [ 1320.438045] [] ? kmem_alloc_debug+0xbb/0x120 [spl] Aug 21 20:49:00 anterovipunen kernel: [ 1320.439553] [] zpl_setattr+0xdc/0x110 [zfs] Aug 21 20:49:00 anterovipunen kernel: [ 1320.441060] [] ? 
kmem_cache_free+0x2f/0x120 Aug 21 20:49:00 anterovipunen kernel: [ 1320.442561] [] notify_change+0x1b1/0x390 Aug 21 20:49:00 anterovipunen kernel: [ 1320.444067] [] utimes_common+0xc1/0x150 Aug 21 20:49:00 anterovipunen kernel: [ 1320.445556] [] do_utimes+0xec/0x100 Aug 21 20:49:00 anterovipunen kernel: [ 1320.447039] [] sys_utimensat+0x58/0x80 Aug 21 20:49:00 anterovipunen kernel: [ 1320.448505] [] system_call_fastpath+0x16/0x1b

It works fine for a while and then just dies. I was using the latest repo code on Arch Linux, kernel 3.0.3, with the recommended options from the FAQ.

ghost commented 13 years ago

Hi - I'm not sure what's going on here, as I don't know much about the programming/debugging of ZFS, but I seem to be hitting this issue, with or without rsync. I rarely have to read many files on this system since I use ZFS for backup storage, but I just had to restore things from the ZFS pool and it crashed after a while.. rebooted.. crashed again after a while..

Here is the error I found in dmesg/syslog : http://pastebin.com/jMTCNEFy

If there is anything I can do to help, as far as testing goes, don't hesitate to let me know.

Thanks,

edit: using git from 2011-08-22 on Debian Squeeze

bill-mcgonigle commented 13 years ago

I think I started seeing this when I updated to rc5. I was previously running rc4 with the ashift patch, but I also updated from the 2.6.32 to the 2.6.40 kernel, so both things changed. I also changed the disks from 1.5 TB to 2 TB.

The reporter's messages show v0.6.0.16, but mine show only v0.6.0.

As a workaround, I'm writing a script that runs each step of the backup and does a zpool export/import between steps, along the lines of the sketch below.
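Roughly, the wrapper looks like this; the pool name, backup target, and per-step source directories are placeholders for illustration, not my real layout:

#!/bin/sh
# Sketch of the export/import workaround: run each backup step on its own,
# then bounce the pool so whatever memory ZFS is holding gets released.
set -e
POOL=tank                                       # placeholder pool name
for src in /data/home /data/mail /data/www; do  # placeholder source dirs
    rsync -a "$src" "/$POOL/backup/"
    zpool export "$POOL"
    zpool import "$POOL"
done

The export will fail if anything still has files open on the pool, so each step has to finish cleanly before the bounce.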

ahamilto commented 13 years ago

I found another workaround for this issue. Running the following on the system while long rsyncs are in progress keeps memory usage down to the specified ARC sizes: while true; do echo 1 >/proc/sys/vm/drop_caches; sleep 60; done;

I'm not sure why this works, given that a single echo 1 >/proc/sys/vm/drop_caches doesn't free larger amounts of RAM when they are in use.
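Spelled out as a script, the same idea looks roughly like this; the 60-second interval is just what I happened to pick, nothing ZFS-specific:

#!/bin/sh
# Drop the page cache periodically while a long rsync is running.
# echo 1 frees the page cache; echo 2 frees dentries and inodes; echo 3 does both.
INTERVAL=60
while true; do
    echo 1 > /proc/sys/vm/drop_caches
    sleep "$INTERVAL"
done

If the underlying problem is ARC growth, explicitly capping it may also be worth a try, e.g. a line like options zfs zfs_arc_max=1073741824 in /etc/modprobe.d/zfs.conf (the value is only an example) followed by a module reload or reboot.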

ghost commented 13 years ago

Alphalead: I think your trick let my rsync session last longer, but after a while it unfortunately crashed again.

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.981661] Oops: 0002 [#1] SMP

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.981673] last sysfs file: /sys/module/mbcache/initstate

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982056] Stack:

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982133] Call Trace:

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982206] Code: 49 8b 14 04 48 c1 e2 03 e8 83 88 ff ff 85 c0 75 10 48 8d 54 24 70 48 89 de 44 89 ef e8 5b f3 ff ff 48 8b 54 24 50 be d0 00 00 00 <48> c7 02 00 00 00 00 48 8b 54 24 48 48 8b 7c 24 70 e8 7d f6 ff

Message from syslogd@stor01 at Sep 11 12:18:11 ... kernel:[1714241.982346] CR2: 0000000000000000

edit: the version used is the latest commit (2708f716c0)

mgiammarco commented 13 years ago

I am not disappearead. I try all version from ppa repository. With the last version there are several improvements. Unfortunately I get this: [ 2125.140009] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.140023] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.140025] Call Trace: [ 2125.140149] [] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.140256] [] ? default_spin_lock_flags+0x9/0x10 [ 2125.140261] [] ? alloc_pages_current+0xa5/0x110 [ 2125.140314] [] ? new_slab+0x282/0x290 [ 2125.140317] [] ? slab_alloc+0x262/0x390 [ 2125.140431] [] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.140435] [] ? kmem_cache_alloc+0xff/0x110 [ 2125.140437] [] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.140439] [] ? scsi_host_alloc_command.clone.7+0x33/0xa0 [ 2125.140442] [] ? scsi_get_command+0x2e/0xc0 [ 2125.140444] [] ? scsi_get_command+0x48/0xc0 [ 2125.140449] [] ? scsi_setup_fs_cmnd+0x8d/0xe0 [ 2125.140457] [] ? sd_prep_fn+0xaf/0xba0 [ 2125.140465] [] ? __wake_up+0x53/0x70 [ 2125.140534] [] ? blk_peek_request+0xce/0x220 [ 2125.140537] [] ? scsi_request_fn+0x66/0x4d0 [ 2125.140540] [] ? blk_unplug_work+0x0/0x70 [ 2125.140543] [] ? generic_unplug_device+0x37/0x40 [ 2125.140546] [] ? generic_unplug_device+0x30/0x50 [ 2125.140554] [] ? blk_unplug_work+0x32/0x70 [ 2125.140572] [] ? process_one_work+0x11d/0x420 [ 2125.140576] [] ? worker_thread+0x169/0x360 [ 2125.140578] [] ? worker_thread+0x0/0x360 [ 2125.140581] [] ? kthread+0x96/0xa0 [ 2125.140587] [] ? kernel_thread_helper+0x4/0x10 [ 2125.140589] [] ? kthread+0x0/0xa0 [ 2125.140591] [] ? kernel_thread_helper+0x0/0x10 [ 2125.140593] Mem-Info: [ 2125.140595] Node 0 DMA per-cpu: [ 2125.140658] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.140660] Node 0 DMA32 per-cpu: [ 2125.140662] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.140663] Node 0 Normal per-cpu: [ 2125.140665] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.140668] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.140669] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.140670] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.140671] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.140671] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.140673] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.140685] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.140688] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 2125.140696] lowmem_reserve[]: 0 0 2020 2020 [ 2125.140698] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.140706] lowmem_reserve[]: 0 0 0 0 [ 2125.140708] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.140716] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.140723] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.140733] 5578 total pagecache pages [ 2125.140734] 0 pages in swap cache [ 2125.140736] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.140737] Free swap = 0kB [ 2125.140738] Total swap = 0kB [ 2125.191753] 1310704 pages RAM [ 2125.191756] 37445 pages reserved [ 2125.191757] 17402 pages shared [ 2125.191758] 1181257 pages non-shared [ 2125.191762] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.191764] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.191767] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.191885] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.191888] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.191890] Call Trace: [ 2125.191892] [] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.192098] [] ? spin_unlock+0xe/0x20 [ 2125.192148] [] ? tcp_v4_rcv+0x659/0x900 [ 2125.192158] [] ? alloc_pages_current+0xa5/0x110 [ 2125.192161] [] ? new_slab+0x282/0x290 [ 2125.192219] [] ? unmap_single+0x34/0x70 [ 2125.192222] [] ? __slab_alloc+0x262/0x390 [ 2125.192368] [] ? alloc_skb+0x4f/0x170 [ 2125.192371] [] ? kmem_cache_alloc_node+0x69/0x120 [ 2125.192376] [] ? enqueue_task_fair+0x61/0xa0 [ 2125.192378] [] ? alloc_skb+0x4f/0x170 [ 2125.192380] [] ? netdev_alloc_skb+0x24/0x50 [ 2125.192524] [] ? e1000_clean_rx_irq+0x2ad/0x4b0 [e1000] [ 2125.192529] [] ? e1000_clean+0x51/0xc0 [e1000] [ 2125.192532] [] ? net_rx_action+0x128/0x270 [ 2125.192540] [] ? rcu_process_callbacks+0x99/0x180 [ 2125.192543] [] ? __do_softirq+0xa8/0x1c0 [ 2125.192551] [] ? tick_program_event+0x1f/0x30 [ 2125.192555] [] ? call_softirq+0x1c/0x30 [ 2125.192557] [] ? do_softirq+0x65/0xa0 [ 2125.192559] [] ? irq_exit+0x85/0x90 [ 2125.192567] [] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.192570] [] ? apic_timer_interrupt+0x13/0x20 [ 2125.192572] [] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.192579] [] ? blk_unplug_work+0x0/0x70 [ 2125.192582] [] ? generic_unplug_device+0x37/0x40 [ 2125.192585] [] ? generic_unplug_device+0x30/0x50 [ 2125.192587] [] ? blk_unplug_work+0x32/0x70 [ 2125.192591] [] ? process_one_work+0x11d/0x420 [ 2125.192594] [] ? worker_thread+0x169/0x360 [ 2125.192597] [] ? worker_thread+0x0/0x360 [ 2125.192599] [] ? kthread+0x96/0xa0 [ 2125.192602] [] ? kernel_thread_helper+0x4/0x10 [ 2125.192604] [] ? kthread+0x0/0xa0 [ 2125.192607] [] ? 
kernel_thread_helper+0x0/0x10 [ 2125.192608] Mem-Info: [ 2125.192610] Node 0 DMA per-cpu: [ 2125.192612] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.192613] Node 0 DMA32 per-cpu: [ 2125.192615] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.192616] Node 0 Normal per-cpu: [ 2125.192618] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.192622] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.192622] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.192623] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.192624] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.192625] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.192626] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192634] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.192637] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192644] lowmem_reserve[]: 0 0 2020 2020 [ 2125.192647] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.192659] lowmem_reserve[]: 0 0 0 0 [ 2125.192661] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.192669] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.192675] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.192682] 5578 total pagecache pages [ 2125.192683] 0 pages in swap cache [ 2125.192685] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.192686] Free swap = 0kB [ 2125.192687] Total swap = 0kB [ 2125.212609] 1310704 pages RAM [ 2125.212612] 37445 pages reserved [ 2125.212613] 17402 pages shared [ 2125.212614] 1181257 pages non-shared [ 2125.212618] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.212620] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.212623] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.212684] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.212688] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.212690] Call Trace: [ 2125.212692] [] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.212705] [] ? try_to_wake_up+0x244/0x3e0 [ 2125.212709] [] ? alloc_pages_current+0xa5/0x110 [ 2125.212712] [] ? new_slab+0x282/0x290 [ 2125.212743] [] ? lapic_next_event+0x1d/0x30 [ 2125.212748] [] ? clockevents_program_event+0x67/0xa0 [ 2125.212750] [] ? 
slab_alloc+0x262/0x390 [ 2125.212754] [] ? skb_clone+0x154/0x170 [ 2125.212757] [] ? tick_program_event+0x1f/0x30 [ 2125.212759] [] ? kmem_cache_alloc+0xff/0x110 [ 2125.212762] [] ? skb_clone+0x154/0x170 [ 2125.212936] [] ? packet_rcv_spkt+0x7d/0x1a0 [ 2125.212941] [] ? netif_receive_skb+0x2e4/0x580 [ 2125.212944] [] ? netif_receive_skb+0x58/0x80 [ 2125.212947] [] ? napi_skb_finish+0x50/0x70 [ 2125.212949] [] ? napi_gro_receive+0xb5/0xc0 [ 2125.212963] [] ? e1000_receive_skb+0x5b/0x90 [e1000] [ 2125.212967] [] ? e1000_clean_rx_irq+0x25e/0x4b0 [e1000] [ 2125.212971] [] ? e1000_clean+0x51/0xc0 [e1000] [ 2125.212974] [] ? net_rx_action+0x128/0x270 [ 2125.212977] [] ? rcu_process_callbacks+0x99/0x180 [ 2125.212980] [] ? do_softirq+0xa8/0x1c0 [ 2125.212984] [] ? tick_program_event+0x1f/0x30 [ 2125.212987] [] ? call_softirq+0x1c/0x30 [ 2125.212989] [] ? do_softirq+0x65/0xa0 [ 2125.212991] [] ? irq_exit+0x85/0x90 [ 2125.212994] [] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.212997] [] ? apic_timer_interrupt+0x13/0x20 [ 2125.212998] [] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.213007] [] ? blk_unplug_work+0x0/0x70 [ 2125.213010] [] ? generic_unplug_device+0x37/0x40 [ 2125.213012] [] ? generic_unplug_device+0x30/0x50 [ 2125.213015] [] ? blk_unplug_work+0x32/0x70 [ 2125.213018] [] ? process_one_work+0x11d/0x420 [ 2125.213021] [] ? worker_thread+0x169/0x360 [ 2125.213024] [] ? worker_thread+0x0/0x360 [ 2125.213026] [] ? kthread+0x96/0xa0 [ 2125.213028] [] ? kernel_thread_helper+0x4/0x10 [ 2125.213040] [] ? kthread+0x0/0xa0 [ 2125.213042] [] ? kernel_thread_helper+0x0/0x10 [ 2125.213044] Mem-Info: [ 2125.213045] Node 0 DMA per-cpu: [ 2125.213048] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.213049] Node 0 DMA32 per-cpu: [ 2125.213051] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.213052] Node 0 Normal per-cpu: [ 2125.213054] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.213057] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.213058] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.213059] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.213059] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.213060] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.213062] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.213070] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.213073] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 2125.213080] lowmem_reserve[]: 0 0 2020 2020 [ 2125.213082] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.213090] lowmem_reserve[]: 0 0 0 0 [ 2125.213093] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.213100] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.213107] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.213113] 5578 total pagecache pages [ 2125.213116] 0 pages in swap cache [ 2125.213117] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.213118] Free swap = 0kB [ 2125.213119] Total swap = 0kB [ 2125.234376] 1310704 pages RAM [ 2125.234379] 37445 pages reserved [ 2125.234380] 17402 pages shared [ 2125.234381] 1181257 pages non-shared [ 2125.234385] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.234388] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.234390] node 0: slabs: 124, objs: 1984, free: 0 [ 2125.234579] kworker/0:1: page allocation failure. order:0, mode:0x20 [ 2125.234582] Pid: 22, comm: kworker/0:1 Tainted: P 2.6.38-11-server #50-Ubuntu [ 2125.234584] Call Trace: [ 2125.234586] [] ? alloc_pages_nodemask+0x5f4/0x830 [ 2125.234596] [] ? alloc_pages_current+0xa5/0x110 [ 2125.234599] [] ? alloc_pages_current+0xa5/0x110 [ 2125.234602] [] ? new_slab+0x282/0x290 [ 2125.234605] [] ? slab_alloc+0x262/0x390 [ 2125.234608] [] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.234665] [] ? kmem_cache_alloc+0xff/0x110 [ 2125.234669] [] ? scsi_pool_alloc_command+0x33/0x80 [ 2125.234671] [] ? scsi_host_alloc_command.clone.7+0x33/0xa0 [ 2125.234692] [] ? scsi_get_command+0x2e/0xc0 [ 2125.234695] [] ? scsi_get_command+0x48/0xc0 [ 2125.234698] [] ? scsi_setup_fs_cmnd+0x8d/0xe0 [ 2125.234701] [] ? sd_prep_fn+0xaf/0xba0 [ 2125.234709] [] ? mptspi_qcmd+0x97/0x100 [mptspi] [ 2125.234714] [] ? blk_peek_request+0xce/0x220 [ 2125.234763] [] ? scsi_request_fn+0x66/0x4d0 [ 2125.234766] [] ? blk_run_queue+0x8e/0x170 [ 2125.234768] [] ? blk_run_queue+0x32/0x50 [ 2125.234770] [] ? scsi_run_queue+0xc8/0x2a0 [ 2125.234772] [] ? scsi_put_command+0x67/0xa0 [ 2125.234775] [] ? scsi_next_command+0x42/0x60 [ 2125.234777] [] ? scsi_io_completion+0x252/0x590 [ 2125.234780] [] ? scsi_finish_command+0xc5/0x130 [ 2125.234782] [] ? scsi_softirq_done+0x147/0x170 [ 2125.234785] [] ? blk_done_softirq+0x85/0xa0 [ 2125.234788] [] ? do_softirq+0xa8/0x1c0 [ 2125.234792] [] ? tick_program_event+0x1f/0x30 [ 2125.234795] [] ? call_softirq+0x1c/0x30 [ 2125.234797] [] ? do_softirq+0x65/0xa0 [ 2125.234799] [] ? irq_exit+0x85/0x90 [ 2125.234803] [] ? smp_apic_timer_interrupt+0x70/0x9b [ 2125.234805] [] ? apic_timer_interrupt+0x13/0x20 [ 2125.234807] [] ? scsi_request_fn+0x3b3/0x4d0 [ 2125.234811] [] ? blk_unplug_work+0x0/0x70 [ 2125.234814] [] ? __generic_unplug_device+0x37/0x40 [ 2125.234817] [] ? generic_unplug_device+0x30/0x50 [ 2125.234819] [] ? blk_unplug_work+0x32/0x70 [ 2125.234822] [] ? process_one_work+0x11d/0x420 [ 2125.234825] [] ? 
worker_thread+0x169/0x360 [ 2125.234828] [] ? worker_thread+0x0/0x360 [ 2125.234830] [] ? kthread+0x96/0xa0 [ 2125.234833] [] ? kernel_thread_helper+0x4/0x10 [ 2125.234835] [] ? kthread+0x0/0xa0 [ 2125.234837] [] ? kernel_thread_helper+0x0/0x10 [ 2125.234838] Mem-Info: [ 2125.234840] Node 0 DMA per-cpu: [ 2125.234842] CPU 0: hi: 0, btch: 1 usd: 0 [ 2125.234843] Node 0 DMA32 per-cpu: [ 2125.234845] CPU 0: hi: 186, btch: 31 usd: 140 [ 2125.234847] Node 0 Normal per-cpu: [ 2125.234848] CPU 0: hi: 186, btch: 31 usd: 48 [ 2125.234852] active_anon:39107 inactive_anon:120 isolated_anon:0 [ 2125.234853] active_file:3002 inactive_file:2430 isolated_file:0 [ 2125.234853] unevictable:0 dirty:0 writeback:0 unstable:0 [ 2125.234854] free:6351 slab_reclaimable:9460 slab_unreclaimable:47421 [ 2125.234855] mapped:2900 shmem:146 pagetables:1273 bounce:0 [ 2125.234856] Node 0 DMA free:15872kB min:24kB low:28kB high:36kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15684kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:4kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.234864] lowmem_reserve[]: 0 3000 5020 5020 [ 2125.234867] Node 0 DMA32 free:9532kB min:5408kB low:6760kB high:8112kB active_anon:7964kB inactive_anon:0kB active_file:0kB inactive_file:72kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3072160kB mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB slab_reclaimable:4324kB slab_unreclaimable:22084kB kernel_stack:0kB pagetables:44kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no [ 2125.234874] lowmem_reserve[]: 0 0 2020 2020 [ 2125.234884] Node 0 Normal free:0kB min:3640kB low:4548kB high:5460kB active_anon:148464kB inactive_anon:480kB active_file:12008kB inactive_file:9648kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2068480kB mlocked:0kB dirty:0kB writeback:0kB mapped:11596kB shmem:584kB slab_reclaimable:33516kB slab_unreclaimable:167596kB kernel_stack:4832kB pagetables:5048kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no [ 2125.234891] lowmem_reserve[]: 0 0 0 0 [ 2125.234894] Node 0 DMA: 0_4kB 0_8kB 0_16kB 0_32kB 2_64kB 1_128kB 1_256kB 0_512kB 1_1024kB 1_2048kB 3_4096kB = 15872kB [ 2125.234901] Node 0 DMA32: 27_4kB 84_8kB 103_16kB 24_32kB 3_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 1_2048kB 1_4096kB = 9532kB [ 2125.234908] Node 0 Normal: 0_4kB 0_8kB 0_16kB 0_32kB 0_64kB 0_128kB 0_256kB 0_512kB 0_1024kB 0_2048kB 0_4096kB = 0kB [ 2125.234914] 5578 total pagecache pages [ 2125.234916] 0 pages in swap cache [ 2125.234917] Swap cache stats: add 0, delete 0, find 0/0 [ 2125.234918] Free swap = 0kB [ 2125.234919] Total swap = 0kB [ 2125.255600] 1310704 pages RAM [ 2125.255603] 37445 pages reserved [ 2125.255604] 17402 pages shared [ 2125.255605] 1181257 pages non-shared [ 2125.255609] SLUB: Unable to allocate memory on node -1 (gfp=0x20) [ 2125.255612] cache: kmalloc-256, object size: 256, buffer size: 256, default order: 0, min order: 0 [ 2125.255614] node 0: slabs: 124, objs: 1984, free: 0 [ 2132.944060] BUG: unable to handle kernel NULL pointer dereference at (null) [ 2132.944284] IP: [] balance_pgdat+0x516/0x6e0 [ 2132.944513] PGD 13307d067 PUD 155da4067 PMD 0 [ 2132.944584] Oops: 0002 [#1] SMP [ 2132.944633] last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/host2/target2:0:2/2:0:2:0/block/sdc/queue/scheduler [ 2132.944781] CPU 0 [ 2132.944809] Modules linked in: zfs(P) zcommon(P) znvpair(P) zavl(P) zunicode(P) spl zlib_deflate ppdev nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc psmouse vmw_balloon serio_raw parport_pc i2c_piix4 shpchp lp parport mptspi floppy e1000 mptscsih mptbase [ 2132.945198] [ 2132.945220] Pid: 23, comm: kswapd0 Tainted: P 2.6.38-11-server

#50-Ubuntu VMware, Inc. VMware Virtual Platform/440BX Desktop Reference

Platform [ 2132.945544] RIP: 0010:[] [] balance_pgdat+0x516/0x6e0 [ 2132.945664] RSP: 0018:ffff880174707d90 EFLAGS: 00010282 [ 2132.945737] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000bf6 [ 2132.945837] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000ae0 [ 2132.945935] RBP: ffff880174707e80 R08: 0000000000000000 R09: 0000000000000100 [ 2132.946190] R10: 0000000000000004 R11: 0000000000000000 R12: 0000000000000000 [ 2132.946383] R13: ffff88017fffb000 R14: ffff88017fffb000 R15: 0000000000000002 [ 2132.946606] FS: 0000000000000000(0000) GS:ffff8800bfc00000(0000) knlGS:0000000000000000 [ 2132.946894] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 2132.947061] CR2: 0000000000000000 CR3: 0000000155da5000 CR4: 00000000000006f0 [ 2132.947284] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2132.947490] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 2132.947677] Process kswapd0 (pid: 23, threadinfo ffff880174706000, task ffff88017470adc0) [ 2132.947964] Stack: [ 2132.948079] ffff880174707da0 ffff88017470adc0 ffff880174707dc0 ffff880174707e34 [ 2132.948474] ffff880174707eac 0000000000000000 0000000000001454 0000000000000000 [ 2132.948759] 0000000100000000 ffff880174707e40 0000000000000000 000000007fffb000 [ 2132.949048] Call Trace: [ 2132.949173] [] kswapd+0xea/0x1d0 [ 2132.949329] [] ? kswapd+0x0/0x1d0 [ 2132.949486] [] kthread+0x96/0xa0 [ 2132.949642] [] kernel_thread_helper+0x4/0x10 [ 2132.949812] [] ? kthread+0x0/0xa0 [ 2132.949969] [] ? kernel_thread_helper+0x0/0x10 [ 2132.950157] Code: ea ff ff e9 82 fc ff ff 0f 1f 44 00 00 8b bd 74 ff ff ff 48 8d 55 80 4c 89 f6 e8 06 ee ff ff 48 8b 95 38 ff ff ff be d0 00 00 00

<48> c7 02 00 00 00 00 48 8b 95 40 ff ff ff 48 8b 7d 80 e8 a3 ce [ 2132.950800] RIP [] balance_pgdat+0x516/0x6e0 [ 2132.950972] RSP [ 2132.951107] CR2: 0000000000000000 [ 2132.956029] ---[ end trace 674593f315e64846 ]--- With swap enabled (swapon) the system hangs; with swap disabled (swapoff) it seems to keep working.

mgiammarco commented 13 years ago

I'll add that after several hours the system hangs for good with this (partial) message:

SLUB: Unable to allocate memory on node -1 (gfp=0x20)

behlendorf commented 13 years ago

This is very likely caused by issue #287, for which we are testing a fix now.

tstudios commented 13 years ago

Excellent! Looking forward to both issues falling off the bug list.


behlendorf commented 13 years ago

Closing this issue; it should be fixed by the following two commits for #287.

zfsonlinux/zfs@6a95d0b74c2951f0dc82361ea279f64a7349f060 zfsonlinux/spl@b8b6e4c453929596b630fa1cca1ee26a532a2ab4
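For anyone wanting to check whether a given build already contains these fixes, something along these lines should work against the git clones used for the build (the ~/src paths are only examples):

# Hypothetical check from local spl/zfs source checkouts:
cd ~/src/zfs && git merge-base --is-ancestor \
    6a95d0b74c2951f0dc82361ea279f64a7349f060 HEAD && echo "zfs fix present"
cd ~/src/spl && git merge-base --is-ancestor \
    b8b6e4c453929596b630fa1cca1ee26a532a2ab4 HEAD && echo "spl fix present"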

ioquatix commented 10 years ago

I seem to be having this issue. I was doing a moderately large rsync and hit the timeouts described above.

Feb 14 03:00:53 hinoki kernel: INFO: task kswapd0:26 blocked for more than 120 seconds.
Feb 14 03:00:53 hinoki kernel:       Tainted: P           O 3.12.9-2-ARCH #1
Feb 14 03:00:54 hinoki kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 14 03:00:54 hinoki kernel: kswapd0         D 0000000000000246     0    26      2 0x00000000
Feb 14 03:00:54 hinoki kernel:  ffff880223e47a90 0000000000000046 00000000000144c0 ffff880223e47fd8
Feb 14 03:00:55 hinoki kernel:  ffff880223e47fd8 00000000000144c0 ffff8802241622c0 ffff880223e479f8
Feb 14 03:00:55 hinoki kernel:  ffffffffa0304896 ffff8801185c8830 ffff8802082d5a00 ffff880053f7aa80
Feb 14 03:00:55 hinoki kernel: Call Trace:
Feb 14 03:00:56 hinoki kernel:  [<ffffffffa0304896>] ? dnode_setdirty+0x126/0x170 [zfs]
Feb 14 03:00:56 hinoki kernel:  [<ffffffffa02ed7ed>] ? dbuf_dirty+0x44d/0x920 [zfs]
Feb 14 03:00:56 hinoki kernel:  [<ffffffff814f26a9>] schedule+0x29/0x70
Feb 14 03:00:56 hinoki kernel:  [<ffffffff814f2ae3>] schedule_preempt_disabled+0x23/0x30
Feb 14 03:00:57 hinoki kernel:  [<ffffffff814f1018>] __mutex_lock_slowpath+0x158/0x3d0
Feb 14 03:00:57 hinoki kernel:  [<ffffffff814f12a2>] mutex_lock+0x12/0x30
Feb 14 03:00:57 hinoki kernel:  [<ffffffffa037fded>] zfs_zinactive+0x6d/0x150 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffffa0378cd4>] zfs_inactive+0x64/0x210 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811452f9>] ? truncate_pagecache+0x59/0x60
Feb 14 03:00:58 hinoki kernel:  [<ffffffffa038e924>] zpl_evict_inode+0x24/0x30 [zfs]
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811befe0>] evict+0xb0/0x1b0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811bf119>] dispose_list+0x39/0x50
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811c00d7>] prune_icache_sb+0x47/0x60
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811a839c>] super_cache_scan+0xfc/0x170
Feb 14 03:00:58 hinoki kernel:  [<ffffffff811478ef>] shrink_slab+0x1df/0x3e0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b5a6>] balance_pgdat+0x3d6/0x5f0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b929>] kswapd+0x169/0x480
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81085ce0>] ? wake_up_atomic_t+0x30/0x30
Feb 14 03:00:58 hinoki kernel:  [<ffffffff8114b7c0>] ? balance_pgdat+0x5f0/0x5f0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084ec0>] kthread+0xc0/0xd0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120
Feb 14 03:00:58 hinoki kernel:  [<ffffffff814fbb3c>] ret_from_fork+0x7c/0xb0
Feb 14 03:00:58 hinoki kernel:  [<ffffffff81084e00>] ? kthread_create_on_node+0x120/0x120

Is this relevant, or is there another bug I should be checking?

Thanks, Samuel

behlendorf commented 10 years ago

@ioquatix This looks different. If you're seeing this with the master code, go ahead and open a new issue.