stec-inc / EnhanceIO

EnhanceIO Open Source for Linux

After unclean shutdown, cache is started after fsck leading to corrupted FS #30

Open CoRpO opened 11 years ago

CoRpO commented 11 years ago

Hi,

While I was filling the server in writeback mode and serving a few files, a kernel panic occurred. I don't know whether it is related to eio or not (no logs), but recovery after the reboot failed.

The filesystem (on /home) was auto-checked before the eio cache was up, leading to a corrupted FS and an unmountable /home. Worse, the cache didn't come up: it tried to start read-only, but there was dirty data.

Mar 21 14:08:26 ss2 kernel: [    5.738754] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
Mar 21 14:08:26 ss2 kernel: [    5.894272] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
Mar 21 14:08:26 ss2 kernel: [    5.953532] SGI XFS with ACLs, security attributes, realtime, large block/inode numbers, no debug enabled
Mar 21 14:08:26 ss2 kernel: [    5.954556] XFS (sdc1): Mounting Filesystem
Mar 21 14:08:26 ss2 kernel: [    6.191279] XFS (sdc1): Starting recovery (logdev: internal)
Mar 21 14:08:26 ss2 kernel: [    6.237324] XFS (sdc1): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xffffc90017c11800, dino bp = 0xffff88090fd61380, ino = 103088659608
Mar 21 14:08:26 ss2 kernel: [    6.237424] XFS (sdc1): Internal error xlog_recover_inode_pass2(1) at line 2266 of file fs/xfs/xfs_log_recover.c.  Caller 0xffffffffa017556f
Mar 21 14:08:26 ss2 kernel: [    6.238047] XFS (sdc1): log mount/recovery failed: error 117
Mar 21 14:08:26 ss2 kernel: [    6.238151] XFS (sdc1): log mount failed
Mar 21 14:08:26 ss2 kernel: [    7.189661] register_policy: policy 1 added
Mar 21 14:08:26 ss2 kernel: [    7.192931] register_policy: policy 2 added
                                           enhanceio: Unclean shutdown detected
                                           enhanceio: Only dirty blocks exist in cache
                                           get_policy: policy 2 found
                                           enhanceio_lru: eio_lru_instance_init: created
Mar 21 14:08:26 ss2 kernel: [   22.861980] enhanceio: md_load: Cannot use read only mode because dirty data exists in the cache
Mar 21 14:08:26 ss2 kernel: [   22.862067] enhanceio: Cache metadata loaded from disk with 35816421 valid 35816421 dirty blocks
                                           enhanceio: md_load: Cannot use read only mode because dirty data exists in the cache
Mar 21 14:08:26 ss2 kernel: [   22.862487] enhanceio: Cache metadata loaded from disk with 35816421 valid 35816421 dirty blocks
                            [   22.876348] enhanceio: Cache creation failed: Failed to reload cache.
Mar 21 14:08:26 ss2 kernel: [   22.876688] enhanceio: Cache creation failed: Failed to reload cache.

# eio_cli info => no caches found

Re-enabled the cache and mounted the FS to allow log replay

# eio_cli enable (with good params)

Mar 21 14:09:53 ss2 kernel: [  115.202069] enhanceio: Unclean shutdown detected
Mar 21 14:09:53 ss2 kernel: [  115.202073] enhanceio: Only dirty blocks exist in cache
                                           get_policy: policy 2 found
Mar 21 14:09:53 ss2 kernel: [  115.202082] enhanceio_lru: eio_lru_instance_init: created new instance of LRU
                                           enhanceio: Setting replacement policy to lru (2)
Mar 21 14:10:01 ss2 kernel: [  115.202088] enhanceio: Allocate 456042KB (4B per) mem for 116746752-entry cache (capacity:457823MB, associativity:256, block size:4096 bytes)
                                           enhanceio: Cache metadata loaded from disk with 3581
Mar 21 14:11:02 ss2 kernel: [  123.697241] enhanceio: Setting mode to write back
                            [  184.689270] XFS (sdc1): Mounting Filesystem
Mar 21 14:11:03 ss2 kernel: [  184.981506] XFS (sdc1): Starting recovery (logdev: internal)
Mar 21 14:11:03 ss2 kernel: [  185.743178] XFS (sdc1): Ending recovery (logdev: internal)
Mar 21 14:38:55 ss2 kernel: [ 1857.380497] XFS (sdc1): Mounting Filesystem
Mar 21 14:38:55 ss2 kernel: [ 1857.521968] XFS (sdc1): Ending clean mount
Mar 21 14:41:26 ss2 kernel: [  123.709083] enhanceio_lru: Initialized 456042 sets in LRU
Mar 21 14:41:34 ss2 kernel: [ 2008.743648] enhanceio: Writing out metadata to cache device. Please wait...
                                           enhanceio: Metadata saved on the cache device
Mar 21 14:44:50 ss2 kernel: [ 2016.202840] enhanceio: Valid blocks: 35817056, Dirty blocks: 0, Metadata sectors: 3648496
                                           enhanceio: Setting mode to read only
Mar 21 14:44:50 ss2 kernel: [ 2212.156982] get_policy: policy 2 found
                                           enhanceio_lru: eio_lru_instance_init: created new instance of LRU
Mar 21 14:44:58 ss2 kernel: [ 2212.156988] enhanceio: Setting replacement policy to lru (2)
                                           enhanceio: Allocate 456042KB (4B per) mem for 116746752-entry cache (capacity:457823MB, associativity:256, block size:4096 bytes)
Mar 21 14:46:55 ss2 kernel: [ 2219.929048] enhanceio_lru: Initialized 456042 sets in LRU
                            [ 2337.583123] XFS (sdc1): Mounting Filesystem
Mar 21 14:46:55 ss2 kernel: [ 2337.816779] XFS (sdc1): Ending clean mount

After xfs_repair, several files were damaged. I flushed the cache (put it in read-only mode, flushing 300 GB of dirty data) and disabled the write-barrier mount option of /home.
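For reference, the flush described above boils down to switching the cache mode; assuming the `edit` subcommand of this eio_cli version accepts `-c`/`-m` as in the project README (the cache name `home_cache` is illustrative, not from this report), that is roughly:

```shell
# Put the cache in read-only mode; EnhanceIO flushes all dirty blocks
# to the source device before the mode change completes.
eio_cli edit -c home_cache -m ro
```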

EnhanceIO should bring the cache up before fsck can happen. Or maybe it can be circumvented with an fstab option, which should be documented.

Thanks

sanoj-stec commented 11 years ago

It seems like the cache was created in read-only mode, then edited to write-back mode, followed by a reboot. Since the udev rule file was not updated, upon reboot EnhanceIO would attempt to recreate the cache in read-only mode and thus fail.

The old udev script did not update the udev rule file on cache edit. This was a bug in the udev script; it has been fixed in 6b09be22bc42722a34193041c447e169382f08a8.

CoRpO commented 11 years ago

Okay for the udev part, but what about fsck kicking in before eio starts the caching device?

sanoj-stec commented 11 years ago

This is a bug. If the source device comes up and the SSD has not yet come up, we set the source device to read-only to prevent corruption of the source device. We need to reset the mode to read-write once the SSD has come up (this is missing).

The scenario in your log seems to be:

  1. the source device comes up, and blockdev --setro changes it to read-only mode;
  2. the device is mounted (fsck happens since the cache had dirty data);
  3. the SSD comes up and cache create is fired.

As you said, we need to add the noauto flag in fstab to prevent auto-mounting of the source device before the cache has come up.
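A sketch of what such an fstab entry could look like, with noauto so the FS is never mounted before the cache exists, and the sixth field set to 0 so no boot-time fsck runs on it (device and mount point taken from the log above):

```
# /etc/fstab -- noauto: skip auto-mount at boot; trailing 0: skip boot-time fsck
/dev/sdc1  /home  xfs  defaults,noauto  0  0
```

After boot, once the SSD is present, the cache would be brought up first (eio_cli enable with the original parameters) and only then would /home be mounted, letting XFS replay its log through the cache.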

Thanks

blackhand commented 11 years ago

One workaround for me is to use a /boot partition and write init scripts in the initrd that enable the cache (eio_cli enable) before the device gets mounted. I have my / (root) partition in writeback mode, and this works for me (/boot and / partitions formatted with ext4).

In my particular case (Arch Linux) I created a new enhanceio hook for the initrd that enables the cache after the block devices are detected BUT before the partitions get mounted. I use fsck in my initrd too. The hook is very heavy because I need to put the whole Python interpreter into the initrd along with the blockdev tool, but it works.
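A minimal sketch of such an mkinitcpio hook, assuming the enhanceio modules plus eio_cli (and its Python interpreter) have been added to the image; the file path and the elided cache parameters are illustrative, not taken from the comment above:

```shell
# /usr/lib/initcpio/hooks/enhanceio (sketch)
run_hook() {
    # Block devices have been detected at this point, but nothing is mounted yet.
    modprobe enhanceio
    modprobe enhanceio_lru
    # Recreate the cache with the same parameters used at creation time.
    eio_cli enable ...
}
```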

sgaltsev commented 10 years ago

There has got to be a safeguard against filesystem corruption. Blaming udev for that is lame. Using a software package should not be like navigating a minefield.

mc-comanescu commented 10 years ago

A couple of things can be done:

  1. The order of entries in fstab is important. Put this filesystem towards the bottom so it gets mounted last.
  2. The sixth field specifies the fsck order: 1 should be for /, 2 for /home, and so on. If you don't want automatic fsck on a particular filesystem, set this field to 0 (zero).
  3. Device paths/names are no good since they can be remapped on the fly; UUIDs, however, are persistent, since they are embedded on the devices themselves (in the table). Use lsblk and blkid to replace your /dev/foo1 with UUID="aaaa-bbbb-cccc-ddddd" in fstab, for a more predictable boot order. Lastly, once you modify fstab, rerun the ramdisk generator, since a copy of fstab might be added to your initrd image (to account for special cases such as boot from RAID, NFS boots, etc.).
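Points 2 and 3 above can be applied mechanically; a small sketch (the fstab line and UUID value are placeholders, not taken from this report) that swaps the device path for a UUID reference and zeroes the fsck pass field:

```shell
# Hypothetical fstab line for the cached source device.
line='/dev/sdc1  /home  xfs  defaults  0  2'
uuid='aaaa-bbbb-cccc-dddd'   # in practice: blkid -s UUID -o value /dev/sdc1
# Replace the device path with UUID=... and set the trailing fsck pass to 0.
fixed=$(printf '%s\n' "$line" | sed -e "s|^/dev/sdc1|UUID=$uuid|" -e 's|2$|0|')
printf '%s\n' "$fixed"
```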

I will try to see if I can add some code to help with this one of these days.