tthtlc / compcache

Automatically exported from code.google.com/p/compcache

Zram Block IO error Linux 3.6 #102

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Compile kernel with zram driver either as a module or built in.
2. Use the dd command to write data to it (in my case, 128 MB); see the sketch below.
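
A minimal sketch of what steps 1-2 might look like (the exact commands are not in the original report; the modprobe line assumes zram was built as a module, and /dev/urandom is just one way to generate data):

modprobe zram                                              # only needed if zram is a module
echo $((128 * 1024 * 1024)) > /sys/block/zram0/disksize    # 128 MB device
dd if=/dev/urandom of=/dev/zram0 bs=1M count=128           # write 128 MB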

What version of the product are you using? On what operating system?

linux 3.6-rc1 to 3.6-rc4

Please provide any additional information below.

Kernel compiled with the zram driver from staging, both as a module and built in.

Original issue reported on code.google.com by viech...@gmail.com on 3 Sep 2012 at 5:44

Attachments:

GoogleCodeExporter commented 9 years ago
I hit this too. Changing line 262 of zram_drv.c from:

cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RO);

to:

cmem = zs_map_object(zram->mem_pool, handle, ZS_MM_RW);

seems to have fixed it. I'm testing now and will submit a patch to the maintainers if it works. Note that I don't actually know whether this is the correct fix, but it does stop my logfile spam.

Original comment by paer...@gmail.com on 2 Oct 2012 at 5:22

GoogleCodeExporter commented 9 years ago
Nope, didn't solve it, sorry.

Original comment by paer...@gmail.com on 2 Oct 2012 at 6:37

GoogleCodeExporter commented 9 years ago
Just as a continued FYI on this bug, here's my procedure to reproduce:

In an initrd, or in a system booted without one, configure a few zram devices:

echo $((128 * 1024 * 1024)) > /sys/block/zram0/disksize
echo $((10 * 1024 * 1024 * 1024)) > /sys/block/zram1/disksize
echo $((1 * 1024 * 1024 * 1024)) > /sys/block/zram2/disksize

mkfs.ext4 -O dir_nlink,extent,extra_isize,flex_bg,^has_journal,uninit_bg -m0 -b 4096 -L "zram0" /dev/zram0
mkfs.ext4 -O dir_nlink,extent,extra_isize,flex_bg,^has_journal,uninit_bg -m0 -b 4096 -L "zram1" /dev/zram1
mkfs.ext4 -O dir_nlink,extent,extra_isize,flex_bg,^has_journal,uninit_bg -m0 -b 4096 -L "zram2" /dev/zram2

Then mount the filesystem:

cd /mnt
mount /dev/zram1 floppy
cd floppy

Finally crank some I/O:

dd if=/dev/urandom of=a count=1000000

After a few seconds, this produces output similar to the following in dmesg:

[ 6170.383170] zram: Error allocating memory for compressed page: 53091, size=4116
[ 6170.383171] Buffer I/O error on device zram1, logical block 53091
.....<snip 29 similar errors>.....
[ 6170.383216] Buffer I/O error on device zram1, logical block 53121
[ 6170.383219] EXT4-fs warning (device zram1): ext4_end_bio:250: I/O error writing to inode 12 (offset 74854400 size 131072 starting block 53091)

Original comment by paer...@gmail.com on 3 Oct 2012 at 1:15

GoogleCodeExporter commented 9 years ago
Blarg, hate to comment spam...

It only happens if the write, after compression, ends up needing a block larger than 4k. By lowering my bs (a binary search between 4096 and 4030), I found that around 4072 bytes of random data may or may not compress enough to avoid triggering the error. Did upstream perhaps rate-limit high-order memory allocations?

All of the errors are prefaced with:

[ 7181.454451] zram: Error allocating memory for compressed page: 34832, size=4097

i.e., a size > 4k.
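
A rough sketch of the block-size probe described above (the file name and the simple loop, rather than the binary search mentioned, are illustrative and not from the comment):

cd /mnt/floppy
for bs in 4096 4088 4080 4072 4064 4056 4048; do
    # write one block of random data, force it out, then check the log
    dd if=/dev/urandom of=probe bs=$bs count=1 conv=fsync 2>/dev/null
    dmesg | tail -n 1
done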

Original comment by paer...@gmail.com on 3 Oct 2012 at 1:32

GoogleCodeExporter commented 9 years ago
I got the same issue here.
I think it's because of this patch: [PATCH] zram: remove special handle of uncompressed page
https://lkml.org/lkml/2012/6/8/116
Point 3 => zsmalloc can't handle sizes bigger than PAGE_SIZE, so zram can't do that any more without a redesign.

It removes the code that handled sizes bigger than PAGE_SIZE (compared to kernel 3.5).

Original comment by wu.to...@gmail.com on 3 Oct 2012 at 3:55

GoogleCodeExporter commented 9 years ago
Thanks Wu, that patch is indeed the root cause. It tries to use zsmalloc even for sizes > PAGE_SIZE, which is not allowed. I will fix it soon.

Original comment by nitingupta910@gmail.com on 3 Oct 2012 at 4:57

GoogleCodeExporter commented 9 years ago
Can you please try the patch attached?

Original comment by nitingupta910@gmail.com on 5 Oct 2012 at 5:13

Attachments:

GoogleCodeExporter commented 9 years ago
The patch works fine for me. Thanks.

Original comment by wu.to...@gmail.com on 5 Oct 2012 at 3:33

GoogleCodeExporter commented 9 years ago
No more errors, it seems, after applying the patch to 3.6.1.

Original comment by mich...@zugelder.org on 7 Oct 2012 at 9:40

GoogleCodeExporter commented 9 years ago
Any plans for pushing this upstream? I figured it would have shown up in either Linus' tree or gregkh's stable tree by now.

Original comment by paer...@gmail.com on 10 Oct 2012 at 4:07

GoogleCodeExporter commented 9 years ago
@paerley: I have sent it to lkml for review; it should be merged sometime soon.

Original comment by nitingupta910@gmail.com on 11 Oct 2012 at 12:49

GoogleCodeExporter commented 9 years ago
Should be merged into staging soon (patch sent to gregkh). Closing the issue.

Original comment by nitingupta910@gmail.com on 11 Oct 2012 at 6:47

GoogleCodeExporter commented 9 years ago
I applied this patch and added zram as L2ARC to a ZFS pool, which results in a lot of L2ARC checksum errors.
This suggests zram is corrupting data.
Maybe we should use PAGE_SIZE+1 to indicate uncompressed pages?
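
For reference, a hedged sketch of the kind of L2ARC setup described above (the pool name "tank" and the 1 GiB size are assumptions, not from the comment):

echo $((1 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize   # 1 GiB zram device
zpool add tank cache /dev/zram0                                # "tank" is a placeholder pool name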

Original comment by DRDarkRa...@gmail.com on 16 Oct 2012 at 5:50

GoogleCodeExporter commented 9 years ago
@DRDarkRaven: I found a bug which could cause this corruption. Can you please 
try the patch attached? Thanks.

Original comment by nitingupta910@gmail.com on 17 Oct 2012 at 5:15

Attachments:

GoogleCodeExporter commented 9 years ago
Reopening the bug (though I could not reproduce the corruption myself)

Original comment by nitingupta910@gmail.com on 17 Oct 2012 at 5:16

GoogleCodeExporter commented 9 years ago
@DRDarkRaven: can you please verify if the patch provided in comment #14 works? 
Also, what's the kernel version you are using?

Original comment by nitingupta910@gmail.com on 19 Oct 2012 at 9:35

GoogleCodeExporter commented 9 years ago
@nitingupta910: I applied your v2 patch, and I don't see any more of the "zram: 
Error allocating memory for compressed page:" kind of errors. I am using /tmp 
on zram.
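
A minimal sketch of a /tmp-on-zram setup like the one mentioned above (the size and mkfs options are assumptions):

echo $((512 * 1024 * 1024)) > /sys/block/zram0/disksize   # 512 MB device
mkfs.ext4 -m0 -L tmp /dev/zram0                           # no reserved blocks
mount /dev/zram0 /tmp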

Original comment by ppu...@gmail.com on 10 Nov 2012 at 9:36

GoogleCodeExporter commented 9 years ago
Does this bug cause data loss? I ask because I have a server I'd rather not reboot that ran for a few hours getting this error. I have since turned off compcache and nothing (no other processes) seems unhappy - nothing has crashed. Thank you.

Original comment by a...@cichlid.com on 14 Nov 2012 at 3:14

GoogleCodeExporter commented 9 years ago
Apparently, zram_pagealloc_fix.patch was merged into the 3.6 kernel at some point. Using /dev/zram0 as a swap device (which was no problem in earlier kernel releases) under the 3.6 series up to 3.6.8, I get a completely unrecoverable system freeze when swap is allocated by filling up the ramdisk (dd if=/dev/zero of=/tmp/zero.img bs=1M count=800).
Reverting that patch and applying zram_pagealloc_fix_v2.patch instead fixes the problem for me.
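
A hedged sketch of the swap-on-zram setup and fill step described above (the 2 GiB disksize and swap priority are assumptions; the dd command is taken from the comment):

echo $((2 * 1024 * 1024 * 1024)) > /sys/block/zram0/disksize   # 2 GiB device (assumed size)
mkswap /dev/zram0
swapon -p 100 /dev/zram0                                       # prefer zram over disk swap
dd if=/dev/zero of=/tmp/zero.img bs=1M count=800               # fill /tmp to force swapping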

Original comment by knopp...@googlemail.com on 28 Nov 2012 at 11:20

GoogleCodeExporter commented 9 years ago
@knopperk: Most probably you are hitting this bug:
https://bugzilla.kernel.org/show_bug.cgi?id=50081

The fix has been posted to lkml and is under review. Should be in mainline soon.

Original comment by nitingupta910@gmail.com on 29 Nov 2012 at 9:31