Open ghost opened 8 years ago
Same issue here with debian pkg 3.10.2+dfsg-1 and I can't write any files.
Sep 25 15:22:33 clbk-master mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:22:33 clbk-master mfsmaster[15936]: mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:22:43 clbk-master mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:22:43 clbk-master mfsmaster[15936]: mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:22:53 clbk-master mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:22:53 clbk-master mfsmaster[15936]: mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:23:03 clbk-master mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:23:03 clbk-master mfsmaster[15936]: mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:23:13 clbk-master mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:23:13 clbk-master mfsmaster[15936]: mfsmaster[15936]: serious structure inconsistency: (chunkid:00000000000000AD)
Sep 25 15:23:22 clbk-master mfsmaster[15936]: release: session not found
Sep 25 15:23:22 clbk-master mfsmaster[15936]: mfsmaster[15936]: release: session not found
Same error with 3.10.0+dfsg-1; it works fine with 3.9.4+dfsg-5~bpo8+1.
@nanowish On which version of Debian did you get the error?
All servers in the cluster run a freshly installed Debian Jessie.
In my case: Arch Linux / LizardFS 3.10.2
I've put the bug label on this, but I cannot reproduce it in our lab. I need more details:
4 chunks, 5 files. It's a fresh install with 1 master, 2 chunkservers, 1 metalogger on chunk1, and cgi-srv on chunk2, with only 1 client connected. Everything runs on Debian 8.6, newly installed for the cluster.
I tried again with 3.10.2 and got the same error; it works fine with version 3.9.
A patch for that is in our CR: http://cr.skytechnology.pl:8081/#/c/2720/
I'm getting the same:
Apr 23 21:03:16 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 19)
Apr 23 21:03:26 ale-XPS13 mfsmaster[921]: mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:26 ale-XPS13 mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:26 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 20)
Apr 23 21:03:36 ale-XPS13 mfsmaster[921]: mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:36 ale-XPS13 mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:36 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 21)
Apr 23 21:03:47 ale-XPS13 mfsmaster[921]: mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:47 ale-XPS13 mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)
Apr 23 21:03:47 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 22)
$ dpkg -l | grep lizard
ii lizardfs-adm 3.10.4+dfsg-3 amd64 LizardFS - administration tools
ii lizardfs-cgi 3.10.4+dfsg-3 all LizardFS - CGI monitor
ii lizardfs-cgiserv 3.10.4+dfsg-3 amd64 simple CGI-capable HTTP server to run LizardFS CGI monitor
ii lizardfs-chunkserver 3.10.4+dfsg-3 amd64 LizardFS - data server
ii lizardfs-client 3.10.4+dfsg-3 amd64 LizardFS - client tools and mount utility
ii lizardfs-common 3.10.4+dfsg-3 all LizardFS - common files
ii lizardfs-master 3.10.4+dfsg-3 amd64 LizardFS - master server
$ lsb_release -a
LSB Version: core-9.20160110ubuntu5-amd64:core-9.20160110ubuntu5-noarch:security-9.20160110ubuntu5-amd64:security-9.20160110ubuntu5-noarch
Distributor ID: Ubuntu
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty
Any hints? Wasn't this supposed to be fixed in 3.10.4?
I would like to try Lizard, but I'm unable to create any files. Or rather, it seems that I'm unable to change files: I can create a new file, but I can't make changes to it.
Apr 23 21:03:26 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 20)
This roughly means that the master server has information about a chunk, but no file actually has that chunk. It is indeed a structure inconsistency, and it's not easy to reproduce such a thing. One way is to load metadata from a very outdated or very corrupt metadata file.
OK, so these logs appear when you try writing to a certain file. Aside from the logs, please provide the output of:
lizardfs fileinfo PATH_TO_AFFECTED_FILE
The second thing to do is to search the chunk directories and see whether these chunks are indeed missing, with something like:
find PATH_TO_CHUNKSERVER_HDD -name "*000000000000000A*"
Finally, if it is a testing installation only and the files are not really confidential, it would probably help if you provided the affected metadata.mfs file from the /var/lib directory.
Finally, is it easy for you to reproduce this behaviour on a fresh installation? If so, can you provide steps to do that?
Also, we'll check whether this kind of thing is reproducible on Ubuntu 17 only; maybe it's some library inconsistency.
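The checks suggested above start from the chunk IDs in the master log. As a rough sketch (not an official LizardFS tool), the IDs can be pulled out of the log lines and turned into the matching find commands. It assumes the log message format quoted in this thread; the names extract_chunk_ids and find_pattern are hypothetical, and PATH_TO_CHUNKSERVER_HDD remains a placeholder.

```python
import re

# Matches the master log message format quoted in this thread, e.g.
# "serious structure inconsistency: (chunkid:000000000000000A)".
# Assumption: the chunk ID is always printed as 16 hex digits.
CHUNK_RE = re.compile(
    r"serious structure inconsistency: \(chunkid:([0-9A-Fa-f]{16})\)"
)


def extract_chunk_ids(log_lines):
    """Return the distinct chunk IDs (uppercase hex), in order of first appearance."""
    seen = []
    for line in log_lines:
        match = CHUNK_RE.search(line)
        if match:
            chunk_id = match.group(1).upper()
            if chunk_id not in seen:
                seen.append(chunk_id)
    return seen


def find_pattern(chunk_id):
    """Build the -name glob for find, in the form suggested above."""
    return f"*{chunk_id}*"


# Sample lines copied from the logs posted earlier in this thread.
sample = [
    "Apr 23 21:03:26 ale-XPS13 mfsmaster[921]: mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)",
    "Apr 23 21:03:26 ale-XPS13 mfsmaster[921]: serious structure inconsistency: (chunkid:000000000000000A)",
    "Apr 23 21:03:26 ale-XPS13 mfsmount[17493]: write file error, inode: 11, index: 0 - error sent by master server (Chunk lost) (try counter: 20)",
]

for cid in extract_chunk_ids(sample):
    # PATH_TO_CHUNKSERVER_HDD is a placeholder for the real chunk directory.
    print(f'find PATH_TO_CHUNKSERVER_HDD -name "{find_pattern(cid)}"')
```

Duplicate log lines collapse to a single ID, so each affected chunk is searched for only once.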
Yes, I'm able to reproduce this behaviour on a fresh install. This is a test "cluster" (1 chunkserver) where I'm trying to create a huge blank file (500MB) and then create a filesystem inside it.
The blank file is created properly; everything else fails.
x@ale-XPS13:/mnt/lizard$ dd if=/dev/zero of=test.img bs=1M count=500
500+0 record in
500+0 record out
524288000 bytes (524 MB, 500 MiB) copied, 3,29313 s, 159 MB/s
x@ale-XPS13:/mnt/lizard$ lizardfs fileinfo test.img
test.img:
chunk 0: 000000000000001B_00000001 / (id:27 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 1: 000000000000001C_00000001 / (id:28 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 2: 000000000000001D_00000001 / (id:29 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 3: 000000000000001E_00000001 / (id:30 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 4: 000000000000001F_00000001 / (id:31 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 5: 0000000000000020_00000001 / (id:32 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 6: 0000000000000021_00000001 / (id:33 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 7: 0000000000000022_00000001 / (id:34 ver:1)
copy 1: 192.168.1.84:9422:_
Then, when I try to run mkfs.ext4 test.img, it starts logging the corruption messages.
root@ale-XPS13:/tmp/chunk1# find -name "*000000000000001B*"
./chunks00/chunk_000000000000001B_00000001.mfs
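To check every chunk of the file rather than just the first one, the fileinfo output can be converted into the chunk file names to look for. This is only a sketch: it assumes the fileinfo line format shown above and the chunk_<id>_<version>.mfs naming seen in the find result, and the helper name expected_chunk_files is hypothetical. It does not try to guess the chunksNN subdirectory.

```python
import re

# Matches fileinfo lines as shown above, e.g.
# "chunk 0: 000000000000001B_00000001 / (id:27 ver:1)"
CHUNK_LINE = re.compile(
    r"chunk (\d+): ([0-9A-F]{16})_([0-9A-F]{8}) / \(id:(\d+) ver:(\d+)\)"
)


def expected_chunk_files(fileinfo_output):
    """Return the chunk file names implied by 'lizardfs fileinfo' output."""
    names = []
    for line in fileinfo_output.splitlines():
        match = CHUNK_LINE.search(line)
        if not match:
            continue  # skip the file name line and the "copy N:" lines
        index, hex_id, hex_ver, dec_id, dec_ver = match.groups()
        # Sanity check: the hex and decimal forms should describe the same chunk.
        if int(hex_id, 16) != int(dec_id) or int(hex_ver, 16) != int(dec_ver):
            raise ValueError(f"inconsistent fileinfo line for chunk {index}")
        # Assumption: naming follows the file found above,
        # chunk_000000000000001B_00000001.mfs
        names.append(f"chunk_{hex_id}_{hex_ver}.mfs")
    return names


# First two chunks from the fileinfo output posted above.
sample_output = """test.img:
chunk 0: 000000000000001B_00000001 / (id:27 ver:1)
copy 1: 192.168.1.84:9422:_
chunk 1: 000000000000001C_00000001 / (id:28 ver:1)
copy 1: 192.168.1.84:9422:_
"""

for name in expected_chunk_files(sample_output):
    print(name)
```

Each printed name can then be passed to find -name on the chunkserver to confirm whether that chunk file is still on disk.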
As for the metadata, do you have an email address where I can send it?
Go with support@lizardfs.com, someone will check it tomorrow during office hours.
Mail sent. Meanwhile, are there any other tests I can run? I would like to test Lizard.
Wait, I've double-checked. After the master timed out the operation:
Apr 24 20:53:07 ale-XPS13 mfsmount[6360]: write file error, inode: 3, index: 0 - error sent by master server (Chunk lost) (try counter: 30)
Apr 24 20:53:07 ale-XPS13 mfsmount[6360]: error writing file number 3: EIO (Input/output error)
the chunks were completely removed:
# lizardfs fileinfo /mnt/lizard/test.img
/mnt/lizard/test.img:
/mnt/lizard/test.img [0]: No such chunk
# find /tmp/ -name "*000000000000001B*"
root@ale-XPS13:/tmp#
https://github.com/lizardfs/lizardfs/issues/526 <- This issue is also related to incorrect code generated by a newer GCC. Ubuntu 17.04 is recent, so it ships this newer GCC version as well. The fix was added here: https://github.com/lizardfs/lizardfs/commit/422175eb1aa8c2a1e4d0727ee3fe8190e1ae1340
Can you check whether this solves the case?
It is also solved in 3.10.6, but this patch can be cherry-picked onto 3.10.4 without conflicts.
Do I have to build Lizard manually? With 3.10.6 from your repo, I would have to replace all paths and user permissions.
Isn't it possible to push 3.10.6 to the official Ubuntu repo? The version that ships with Ubuntu is useless and does not work at all.
I have literally no power over official Ubuntu repos :)
Also, unification of paths and permissions is coming, but it's not easy to predict all paths to avoid potential problems with LizardFS upgrades, so it will take time to prepare these patches.
I'll try to build a package from the Ubuntu sources by cherry-picking this patch. Should I build only the master?
Master is enough for this patch to work.
I rebuilt the package with the patch cherry-picked. It is working now.
I tried to configure a fresh LizardFS install (on Arch Linux, LizardFS version 3.10.2) and I got this error:
I noticed this before the error below:
And also:
Any idea about what caused that issue?
Thank you.
PS: I reinstalled LizardFS and got the same issue :( (FYI, LizardFS was compiled on Arch Linux with this PKGBUILD: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=lizardfs )