@chrisrd Could you run zdb -dddd <pool>/<fs> <inode_number> on a few of the bad ones? I'd be interested to know the dnode flags and whether zdb has any success in decoding the SA xattr.
@chrisrd Thanks for the detailed debugging, I'd also be very curious about the zdb output. I suspect it may be able to decode the xattr because the stack you posted suggests a bug in the SPL's XDR implementation, specifically in xdr_int(). A different implementation is used in the library, which may not suffer from this.
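For reference, a minimal userland sketch of the kind of decode being discussed, using the SunRPC/libtirpc routines from <rpc/xdr.h> (the "library" implementation mentioned above; the in-kernel SPL carries its own copy). This is illustrative only, not the actual zdb or SPL code:

#include <rpc/xdr.h>
#include <stdio.h>

int main(void)
{
    /* XDR encodes ints as big-endian 32-bit values; 0x3c == 60 */
    char buf[4] = { 0x00, 0x00, 0x00, 0x3c };
    XDR xdrs;
    int value = 0;

    xdrmem_create(&xdrs, buf, sizeof (buf), XDR_DECODE);
    if (xdr_int(&xdrs, &value))
        printf("decoded %d\n", value);
    else
        fprintf(stderr, "xdr_int failed\n");
    xdr_destroy(&xdrs);
    return (0);
}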
Zdb output below.
I had this occur whilst trying to dump:
# time zdb -dddd pool/corrupt 528903
zdb: can't open 'pool/corrupt': Input/output error
real 2m55.939s
I then tried to do exactly the same under strace, and it succeeded. And running it again without strace also succeeded. The other dumps each took around 1m30 - 1m45, so the time taken for the failed dump was also an outlier. There were no kernel messages.
# zdb -dddd pool/corrupt 48147
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
48147 1 16K 14.0K 11.5K 14.0K 100.00 ZFS plain file
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/zzzzzzzzzzzzz/aaaaaaaaaaaaa/bbbbbbbbbbb/ccccccccccccccccccccccccccccc.xlsx
uid 1068
gid 1000
atime Thu Dec 12 12:38:52 2013
mtime Fri Aug 23 11:46:46 2013
ctime Thu Dec 12 12:38:52 2013
crtime Thu Dec 12 12:38:52 2013
gen 260973
mode 100676
size 14174
parent 47114
links 1
pflags 40800000004
# zdb -dddd pool/corrupt 1126091
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
1126091 1 16K 512 1.50K 512 100.00 ZFS plain file
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/yyyyy/ddd/eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee/ffff/gggggggg.dwl
uid 1068
gid 1000
atime Thu Dec 12 14:59:42 2013
mtime Thu Aug 29 08:45:09 2013
ctime Thu Dec 12 14:59:42 2013
crtime Thu Dec 12 14:59:42 2013
gen 264669
mode 100676
size 52
parent 1125032
links 1
pflags 40800000004
# zdb -dddd pool/corrupt 528903
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
528903 1 16K 512 1.50K 512 100.00 ZFS plain file
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/yyyyy/hhhhhhhhhhh/iiiiiiiiiiiiiiiiiiiiii/jjjjjjjjjjjjj/kkkkkkkkkkk/llllllll.dwl2
uid 1068
gid 1000
atime Thu Dec 12 14:32:28 2013
mtime Tue Aug 13 13:54:12 2013
ctime Thu Dec 12 14:32:28 2013
crtime Thu Dec 12 14:32:28 2013
gen 263787
mode 100676
size 201
parent 525019
links 1
pflags 40800000004
cf. non-corrupt:
# zdb -dddd pool/corrupt 1170239
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
1170239 2 16K 128K 705K 768K 100.00 ZFS plain file
256 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 5
path /c/xxxxxx/yyyyy/mmmmmmmmmmmmmmm/nnnnnnnnnnnnnnn/oooooooooooooo/ppppppp/qqqqqqqqq/rrrrrrrrrrrrrrrrrrrrrrrrrrr.pdf
uid 1068
gid 1000
atime Thu Dec 12 15:12:42 2013
mtime Wed Oct 9 12:26:00 2013
ctime Thu Dec 12 15:12:42 2013
crtime Thu Dec 12 15:12:42 2013
gen 265032
mode 100666
size 723562
parent 1168977
links 1
pflags 40800000004
SA xattrs: 80 bytes, 1 entries
user.rsync.%stat = 100666 0,0 65534:65534
I hadn't noticed before, but looking at the inode numbers shows they come in groups: I imagine that's because they're allocated in sequential order (are they?) and so the groups are time-correlated. Perhaps this relates to activity on the system when they were being created, e.g. a higher than normal load (the box is generally relatively lightly loaded). If so, this may be pointing to some failure under memory pressure, or a locking problem?
All corrupt inodes:
48147 48882 48885 48886 48887 48888 48889 49829 49835 49836 49840 51000 51001
51002 62052 62157 62342 62343 62347 63175
511698 511699 528084 528085 528088 528089 528092 528093 528096 528097 528100
528101 528902 528903
1126091 1126092 1128335 1129232 1129233 1129700 1129701 1130283 1130284 1133232
1133233 1133609 1133610 1143125 1143394 1143395 1145824 1145825 1157664 1157665
1158676 1158677 1159241 1159242 1160401 1160402 1160906 1160907 1163982 1163983
1167063 1167064 1167684 1167685 1167686 1167687 1167818 1167819 1169091 1169092
1170100 1170237 1170345 1170352 1170875 1170876 1177353 1177873 1181391 1181392
1181624 1181625 1182908 1182909
Also worthy of note, I recreated the entire filesystem last night in exactly the same manner as the original, and the new version doesn't have any xattr corruption.
@chrisrd Could you please try "zdb -dddddd" (6 d's) from dweeezil/zfs@803578e on some of the bad ones?
@chrisrd As a follow-on to my request, since it's not even trying to print any SA xattrs on the bad objects, the nvlist is likely corrupted and I'm hoping we can figure out the nature of the corruption by dumping its contents. Unfortunately, this simplistic debugging code assumes we are getting a reasonable length, but it's a start. One other useful piece of information from your zdb output is that we don't seem to have any spill blocks in use (which I'd not expect to see with the relatively short xattrs used by rsync).
It looks like all rsync ought to be doing with fake-super is a whole bunch of lsetxattr() calls. I do find it very interesting that you couldn't reproduce the problem after trying the exact same thing again.
As to the inode number, on a clean filesystem with xattr=sa, I'd expect them to be allocated pretty much sequentially. The fact that your bad objects were in clusters is interesting and likely relevant but it's not much to go on at this point.
Can you think of anything else your system may have been doing at the time when you ran the first rsync that created the corrupted filesystem. Also, was that first rsync writing to a brand new clean filesystem?
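To make that concrete, the per-file operation in question boils down to a single lsetxattr(2) call; a minimal sketch (hypothetical path, with a value in the user.rsync.%stat format seen in this thread) might look like:

#include <string.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path = "somefile";            /* hypothetical */
    const char *val = "100676 0,0 1068:1000"; /* the stat string rsync stores */

    /* effectively what rsync --fake-super does for each file it writes */
    return lsetxattr(path, "user.rsync.%stat", val, strlen(val), 0) == 0 ? 0 : 1;
}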
G'day @dweeezil - zdb -dddddd output below, for the same inodes as before.
Interestingly (scarily?!?), I got an error the first time dumping the first inode (and there was no other output):
zdb: ../../module/zfs/dsl_deadlist.c:103: Assertion `0 == dmu_bonus_hold(os, object, dl, &dl->dl_dbuf) (0x0 == 0x34)' failed.
The next dump of the same inode worked.
For each rsync the filesystem was freshly created just prior to the rsync (it's part of a script so it's consistent).
The box only receives these rsyncs, and runs ceph on some other disks (4 x osds, with the storage attached to VMs running on a separate box). The first (corrupted) rsync was running at 8am Sunday - there wasn't much else going on!
zdb -dddddd output:
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
48147 1 16K 14.0K 11.5K 14.0K 100.00 ZFS plain file (K=inherit) (Z=inherit)
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/aaaaaaaaaaaaa/bbbbbbbbbbbbb/ccccccccccc/ddddddddddddddddddddddddddddd.xlsx
uid 1068
gid 1000
atime Thu Dec 12 12:38:52 2013
mtime Fri Aug 23 11:46:46 2013
ctime Thu Dec 12 12:38:52 2013
crtime Thu Dec 12 12:38:52 2013
gen 260973
mode 100676
size 14174
parent 47114
links 1
pflags 40800000004
dump_znode_sa_xattr: sa_xattr_size=24 sa_size error=0
SA packed dump sa_xattr_size=24: \001\001\000\000\000\000\000\000\000\000\000\001\000\000\000\074\000\000\000\100\000\000\000\020
SA xattr dump:
dump_znode_sa_xattr: nvlist_unpack error=14
Indirect blocks:
0 L0 1:480119cb800:4000 3800L/2c00P F=1 B=260973/260973
segment [0000000000000000, 0000000000003800) size 14.0K
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
528903 1 16K 512 1.50K 512 100.00 ZFS plain file (K=inherit) (Z=inherit)
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/eeeee/fffffffffff/gggggggggggggggggggggg/hhhhhhhhhhhhh/iiiiiiiiiiiiiiiiiiii.dwl2
uid 1068
gid 1000
atime Thu Dec 12 14:32:28 2013
mtime Tue Aug 13 13:54:12 2013
ctime Thu Dec 12 14:32:28 2013
crtime Thu Dec 12 14:32:28 2013
gen 263787
mode 100676
size 201
parent 525019
links 1
pflags 40800000004
dump_znode_sa_xattr: sa_xattr_size=24 sa_size error=0
SA packed dump sa_xattr_size=24: \001\001\000\000\000\000\000\000\000\000\000\001\000\000\000\074\000\000\000\100\000\000\000\020
SA xattr dump:
dump_znode_sa_xattr: nvlist_unpack error=14
Indirect blocks:
0 L0 1:4881970a000:800 200L/200P F=1 B=263787/263787
segment [0000000000000000, 0000000000000200) size 512
Dataset pool/corrupt [ZPL], ID 9742, cr_txg 260211, 235G, 822855 objects, rootbp DVA[0]=<2:5ba3cfed000:800> DVA[1]=<3:5db5f9bd000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=433336L/433336P fill=822855 cksum=1c78f55814:983527c54d9:1b62513eb4d86:37e8fdb1a6b035
Object lvl iblk dblk dsize lsize %full type
1126091 1 16K 512 1.50K 512 100.00 ZFS plain file (K=inherit) (Z=inherit)
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /c/xxxxxx/jjjjj/kkk/llllllllllllllllllllllllllllllllllll/mmmm/nnnnnnnn.dwl
uid 1068
gid 1000
atime Thu Dec 12 14:59:42 2013
mtime Thu Aug 29 08:45:09 2013
ctime Thu Dec 12 14:59:42 2013
crtime Thu Dec 12 14:59:42 2013
gen 264669
mode 100676
size 52
parent 1125032
links 1
pflags 40800000004
dump_znode_sa_xattr: sa_xattr_size=24 sa_size error=0
SA packed dump sa_xattr_size=24: \001\001\000\000\000\000\000\000\000\000\000\001\000\000\000\074\000\000\000\100\000\000\000\020
SA xattr dump:
dump_znode_sa_xattr: nvlist_unpack error=14
Indirect blocks:
0 L0 4:43ad7e7a800:800 200L/200P F=1 B=264669/264669
segment [0000000000000000, 0000000000000200) size 512
@chrisrd That's good information and shows that the xattr SA is being truncated in some manner. The part that is there appears to be totally correct but it's cut off where the actual string array should begin. I'll poke around the xattr SA code a bit to see whether I can identify any place where this might happen. Since you were able to perform the rsync again from scratch with no problems, however, it sounds like we're looking at a race condition or some other odd corner case.
One other interesting thing is that the two corrupted objects you looked at have 0676 perms which is a bit odd. I don't suppose that, by some chance, all the corrupted objects have that mode? There's no reason that a particular mode should cause this problem but it might be a clue that points us in the right direction.
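A hedged userland sketch of what zdb appears to be running into (assuming libnvpair is available; build with -lnvpair): pack a single pair with XDR encoding (the encoding the SA xattr path uses, per the xdr_int() discussion earlier), truncate the buffer to the 24 bytes seen in the dumps above, and try to unpack both versions. The truncated unpack should fail with a non-zero error, consistent with the nvlist_unpack error=14 above; the real SA xattr stores a byte array rather than a string, but that doesn't change the picture.

#include <libnvpair.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    nvlist_t *nvl, *out = NULL;
    char *buf = NULL;
    size_t len = 0;
    int err;

    nvlist_alloc(&nvl, NV_UNIQUE_NAME, 0);
    /* one pair, named like the rsync xattr (a string for simplicity) */
    nvlist_add_string(nvl, "user.rsync.%stat", "100676 0,0 1068:1000");
    nvlist_pack(nvl, &buf, &len, NV_ENCODE_XDR, 0);

    err = nvlist_unpack(buf, 24, &out, 0);  /* truncated, as in the dumps */
    printf("truncated unpack: err=%d\n", err);

    err = nvlist_unpack(buf, len, &out, 0); /* the full buffer unpacks fine */
    printf("full unpack: err=%d\n", err);

    nvlist_free(nvl);
    free(buf);
    return (0);
}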
@dweeezil Good pickup. All the files on the source that have mode 0676 or 0670 (find * -type f \( -perm 0676 -o -perm 0670 \)), and only those files, have the corrupt xattrs on the destination.
Note: the "source" files have been dumped onto an ext4 fs from external sources, using rsync --fake-super
, thus creating user.rsync.%stat
xattrs. These files are then copied to the ZFS filesystems using rsync -XX
to copy the user.rsync.%stat
xattrs (and any other xattrs that might have snuck in - but there aren't any others on the filesystems in question).
The plot thickens...
I started seeing corruption on the filesystem I reported above as having been recreated with no corruption afterwards. I thought I might have gone nuts somewhere along the line and reported the wrong thing somehow, so I once again recreated that filesystem (zfs create + rsync), along with 2 other filesystems also experiencing the corruption. After these filesystems were filled I checked for corruption (using getfattr -Rd *) and found none.
Then, after a reboot, all of those filesystems are exhibiting corrupt xattrs!
I.e. something in memory is not making it to disk, or is only partly making it to disk.
Hmmm, that's ringing some bells... I vaguely recall some other issue where this was happening. @behlendorf - do you remember something like this?
More weirdness...
The first-reported filesystem is still exhibiting the same symptoms after the recreation and reboot, i.e. all and only source files with mode 0676 or 0670 have corrupted xattrs on the destination. One of the other filesystems is the same (except there are 9 source files with mode 0676 or 0670 without xattrs, so these files don't have any xattrs on the destination to be corrupt). For the remaining filesystem, it's all and only files with mode 0650 that are corrupt, and files with 0676 or 0670 aren't corrupt!
It doesn't appear to be the length of the xattr. I investigated one of the filesystems (with mode 0676 or 0670 corruptions) and found (counting the entire length of name and value dumped by getfattr -d, e.g. user.rsync.%stat="100676 0,0 1003:104" == 38):
Corrupt xattrs (length on source)
--------------
Count Length
3 34
1380 38
Non-corrupt xattrs (length on source)
--------------
Count Length
943 33
168 34
472 35
75856 38
1 39
Where to from here??
@chrisrd That's all very good information. After noticing the file mode issue, I did actually try recreating the problem by hand but wasn't able to. I think the next step is to strace rsync to find its exact sequence of system calls involving the creation of the file, the setting of the xattr and any other inode-changing operations such as chmod, etc. I gather the first rsync (with fake-super) is running as a non-root user. Is the second (-XX) rsync running as root?
Your observation about seeing the corruption following a reboot certainly indicates cached-but-not-flushed metadata. You could check the output of dbufstat.py -bx or one of its other forms and see what's being cached (look for objects matching the bad inodes).
I'm going to try to reproduce this today.
@dweeezil Yup, first rsync is non-root, second is root.
I removed one of the files ("UPS.txt") with the corruption, then straced an rsync of just that file using the same args as for the full set (-aH --delete --numeric-ids -XX --inplace). I think the relevant part of the strace (egrep 'UPS\.txt|chmod|xattr|stat|open' - let me know if you'd like more) is:
10604 01:53:41.340033 open("UPS.txt", O_RDONLY) = -1 ENOENT (No such file or directory) <0.000077>
10604 01:53:41.340159 open("UPS.txt", O_WRONLY|O_CREAT, 0600) = 3 <0.000248>
10604 01:53:41.340787 close(3) = 0 <0.000010>
10604 01:53:41.340835 lstat("UPS.txt", {st_mode=S_IFREG|0600, st_size=52, ...}) = 0 <0.000011>
10604 01:53:41.340898 llistxattr("UPS.txt", 0xc0a410, 1024) = 0 <0.000013>
10604 01:53:41.340938 lsetxattr("UPS.txt", "user.rsync.%stat", "100650 0,0 43874288:43874288", 28, 0) = 0 <0.000037>
10604 01:53:41.341007 utimensat(AT_FDCWD, "UPS.txt", {UTIME_NOW, {1295479844, 0}}, AT_SYMLINK_NOFOLLOW) = 0 <0.000016>
10604 01:53:41.341051 lchown("UPS.txt", 1079, 1000) = 0 <0.000019>
10604 01:53:41.341092 chmod("UPS.txt", 0650) = 0 <0.000023>
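That traced sequence can also be replayed without rsync at all; a rough C sketch (hypothetical filename, with the xattr value, uid/gid and mode taken from the trace above, and the lstat/llistxattr/utimensat calls omitted) is:

#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/xattr.h>
#include <unistd.h>

int main(void)
{
    const char *path = "UPS.txt";  /* hypothetical copy on the zfs filesystem */
    const char *val = "100650 0,0 43874288:43874288";
    int fd;

    /* mirror the strace: create the file, set the xattr, then chown and chmod */
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    close(fd);
    lsetxattr(path, "user.rsync.%stat", val, strlen(val), 0);
    lchown(path, 1079, 1000);
    chmod(path, 0650);
    return (0);
}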
OK, looks like I can consistently reproduce the problem with that file. I'm not too sure what to look at in the dbufstat.py -bx output, but here's a record of removing the file, recreating it with rsync, seeing the good xattr, re-mounting the filesystem and seeing a corrupt xattr.
...actually, the zdb output shows it's already stuffed before the remount.
# f="/path/UPS.txt"
# rm "${f}"
# rsync -aH --delete --numeric-ids -XX --inplace --protect-args host:"${f}" "${f}"
# getfattr -d "${f}"
getfattr: Removing leading '/' from absolute path names
# file: /path/UPS.txt
user.rsync.%stat="100650 0,0 43874288:43874288"
# ino=$(stat -c %i "${f}"); echo ino=$ino
ino=24577
# dbufstat.py -bx | egrep -w "objset|${ino}"
pool objset object level blkid offset dbsize meta state dbholds list atype index flags count asize access mru gmru mfu gmfu l2 l2_dattr l2_asize l2_comp aholds dtype btype data_bs meta_bs bsize lvls dholds blocks dsize
pool 13879 24577 0 0 0 512 0 4 0 1 0 -1 0x260 1 512 4298179477 0 0 0 0 0 16270848 512 ZIO_COMPRESS_OFF 0 DMU_OT_PLAIN_FILE_CONTENTS DMU_OT_SA 512 16K 208 1 2 1 512
# time zdb -dddddd pool/filesystem ${ino}
Dataset filesystem [ZPL], ID 13879, cr_txg 476433, 13.5G, 22432 objects, rootbp DVA[0]=<5:64000131800:800> DVA[1]=<0:66000131000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=488220L/488220P fill=22432 cksum=16d458dc08:76bbfba9d76:14a5aedb85836:28a1281bede5b2
Object lvl iblk dblk dsize lsize %full type
24577 1 16K 512 1.50K 512 100.00 ZFS plain file (K=inherit) (Z=inherit)
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /path/UPS.txt
uid 1079
gid 1000
atime Thu Dec 19 02:31:48 2013
mtime Thu Jan 20 10:30:44 2011
ctime Thu Dec 19 02:31:48 2013
crtime Thu Dec 19 02:31:48 2013
gen 488220
mode 100650
size 52
parent 61
links 1
pflags 40800000004
dump_znode_sa_xattr: sa_xattr_size=24 sa_size error=0
SA packed dump sa_xattr_size=24: \001\001\000\000\000\000\000\000\000\000\000\001\000\000\000\100\000\000\000\110\000\000\000\020
SA xattr dump:
dump_znode_sa_xattr: nvlist_unpack error=14
Indirect blocks:
0 L0 5:6400012c800:800 200L/200P F=1 B=488220/488220
segment [0000000000000000, 0000000000000200) size 512
# zfs unmount pool/filesystem
# zfs mount pool/filesystem
# getfattr -d "${f}"
getfattr: /path/UPS.txt: Bad address
# ino=$(stat -c %i "${f}"); echo ino=$ino
ino=24577
# zdb -dddddd pool/filesystem ${ino}
Dataset pool/filesystem [ZPL], ID 13879, cr_txg 476433, 13.5G, 22432 objects, rootbp DVA[0]=<5:64000131800:800> DVA[1]=<0:66000131000:800> [L0 DMU objset] fletcher4 lzjb LE contiguous unique double size=800L/200P birth=488220L/488220P fill=22432 cksum=16d458dc08:76bbfba9d76:14a5aedb85836:28a1281bede5b2
Object lvl iblk dblk dsize lsize %full type
24577 1 16K 512 1.50K 512 100.00 ZFS plain file (K=inherit) (Z=inherit)
208 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /path/UPS.txt
uid 1079
gid 1000
atime Thu Dec 19 02:31:48 2013
mtime Thu Jan 20 10:30:44 2011
ctime Thu Dec 19 02:31:48 2013
crtime Thu Dec 19 02:31:48 2013
gen 488220
mode 100650
size 52
parent 61
links 1
pflags 40800000004
dump_znode_sa_xattr: sa_xattr_size=24 sa_size error=0
SA packed dump sa_xattr_size=24: \001\001\000\000\000\000\000\000\000\000\000\001\000\000\000\100\000\000\000\110\000\000\000\020
SA xattr dump:
dump_znode_sa_xattr: nvlist_unpack error=14
Indirect blocks:
0 L0 5:6400012c800:800 200L/200P F=1 B=488220/488220
segment [0000000000000000, 0000000000000200) size 512
# dbufstat.py -bx | egrep -w "objset|${ino}"
pool objset object level blkid offset dbsize meta state dbholds list atype index flags count asize access mru gmru mfu gmfu l2 l2_dattr l2_asize l2_comp aholds dtype btype data_bs meta_bs bsize lvls dholds blocks dsize
#
@chrisrd Thanks for the detailed information. I've made a trivial reproducer and am looking into it now.
@chrisrd I found the cause of the problem: the xattr SA is clobbered when a mode change occurs that requires additional (or possibly fewer) DACL entries. Unfortunately, the stock zdb program doesn't show them, but my WIP enhanced version does (or at least shows the count).
A mode 0644 file looks like this:
11 1 16K 512 512 512 100.00 ZFS plain file
168 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /blah644
uid 0
gid 0
atime Wed Dec 18 11:23:10 2013
mtime Wed Dec 18 11:23:10 2013
ctime Wed Dec 18 11:23:10 2013
crtime Wed Dec 18 11:23:10 2013
gen 61547
mode 100644
size 5
parent 4
links 1
pflags 40800000004
ndacl 3
but changing the mode to 0606 requires 2 additional DACL entries:
Object lvl iblk dblk dsize lsize %full type
12 1 16K 512 512 512 100.00 ZFS plain file
184 bonus System attributes
dnode flags: USED_BYTES USERUSED_ACCOUNTED
dnode maxblkid: 0
path /blah606
uid 0
gid 0
atime Wed Dec 18 11:23:27 2013
mtime Wed Dec 18 11:23:27 2013
ctime Wed Dec 18 11:23:27 2013
crtime Wed Dec 18 11:23:27 2013
gen 61551
mode 100606
size 5
parent 4
links 1
pflags 40800000004
ndacl 5
which is why the bonus size increases from 168 to 184.
I've not yet figured out why this breaks the xattr but it's most certainly the cause.
EDIT: This is likely the cause of some of the other issues related to a corrupted bonus buffer when SA xattrs are in use.
I've got a fix for this. Pull request to come shortly.
@dweeezil Fantastic! I can stop trying to find my way through the twisty little passages of the ZFS SA code and on-disk layout. For what it's worth, a simple reproducer I've been using, with your zdb patch dweeezil/zfs@803578e, is:
zfs=pool/fs
f=/${zfs}/foo
rm $f
touch $f
setfattr -n 'user.rsync.%stat' -v '100650 0,0 43874288:43874288' $f
zdb -dddddd ${zfs} $(stat -c %i $f) ### before corruption
chmod 0650 $f
zdb -dddddd ${zfs} $(stat -c %i $f) ### after corruption
@chrisrd @dweeezil Thanks for getting to the root of this issue! I'm happy to have another variable length SA issue put to rest.
@behlendorf I'm kinda hoping that might be the last of them!
@chrisrd Me too.
Unfortunately it seems there's still something wrong with xattr=sa :-(
I had given up on using xattrs at all for a while, but with the recent commits to resolve #1890 and #1648, I was optimistic that xattr=sa would be trouble-free. However, I'm running into #503 again.
On linux-3.10.22 w/ zfsonlinux/zfs@c1ab64d3931cbab45fbb197588cb27cb6fd10c33 + cherry-picked nedbass/zfs@915aa8eda7158c69a7cd0f5621e7707eb6deafd2 and nedbass/zfs@00edcdc31a6297cd4255750a10ab4e6d54ef775f (created before these commits made it into master), I'm getting:
All the xattrs on the filesystem are the result of rsync --fake-super, which means each file and directory has a single xattr which looks like user.rsync.%stat="100666 0,0 65534:65534". The filesystem was created after installing the above ZoL, with xattr=sa set at creation. There are no symlinks in the filesystem, as shown by:
Per #503, I used this stap script:
...plus this patch ('cos I couldn't get stap to dump the info itself):
And kern.log has:
The filesystem was all written in a single rsync session, and has ended up with 713474 files and 109375 directories each with a single xattr per above, and 88 files (no dirs) with corrupt xattrs. I have other filesystems created exactly the same way which have all good xattrs.
86 of the 88 bad files have size=64 per the above debug message; 2 of them have size=56. I'll put the filesystem aside for the moment in case there's anything else useful to be dug out of it.