relan / exfat

Free exFAT file system implementation
GNU General Public License v2.0
[1.2.6] "unable to cleanup a node with 1 references" unlinking a .fuse_hidden file #116

Closed njjewers closed 5 years ago

njjewers commented 5 years ago

I've encountered this crash on an aarch64 GNU/Linux 4.4.38 development board, on an exFAT filesystem on an SD card. I believe it is triggered when (effectively) rm -rf * is invoked in the root directory of the mount. After the crash, all further requests return ENOTCONN until the filesystem is remounted. Examining the core file dumped during the crash revealed the following (I would rather not share the core file itself, as it potentially contains proprietary information):

  Id   Target Id         Frame 
* 1    LWP 388           0x00000000004023e8 in fuse_exfat_unlink (path=<optimized out>) at main.c:247
(gdb) bt
#0  0x0000007fae881b40 in raise () from /home/njewers/Downloads/testfail/exfat/minifs/lib/libc.so.6
#1  0x0000007fae882fc0 in abort () from /home/njewers/Downloads/testfail/exfat/minifs/lib/libc.so.6
#2  0x0000000000403c34 in exfat_bug (format=0x7faea1d000 "", format@entry=0x408658 "unable to cleanup a node with %d references") at log.c:50
#3  0x000000000040582c in exfat_cleanup_node (ef=ef@entry=0x4195c0 <ef>, node=<optimized out>) at node.c:67
#4  0x00000000004023e8 in fuse_exfat_unlink (path=<optimized out>) at main.c:247
#5  0x0000007fae9c14fc in ?? () from /home/njewers/Downloads/testfail/exfat/minifs/usr/lib/libfuse.so.2
#6  0x0000007fe29b45c8 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) f 4
#4  0x00000000004023e8 in fuse_exfat_unlink (path=<optimized out>) at main.c:247
247     return exfat_cleanup_node(&ef, node);
(gdb) p *node
$7 = {parent = 0x0, child = 0x0, next = 0x0, prev = 0x0, references = 1, fptr_index = 1097, fptr_cluster = 128368, entry_offset = 160, start_cluster = 94426, attrib = 32, continuations = 3 '\003', 
  is_contiguous = false, is_cached = false, is_dirty = false, is_unlinked = true, size = 143917056, mtime = 1550104085, atime = 1550103964, name = {{__u16 = 46}, {__u16 = 102}, {__u16 = 117}, {
      __u16 = 115}, {__u16 = 101}, {__u16 = 95}, {__u16 = 104}, {__u16 = 105}, {__u16 = 100}, {__u16 = 100}, {__u16 = 101}, {__u16 = 110}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {
      __u16 = 54}, {__u16 = 49}, {__u16 = 50}, {__u16 = 102}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 48}, {__u16 = 51}, {
      __u16 = 0} <repeats 228 times>}}
(gdb) x/30ch *node->name
Value can't be converted to integer.
(gdb) x/30ch node->name
0x44aad8:   46 '.'  102 'f' 117 'u' 115 's' 101 'e' 95 '_'  104 'h' 105 'i'
0x44aae8:   100 'd' 100 'd' 101 'e' 110 'n' 48 '0'  48 '0'  48 '0'  48 '0'
0x44aaf8:   54 '6'  49 '1'  50 '2'  102 'f' 48 '0'  48 '0'  48 '0'  48 '0'
0x44ab08:   48 '0'  48 '0'  48 '0'  51 '3'  0 '\000'    0 '\000'
(gdb) 
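
For readers who want to check the name without gdb, here is a minimal sketch that decodes the code units printed by `p *node` above. It assumes the array holds little-endian UTF-16 code units terminated by a zero; `decode_utf16_units` is a hypothetical helper, not part of fuse-exfat.

```python
# Code units copied from the `p *node` output above.
# The two trailing zeros stand in for the zero padding that follows the name.
name_units = [
    46, 102, 117, 115,
    101, 95, 104, 105, 100, 100, 101, 110, 48, 48, 48, 48,
    54, 49, 50, 102, 48, 48, 48, 48, 48, 48, 48, 51,
    0, 0,
]


def decode_utf16_units(units):
    """Decode UTF-16 code units (assumed little-endian), stopping at the first NUL."""
    if 0 in units:
        units = units[:units.index(0)]
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    return raw.decode("utf-16-le")


print(decode_utf16_units(name_units))
```

The decoded string agrees with the `x/30ch node->name` dump and with the .fuse_hidden file named in the issue title.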

I intend to try a newer version of fuse-exfat and see if that resolves the issue.

relan commented 5 years ago

Is there anything from fuse-exfat in syslog?

What's your FUSE version?

Did you make any modifications to the fuse-exfat code?

Was there any concurrent access to the file system when fuse-exfat crashed?

How many files and directories were there in the file system?

njjewers commented 5 years ago

It looks like the filesystem on the SD card used during the crash has become corrupted:

/aeryon/data/:
total 260
drwxrwxrwx    1 download download    131072 Dec 31  1969 ./
drwxrwxr-x    5 root     root          4096 Jul  4  2018 ../
drwxrwxrwx    1 download download    131072 Jan 10  2000 .Trash-1000/

/aeryon/data/.Trash-1000:
total 384
drwxrwxrwx    1 download download    131072 Jan 10  2000 ./
drwxrwxrwx    1 download download    131072 Dec 31  1969 ../
drwxrwxrwx    1 download download    131072 Dec 12 09:55 files/

/aeryon/data/.Trash-1000/files:
total 0

In particular, running ls under strace in that directory gives:

getdents64(3, 0x4ad3a0, 32768)          = -1 EIO (Input/output error)
close(3)                                = 0

With that in mind you may perhaps want to close this - I'm upgrading to 1.3.0 as mentioned above and I'll see if the new fsck can rescue that card. In the meantime, if you are interested in investigating this:

njjewers commented 5 years ago

Also, do you have a way to readily create a broken filesystem like the above for testing? I'd be interested in having something like the above for testing what I'm working on, hopefully without dding the entire SD card.

relan commented 5 years ago

> If I attempt to interact with .Trash-1000/files, I get the following messages: mount.exfat[387]: read 128 bytes instead of 192 bytes

Looks like the directory became truncated. I'm still interested to know how this happened.

Did you use this file system with another exFAT implementation?

> Also, do you have a way to readily create a broken filesystem like the above for testing?

Nope. Such tasks are difficult to formalize.

njjewers commented 5 years ago

I don't believe that it was used with any other exfat implementation (it was never removed from this test machine which always used fuse-exfat), but I will try and confirm that on Tuesday.

Could a power cut during writing create a truncated directory like this?

Should fuse-exfat support deleting truncated directories, in your opinion? As-is, I cannot delete this directory; attempts to do so fail with EIO.

I have yet to run exfatfsck on it. I wanted to do a little more testing of how my (company's) application deals with these sorts of errors, but I'll likely run it on Tuesday.
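
As an illustration of the two failure modes seen in this thread, here is a minimal sketch of how an application might tell them apart when listing a directory. The function name and the status strings are hypothetical; only `os.listdir` and the `errno` constants are standard library. It assumes the errors surface as `OSError` with `EIO` (the truncated directory) or `ENOTCONN` (the FUSE daemon has died and the mount needs remounting).

```python
import errno
import os


def classify_listing_error(path):
    """Try to list `path` and map the outcome to a coarse status string."""
    try:
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.EIO:
            # Matches the getdents64 failure on the truncated directory.
            return "io-error"
        if exc.errno == errno.ENOTCONN:
            # Matches the state after the fuse-exfat crash, until remount.
            return "disconnected"
        return "other-error"
    return "ok"
```

A caller could treat "io-error" as a reason to schedule fsck and "disconnected" as a reason to remount, but that policy is an assumption about the application, not something fuse-exfat prescribes.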

relan commented 5 years ago

> Could a power cut during writing create a truncated directory like this?

exFAT doesn't support journalling or transactions (TexFAT is a different story), so a power outage during a write can break the FS.

> Should fuse-exfat support deleting truncated directories, in your opinion?

FS drivers usually refuse to work with corrupt file systems, leaving them to specialized utilities (fsck). This separation makes sense for exFAT too IMHO.

njjewers commented 5 years ago

That seems fair. I believe the filesystem was corrupted by an unclean unmount, so this is not an issue with fuse-exfat. fsck.exfat 1.3.0 can detect the truncated directory, but cannot automatically fix it:

ERROR: exfatfsck 1.3.0
Checking file system on /dev/mmcblk1p1.
File system version           1.0
Sector size                 512 bytes
Cluster size                128 KB
Volume size                  59 GB
Used space                   21 GB
Available space              38 GB
read 128 bytes instead of 192 bytes.
Totally 2 directories and 0 files.
File system checking finished. ERRORS FOUND: 1, FIXED: 0.

Otherwise I don't believe there's anything actionable for fuse-exfat to do here, so I'll close this. Thank you for your help.
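
For anyone scripting a check around the fsck run above, here is a minimal sketch that pulls the two counters out of the summary line. It assumes the line has exactly the shape shown in the 1.3.0 output in this comment; other versions, or a run that finds no errors, may print something different, in which case the hypothetical `parse_fsck_summary` returns None.

```python
import re

# Pattern modelled on the summary line shown in the fsck output above.
SUMMARY_RE = re.compile(r"ERRORS FOUND: (\d+), FIXED: (\d+)\.")


def parse_fsck_summary(output):
    """Return (errors_found, errors_fixed), or None if no summary line is present."""
    match = SUMMARY_RE.search(output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "File system checking finished. ERRORS FOUND: 1, FIXED: 0."
found, fixed = parse_fsck_summary(sample)
print("unresolved errors:", found - fixed)
```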

relan commented 5 years ago

OK, thanks for confirming.