nlienard closed this issue 3 years ago.
Looks like the kernel runs out of memory and starts killing user-space programs.
fuse-exfat can consume quite a lot of memory depending on the number of files. What happens if you mount your exFAT partition and run find /usb?
What does exfatfsck /dev/sde2 say after the crash?
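Walking the whole tree, as find /usb does, makes fuse-exfat read every directory, so it is a simple way to check whether its memory use grows with the number of files. As a rough sketch, the C helper below does a comparable walk and counts directories and regular files, similar in spirit to exfatfsck's "Totally N directories and M files" summary. The names count_tree() and struct tree_count are hypothetical; symlinks are not followed and the starting directory itself is not counted.

```c
/* Hypothetical helper: count directories and regular files below a
   directory, similar to walking it with find. Symlinks are not followed
   (lstat), and the starting directory itself is not counted. */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

struct tree_count {
    unsigned long dirs;
    unsigned long files;
};

/* Adds to *c (the caller zero-initializes it). Returns 0, or -1 on error. */
int count_tree(const char *path, struct tree_count *c)
{
    assert(path != NULL && c != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int rc = 0;
    struct dirent *e;
    while (rc == 0 && (e = readdir(dir)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;

        char child[4096];
        int n = snprintf(child, sizeof child, "%s/%s", path, e->d_name);
        struct stat st;
        if (n < 0 || (size_t)n >= sizeof child || lstat(child, &st) != 0) {
            rc = -1;
            break;
        }
        if (S_ISDIR(st.st_mode)) {
            c->dirs++;
            rc = count_tree(child, c);  /* descend into the subdirectory */
        } else if (S_ISREG(st.st_mode)) {
            c->files++;
        }
    }
    closedir(dir);
    return rc;
}
```

Called as count_tree("/usb", &c) on the mounted volume, it would exercise the same directory reads as find, so the daemon's memory can be watched while it runs.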
It didn't end well:
root@NAS:~# exfatfsck /dev/sde2
exfatfsck 1.3.0
Checking file system on /dev/sde2.
WARN: volume was not unmounted cleanly.
File system version 1.0
Sector size 512 bytes
Cluster size 1 MB
Volume size 3726 GB
Used space 3040 GB
Available space 686 GB
Killed
root@NAS:~#
LOG:
Mar 1 13:21:56 localhost kernel: oom_reaper: reaped process 28054 (exfatfsck), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
It is a NAS with 1 GB of memory. Is that the issue?
root@NAS:~# free -m
               total        used        free      shared  buff/cache   available
Mem:            1004         158         245          11         600         810
Swap:            486         153         333
I'm going to add 2 GB of swap to see if it changes anything...
Done.
root@NAS:/# swapon /data/stuff/swapfile
root@NAS:/# swapon /data/stuff/swapfile2
root@NAS:/# free -m
               total        used        free      shared  buff/cache   available
Mem:            1004         163         238          11         602         805
Swap:           2486         153        2333
root@NAS:/#
This time exfatfsck ran to completion:
root@NAS:/# exfatfsck /dev/sde2
exfatfsck 1.3.0
Checking file system on /dev/sde2.
WARN: volume was not unmounted cleanly.
File system version 1.0
Sector size 512 bytes
Cluster size 1 MB
Volume size 3726 GB
Used space 3040 GB
Available space 686 GB
Totally 36495 directories and 130311 files.
File system checking finished. No errors found.
root@NAS:/#
And the mount works:
root@NAS:/# mount /dev/sde2 /usb
FUSE exfat 1.3.0
root@NAS:/#
But there is still an error:
root@NAS:/# ls -la /usb
ls: cannot access '/usb': Transport endpoint is not connected
root@NAS:/#
Looks like fuse-exfat uses too much memory.
How many files are there on this volume?
Around 130 000
I created a 4 TB volume with 130 000 files, and fuse-exfat's RSS grew to 102 MB. That's on x86_64; an armv7 binary should consume even less...
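RSS here is the resident set size of the mount.exfat process. On Linux it can be read from the VmRSS line of /proc/<pid>/status; a minimal sketch, where read_rss_kb() and read_self_rss_kb() are hypothetical helper names:

```c
/* Hypothetical helpers (Linux-only): read a process's resident set size
   from the VmRSS line of /proc/<pid>/status. */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the VmRSS value of pid in kB, or -1 if it cannot be read. */
long read_rss_kb(pid_t pid)
{
    char path[64];
    char line[256];
    long kb = -1;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/status", (long)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;              /* no such process, or /proc not mounted */
    while (fgets(line, sizeof line, fp) != NULL) {
        /* The line looks like "VmRSS:\t    1234 kB" */
        if (strncmp(line, "VmRSS:", 6) == 0) {
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(fp);
    return kb;
}

/* Convenience wrapper for the calling process. */
long read_self_rss_kb(void)
{
    return read_rss_kb(getpid());
}
```

Pointed at the mount.exfat PID, it should report roughly the same kB figure as the RSS column of ps.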
Sorry, I provided a bad value based on the exfatfsck output posted previously.
Checking the inode count with df -i gave 3 million files:
root@NAS:~# df -i /usb
Filesystem  Inodes   IUsed   IFree IUse% Mounted on
/dev/sdc2  3815302 3112993  702309   82% /usb
Memory consumption is low (around 99 MB) for now:
root@NAS:~# ps auxww |grep [e]xfa
root 20730 2.0 9.5 99416 98276 ? Ss Mar02 31:09 /sbin/mount.exfat /dev/sdc2 /usb -o rw
> Sorry, I provided a bad value based on the exfatfsck output posted previously.
That was the correct value.
> Checking the inode count with df -i gave 3 million files
df -i shows meaningless numbers; see fuse_exfat_statfs() in the code:
/*
Below are fake values because in exFAT there is
a) no simple way to count files;
b) no such thing as inode;
So here we assume that inode = cluster.
*/
sfs->f_files = le32_to_cpu(ef.sb->cluster_count);
sfs->f_favail = sfs->f_bfree >> ef.sb->spc_bits;
sfs->f_ffree = sfs->f_bavail;
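Because of the inode = cluster assumption, the df -i columns are cluster counts, not file counts: Inodes is cluster_count (f_files), IFree is the number of free clusters (f_ffree), and IUsed is their difference. With 1 MB clusters, 3815302 clusters is about 3726 GB and 3112993 used clusters about 3040 GB, which is consistent with the Volume size and Used space lines exfatfsck printed. A small C sketch of that arithmetic; df_inodes(), clusters_to_gib() and struct fake_inodes are hypothetical names:

```c
/* Sketch of the arithmetic behind the df -i output, assuming the
   "inode = cluster" convention from fuse_exfat_statfs(). */
#include <assert.h>
#include <stdint.h>

struct fake_inodes {
    uint64_t itotal;  /* df -i "Inodes" = f_files = cluster_count */
    uint64_t iused;   /* df -i "IUsed"  = f_files - f_ffree */
    uint64_t ifree;   /* df -i "IFree"  = f_ffree = free clusters */
};

struct fake_inodes df_inodes(uint64_t cluster_count, uint64_t free_clusters)
{
    struct fake_inodes r;
    assert(free_clusters <= cluster_count);
    r.itotal = cluster_count;
    r.ifree = free_clusters;
    r.iused = cluster_count - free_clusters;
    return r;
}

/* Converts a number of clusters to GiB for a cluster size in bytes. */
double clusters_to_gib(uint64_t clusters, uint64_t cluster_size)
{
    return (double)(clusters * cluster_size) / (1024.0 * 1024.0 * 1024.0);
}
```

For example, df_inodes(3815302, 702309) gives iused = 3112993, and clusters_to_gib(3112993, 1024 * 1024) is just over 3040: the "3 million files" are 3 million used 1 MB clusters.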
> Memory consumption is low (around 99 MB) for now
I wouldn't say 99 MB is low when there's only 1 GB of system memory in total :)
There are at least two optimizations that could be done in fuse-exfat:
1) Allocate file name strings dynamically. Currently they are part of struct exfat_node and use 512 bytes each.
2) Discard unused subtrees of struct exfat_node.
Neither of those is easy.
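To put optimization 1 in numbers: exfatfsck counted 36495 directories and 130311 files, i.e. 166806 nodes, and at 512 bytes of name buffer each that is about 85 MB (roughly 81 MiB) if every node were in memory at once, while typical file names are much shorter than the 255 UTF-16 code units the buffer is sized for. The sketch below illustrates both ideas on a hypothetical, simplified node type (not fuse-exfat's actual struct exfat_node): node_new() allocates the name at its exact length, and prune_children() frees cached subtrees that no open handle references, at the cost of re-reading them from disk later.

```c
/* Hypothetical, simplified node type; NOT fuse-exfat's struct exfat_node.
   exFAT names are at most 255 UTF-16 code units, so a fixed buffer of 256
   uint16_t (name + terminator) costs 512 bytes per node. */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_UNITS 255

struct node {
    struct node *parent;
    struct node *child;     /* first child (directories only) */
    struct node *next;      /* next sibling */
    int references;         /* open handles using this node */
    int is_dir;
    size_t name_len;        /* in UTF-16 code units */
    uint16_t *name;         /* idea 1: heap buffer of exactly name_len + 1 */
};

/* Creates a node and links it as the first child of parent (if any). */
struct node *node_new(struct node *parent, const uint16_t *name, size_t len,
                      int is_dir)
{
    assert(name != NULL || len == 0);
    if (len > NAME_MAX_UNITS)
        return NULL;
    struct node *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc((len + 1) * sizeof *n->name);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    if (len > 0)
        memcpy(n->name, name, len * sizeof *name);
    n->name[len] = 0;
    n->name_len = len;
    n->is_dir = is_dir;
    n->parent = parent;
    if (parent != NULL) {
        n->next = parent->child;
        parent->child = n;
    }
    return n;
}

/* Returns 1 if n or anything below it is still referenced. */
static int subtree_busy(const struct node *n)
{
    if (n->references > 0)
        return 1;
    for (const struct node *c = n->child; c != NULL; c = c->next)
        if (subtree_busy(c))
            return 1;
    return 0;
}

/* Frees n and every node below it; returns the number of nodes freed. */
static size_t free_subtree(struct node *n)
{
    size_t count = 1;
    while (n->child != NULL) {
        struct node *c = n->child;
        n->child = c->next;
        count += free_subtree(c);
    }
    free(n->name);
    free(n);
    return count;
}

/* Idea 2: drop cached children of dir that nothing references. They would
   have to be read from disk again on the next lookup. */
size_t prune_children(struct node *dir)
{
    size_t freed = 0;
    struct node **link = &dir->child;
    while (*link != NULL) {
        struct node *c = *link;
        if (subtree_busy(c)) {
            link = &c->next;        /* keep this child */
        } else {
            *link = c->next;        /* unlink, then free its whole subtree */
            freed += free_subtree(c);
        }
    }
    return freed;
}
```

The hard part in a real filesystem driver is deciding when pruning is safe and cheap enough, which this sketch reduces to a simple reference count.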
OK, good to know about df -i, thanks.
Since I added the swap files, I haven't seen the OOM killer hit the exfat process.
I also protected the exfat process against the OOM killer:
root@NAS:~# ps auxw |grep [e]xf
root 20730 1.0 8.4 100504 87124 ? Ss Mar02 31:42 /sbin/mount.exfat /dev/sdc2 /usb -o rw
root@NAS:~# echo -1000 > /proc/20730/oom_score_adj
root@NAS:~#
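The echo into /proc/<pid>/oom_score_adj shown above can also be done programmatically. A Linux-only C sketch with hypothetical helper names; valid values range from -1000 to 1000, -1000 makes the OOM killer skip the process entirely, and lowering the value below its allowed minimum requires root (CAP_SYS_RESOURCE):

```c
/* Hypothetical helpers (Linux-only): equivalent to
   `echo VALUE > /proc/PID/oom_score_adj` and reading the same file. */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for callers targeting themselves */

#define OOM_SCORE_ADJ_MIN (-1000)   /* OOM killer never selects the task */
#define OOM_SCORE_ADJ_MAX 1000

/* Returns 0 on success, -1 on a bad value or a failed write (e.g. when
   lowering the value without CAP_SYS_RESOURCE). */
int set_oom_score_adj(pid_t pid, int value)
{
    char path[64];
    if (value < OOM_SCORE_ADJ_MIN || value > OOM_SCORE_ADJ_MAX)
        return -1;
    snprintf(path, sizeof path, "/proc/%ld/oom_score_adj", (long)pid);
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    int ok = fprintf(fp, "%d\n", value) > 0;
    /* stdio buffers the text; the kernel only sees (and may reject) it
       when fclose() flushes, so its result must be checked too. */
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Stores the current value in *value; returns 0 on success, -1 on error. */
int get_oom_score_adj(pid_t pid, int *value)
{
    char path[64];
    assert(value != NULL);
    snprintf(path, sizeof path, "/proc/%ld/oom_score_adj", (long)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int n = fscanf(fp, "%d", value);
    fclose(fp);
    return n == 1 ? 0 : -1;
}
```

The setting belongs to the running process: after a remount, mount.exfat runs under a new PID and the value has to be applied again. With the daemon exempt, the OOM killer will pick some other process when memory runs out.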
Hopefully it is solved!
As for the optimizations, I will give them a try if the OOM comes back.
Thanks again.
Still good today: 2 full days without issues, so it looks fixed. I'm closing the issue for now. Thanks for your support.
Hi, I have a 4 TB USB drive. When I mount it on Windows, it works fine. When I mount it on Linux, it works, but after a while (idle or under activity) the drive becomes inaccessible.
fdisk still sees it.
The only fix is to unplug the disk, plug it into a Windows machine, run a "repair", unplug it, and then plug it back into the Linux machine.
A workaround was to unmount it overnight with a cron job and mount it again in the morning (the WARN still showed, but the disk stayed reachable).
in the log