tytso / e2fsprogs

Ext2/3/4 file system utilities
http://ext4.wiki.kernel.org

e2fsck fails using scratch_file #139

Open Ravna opened 1 year ago

Ravna commented 1 year ago

This is probably an edge case, but either a fix or a warning in the e2fsck.conf documentation might save someone some time. Failure first, then explanation:

e2fsck segfaulted about 4 hours in, with an /etc/e2fsck.conf containing only this:

[scratch_files]
    directory = /big1

Here's the run:

# /usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck -vFftt /dev/mapper/fs
e2fsck 1.46.6-rc1 (12-Sep-2022)
Pass 1: Checking inodes, blocks, and sizes
Signal (11) SIGSEGV si_code=SEGV_MAPERR fault addr=0x7f8f49ab3ffc
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck[0x436e80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0)[0x7f8e49687cb0]
/lib/x86_64-linux-gnu/libc.so.6(+0x907d5)[0x7f8e493477d5]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck[0x460ab2]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck[0x46214c]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck(ext2fs_tdb_store+0x548)[0x465678]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck[0x453106]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck(ext2fs_icount_store+0xb7)[0x453e67]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck(e2fsck_pass1+0x16c7)[0x41e7b7]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck(e2fsck_run+0x47)[0x415ca7]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck(main+0xed6)[0x411636]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f8e492d87ed]
/usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck[0x413b5d]
Command exited with non-zero status 8

I'd created that conf file because the machine this was running on has 16GB RAM and 2GB of swap, and the previous run OOM'ed shortly after starting Pass 2 (Checking directory structure). I strongly suspect that the failure above happened because the icount file reached 2^32 bytes. I had a trivial bash loop printing out file sizes every 5 minutes, and these are the iterations just before and just after the failure:

-rw------- 1 root root 3745632256 Apr 11 05:52 /big1/3e13e483-6aa8-44c1-bae3-9ff55836b524-dirinfo-ctnc5L
-rw------- 1 root root 4249776128 Apr 11 05:52 /big1/3e13e483-6aa8-44c1-bae3-9ff55836b524-icount-eXXff3

-rw------- 1 root root 3788259328 Apr 11 05:56 /big1/3e13e483-6aa8-44c1-bae3-9ff55836b524-dirinfo-ctnc5L
-rw------- 1 root root       4096 Apr 11 05:56 /big1/3e13e483-6aa8-44c1-bae3-9ff55836b524-icount-eXXff3
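
For anyone who wants to watch for the same thing, here is a minimal Python sketch of that kind of size-polling loop. It is not the bash loop that produced the listings above; the function names are made up for illustration, and the file name patterns are only inferred from the two scratch files seen in /big1.

```python
import glob
import os
import time

LIMIT = 2**32  # suspected wraparound point (a hypothesis, not a confirmed limit)


def scratch_sizes(directory):
    """Return {path: size_in_bytes} for files that look like e2fsck scratch files.

    The patterns are inferred from the names seen in this report
    (<uuid>-icount-XXXXXX and <uuid>-dirinfo-XXXXXX).
    """
    sizes = {}
    base = glob.escape(directory)
    for pattern in ("*-icount-*", "*-dirinfo-*"):
        for path in glob.glob(os.path.join(base, pattern)):
            try:
                sizes[path] = os.path.getsize(path)
            except OSError:
                continue  # file disappeared between glob and stat
    return sizes


def format_report(directory):
    """Return one line per scratch file: size and fraction of 2^32."""
    return [
        f"{size:>12} {size / LIMIT:7.2%} of 2^32  {path}"
        for path, size in sorted(scratch_sizes(directory).items())
    ]


def watch(directory, interval_s=300, iterations=None):
    """Print a report every interval_s seconds (forever if iterations is None)."""
    done = 0
    while iterations is None or done < iterations:
        for line in format_report(directory):
            print(line)
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

Calling watch("/big1") in a second terminal while e2fsck runs would print both scratch files every 5 minutes, with the percentage column showing how close each one is to 2^32 bytes.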

Earlier, both of those files started with TDB file\n followed by lots of binary. After the failure, the icount file no longer had that header; it contained nothing but ASCII B except for the last four bytes: ^@^P^@^@. This, too, makes me believe this was a 2^32 wraparound followed by a scribble. (The scratch-file fs is a bog-standard, recently created ext4 fs on a 64-bit machine, so it can of course store files bigger than 4GB; in fact it contains only two other files at the moment, one of which is 3368GB.)
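
The arithmetic behind the wraparound suspicion can be checked with a short Python sketch. The sizes are copied from the listings above. Reading the four trailing bytes as a little-endian 32-bit integer is only one possible interpretation, and the match with the post-failure file size may be a coincidence.

```python
import struct

LIMIT = 2**32  # 4294967296

# Sizes from the last listing before the failure and the first one after it.
icount_before = 4249776128
icount_after = 4096
dirinfo_before = 3745632256
dirinfo_after = 3788259328

# How far the icount file was from 2^32 at the last good sample.
headroom = LIMIT - icount_before
print(f"icount headroom before failure: {headroom} bytes "
      f"({headroom / 2**20:.1f} MiB)")

# For scale: how much the dirinfo file grew between the same two samples.
dirinfo_growth = dirinfo_after - dirinfo_before
print(f"dirinfo growth between samples: {dirinfo_growth} bytes "
      f"({dirinfo_growth / 2**20:.1f} MiB)")

# The trailing bytes ^@^P^@^@ are 0x00 0x10 0x00 0x00 in caret notation.
tail = bytes([0x00, 0x10, 0x00, 0x00])
tail_le = struct.unpack("<I", tail)[0]  # little-endian is an assumption
print(f"trailing bytes as little-endian uint32: {tail_le}")
print(f"equals post-failure icount size: {tail_le == icount_after}")
```

So the icount file was about 43 MiB short of 2^32 at the last sample, and the dirinfo file grew by roughly 41 MiB over the same interval. If the icount file was growing at a comparable pace (an assumption, since only one good sample is shown), it would have reached 2^32 at about the time of the crash.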

Removing the conf file and adding a 16GB swapfile worked: the run succeeded after using under 1GB of the additional swap, so the very first, OOM'ed run must have come very close to fitting:

# /usr/local/bin/e2fsprogs-master/build/e2fsck/e2fsck -vFftt /dev/mapper/fs
e2fsck 1.46.6-rc1 (12-Sep-2022)
Pass 1: Checking inodes, blocks, and sizes
Pass 1: Memory used: 922724k/18014398507987268k (913996k/8729k), time: 5027.25/881.31/254.26
Pass 1: I/O read: 73347MB, write: 1MB, rate: 14.59MB/s
Pass 2: Checking directory structure
Pass 2: Memory used: 1552232k/1132180k (479620k/1072613k), time: 11419.82/4210.18/1332.61
Pass 2: I/O read: 867389MB, write: 0MB, rate: 75.95MB/s
Pass 3: Checking directory connectivity
Peak memory: Memory used: 1552232k/1430212k (479620k/1072613k), time: 16460.72/5104.75/1587.06
Pass 3A: Memory used: 1552232k/1430212k (479620k/1072613k), time:  0.00/ 0.29/ 0.00
Pass 3A: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 3: Memory used: 1552232k/18014398508790840k (479619k/1072614k), time:  7.77/ 7.68/ 0.28
Pass 3: I/O read: 1MB, write: 0MB, rate: 0.13MB/s
Pass 4: Checking reference counts
Pass 4: Memory used: 1552232k/309104k (194433k/1357800k), time: 143.35/120.96/ 3.63
Pass 4: I/O read: 0MB, write: 0MB, rate: 0.00MB/s
Pass 5: Checking group summary information
Pass 5: Memory used: 922848k/11072k (79k/922770k), time: 57.83/42.59/ 5.21
Pass 5: I/O read: 820MB, write: 0MB, rate: 14.18MB/s

   193872312 inodes used (7.94%, out of 2441461760)
       97870 non-contiguous files (0.1%)
      179612 non-contiguous directories (0.1%)
             # of inodes with ind/dind/tind blocks: 1/0/0
             Extent depth histogram: 189851002/37661/9
  2729977274 blocks used (55.91%, out of 4882921984)
           0 bad blocks
         558 large files

    33291762 regular files
   155590208 directories
       13843 character device files
       22663 block device files
        5273 fifos
  1575754488 links
     4917905 symbolic links (3911202 fast symbolic links)
       30649 sockets
------------
  1769626791 files
Memory used: 922848k/11072k (79k/922770k), time: 16746.65/5351.72/1596.05
I/O read: 941564MB, write: 10MB, rate: 56.22MB/s

As you've probably guessed, this is a filesystem containing a dirvish vault, hence all the hardlinks; it's quite old and has been repeatedly expanded. (The last fsck was about six months ago, just after an expansion that had required enabling 64bit. That fsck apparently fit in RAM because the fs was about 5% smaller then than it is now, and I was apparently much closer to not fitting than I'd realized from the stats e2fsck prints.) This is probably also an edge case because a fs this large would usually be running on a machine with either more than 16GB RAM or more swap, so fsck might never hit the edge.

I can provide dumpe2fs -h output and OS info if necessary. (The OS is an old Ubuntu; hence the recently-built e2fsck instead of the distro default.)

One final issue: while perusing other open issues, I noticed that you commented in https://github.com/tytso/e2fsprogs/issues/95 that there's a limit of 2^32 inodes. If this is independent of fs size, then I suspect I'm going to hit that wall if I ever try to double this fs again; this one currently has a bit more than 2^31 inodes. But I don't think I can change the bytes/inode ratio in an existing fs, so doubling it would require making a new fs with a different ratio and copying this one to it. Yet I know from experience that copying such a heavily-crosslinked structure is extremely difficult due to memory exhaustion. Hmm.
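
A quick Python sketch of the inode arithmetic, using the totals from the e2fsck summary above. The 4096-byte block size is an assumption (dumpe2fs -h would confirm it), and the sketch assumes the inode count scales with fs size at the existing ratio.

```python
INODE_LIMIT = 2**32  # limit mentioned in issue #95

# Totals from the e2fsck summary in this report.
inodes_total = 2441461760
blocks_total = 4882921984

ASSUMED_BLOCK_SIZE = 4096  # assumption: not stated anywhere in this report

# Doubling the fs at the same ratio would double the inode count.
doubled = 2 * inodes_total
print(f"inodes now:        {inodes_total} (> 2^31: {inodes_total > 2**31})")
print(f"inodes if doubled: {doubled} (> 2^32: {doubled > INODE_LIMIT})")

# Largest growth factor that still stays within 2^32 inodes.
max_growth = INODE_LIMIT / inodes_total
print(f"max growth factor at this ratio: {max_growth:.3f}x")

# Implied bytes-per-inode ratio under the assumed block size.
bytes_per_inode = blocks_total * ASSUMED_BLOCK_SIZE / inodes_total
print(f"bytes per inode (assuming 4K blocks): {bytes_per_inode:.1f}")
```

Under those assumptions the fs could grow by at most about 1.76x before its inode count passed 2^32, so a full doubling would not fit.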