universe-proton / universe-topology

A universal computer knowledge topology for all the programmers worldwide.
Apache License 2.0

Why doesn't the rm command reclaim disk space in Linux? #9

Open justdoit0823 opened 7 years ago

Sometimes the available disk space doesn't increase after we remove a file with the rm command on a Linux system, even though the file can no longer be found in the filesystem. I ran the following test on a Linode Linux machine:

Before creating a file:

root@localhost:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  3.1G   16G  17% /

Create a file test_rm.log:

root@localhost:~# for i in `seq 1 100`
> do
> cat /usr/bin/python2.7 >> test_rm.log
> done
root@localhost:~# ls -lsh test_rm.log
361M -rw-r--r-- 1 root root 361M Aug 21 06:43 test_rm.log
root@localhost:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  3.4G   16G  19% /

The used disk space has increased from 3.1G to 3.4G. But after I removed the file with rm, the used space stayed at 3.4G and the available space did not increase:

root@localhost:~# rm test_rm.log
root@localhost:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  3.4G   16G  19% /

Why? Is another process still holding the deleted file open? Running lsof | grep '(deleted)' reveals the process:

root@localhost:~# lsof | grep '(deleted)'
python     2583                   root   12r      REG                8,0 378525600        704 /root/test_rm.log (deleted)
python     2583  2584             root   12r      REG                8,0 378525600        704 /root/test_rm.log (deleted)
python     2583  2643             root   12r      REG                8,0 378525600        704 /root/test_rm.log (deleted)
root@localhost:~# ps aux|grep 2583
root      2583  0.3  3.6 300872 36936 pts/1    Sl+  06:45   0:00 python -c import sys; sys.argv[0] = '/usr/bin/ipython'; from IPython.terminal.ipapp import launch_new_instance; launch_new_instance()

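An ipython process had opened the file before it was removed and still holds it open (file descriptor 12, opened for reading), so the disk space has not been reclaimed. It is only released once that process exits: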

root@localhost:~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        20G  3.1G   16G  17% /

After the ipython process exited, the used space dropped back to 3.1G as expected. But how does the rm command really work?

root@localhost:~# touch test.log
root@localhost:~#
root@localhost:~# strace rm test.log
execve("/bin/rm", ["rm", "test.log"], [/* 20 vars */]) = 0
brk(NULL)                               = 0x55ea8db74000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f933c829000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=25115, ...}) = 0
mmap(NULL, 25115, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f933c822000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\5\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1856752, ...}) = 0
mmap(NULL, 3959200, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f933c240000
mprotect(0x7f933c3fd000, 2097152, PROT_NONE) = 0
mmap(0x7f933c5fd000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1bd000) = 0x7f933c5fd000
mmap(0x7f933c603000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f933c603000
close(3)                                = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f933c820000
arch_prctl(ARCH_SET_FS, 0x7f933c820700) = 0
mprotect(0x7f933c5fd000, 16384, PROT_READ) = 0
mprotect(0x55ea8d113000, 4096, PROT_READ) = 0
mprotect(0x7f933c82c000, 4096, PROT_READ) = 0
munmap(0x7f933c822000, 25115)           = 0
brk(NULL)                               = 0x55ea8db74000
brk(0x55ea8db95000)                     = 0x55ea8db95000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2993552, ...}) = 0
mmap(NULL, 2993552, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f933bf65000
close(3)                                = 0
ioctl(0, TCGETS, {B38400 opost isig icanon echo ...}) = 0
newfstatat(AT_FDCWD, "test.log", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0
geteuid()                               = 0
unlinkat(AT_FDCWD, "test.log", 0)       = 0
lseek(0, 0, SEEK_CUR)                   = -1 ESPIPE (Illegal seek)
close(0)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

In the strace output, the most important syscall is unlinkat, which here (with AT_FDCWD and flags set to 0) operates in the same way as unlink. The unlink(2) man page describes the behavior:

unlink() deletes a name from the filesystem. If that name was the last link to a file and no processes have the file open, the file is deleted and the space it was using is made available for reuse.

If the name was the last link to a file but any processes still have the file open, the file will remain in existence until the last file descriptor referring to it is closed.
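The behavior described above can be reproduced in a few lines. This is a minimal sketch, not part of the original test: the function name unlink_while_open is made up for illustration, and it uses a temporary file rather than test_rm.log. It removes the only name of a file while a descriptor is still open and shows that the contents remain readable through that descriptor.

```python
import os
import tempfile


def unlink_while_open():
    """Unlink a file that is still open and report what the descriptor sees.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        os.unlink(path)                     # remove the last (only) name
        name_exists = os.path.exists(path)  # the name is gone from the directory
        # Link count reported for the open file; typically 0 on local
        # Linux filesystems once the last name has been removed.
        links = os.fstat(fd).st_nlink
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 100)             # contents are still reachable via fd
    finally:
        os.close(fd)                        # last reference dropped here
    return name_exists, links, data


if __name__ == "__main__":
    print(unlink_while_open())
```

Only after the final close does the kernel have no remaining reference to the file, which is the point at which the space can be made available again.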

So the picture is clear. test_rm.log was the last link, but the file was still referred to by a file descriptor in another process. There would also be no advantage in the kernel reclaiming the disk space at the moment unlink is called on the last link. As long as other file descriptors refer to the deleted file, their processes may continue to write data into it at their own offsets. At that point the kernel has to allocate space anyway, and the further into the file a process writes, the more space may be needed.
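To illustrate that a deleted file can keep growing, here is a small sketch under the same assumptions as before (a temporary file, and a hypothetical helper named grow_after_unlink). It unlinks the file first and then keeps writing through the open descriptor, recording the size reported by fstat after each write.

```python
import os
import tempfile


def grow_after_unlink(chunk=b"x" * 4096, writes=3):
    """Write to an already-unlinked file and return its size after each write.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    sizes = []
    try:
        os.unlink(path)              # the file has no name from here on
        for _ in range(writes):
            os.write(fd, chunk)      # the write still succeeds
            sizes.append(os.fstat(fd).st_size)
    finally:
        os.close(fd)
    return sizes


if __name__ == "__main__":
    print(grow_after_unlink())
```

The sizes increase with every write even though no directory entry points at the file any more, so the kernel must keep the file's storage intact for as long as it stays open.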

If you want to free disk space by removing files with rm, you can avoid a lot of strange cases by first checking whether any process still has the file open. The answer to find-and-remove-large-files-that-are-open-but-have-been-deleted also covers this well.
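The check that lsof | grep '(deleted)' performs can be approximated by reading /proc directly. The following is a rough sketch that assumes a Linux system with /proc mounted; the function name find_deleted_open_files is hypothetical. It relies on the kernel appending " (deleted)" to the link target of a descriptor whose file has been unlinked, so a file whose real name ends with that text would be reported too. Without root it only sees processes whose fd directory is readable by the current user.

```python
import os


def find_deleted_open_files(proc_root="/proc"):
    """Return (pid, fd, target) for descriptors whose target is marked deleted.

    Hypothetical helper for illustration only; returns an empty list when
    proc_root cannot be read (for example on systems without /proc).
    """
    results = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return results
    for pid in entries:
        if not pid.isdigit():        # skip non-process entries
            continue
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:              # process exited or permission denied
            continue
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:          # descriptor closed in the meantime
                continue
            if target.endswith(" (deleted)"):
                results.append((int(pid), int(fd), target))
    return results


if __name__ == "__main__":
    for pid, fd, target in find_deleted_open_files():
        print(pid, fd, target)
```

Each reported pid is a candidate for the process that is keeping the space allocated, just like PID 2583 in the lsof output above.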