liuis / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License

PosixMmapFile, which uses ftruncate+mmap, may crash in Append() when the disk does not have enough space #144

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
* What steps will reproduce the problem?
1. Write to a leveldb DB on an almost full disk.
2. Continue writing to the DB until there is no space left for a *.log file to append to, or no space for a *.sst to compact.

* What is the expected output? What do you see instead?
EXPECTED: The DB::Write()/DB::Put() interface should return Status::IOError() with a message like "No space left on device".
Instead, the process crashed with a SIGBUS signal. The backtrace (on Linux) was:
#0  0x00000036b6936f47 in __memcpy_ssse3 () from /lib64/libc.so.6
#1  0x00007f5f56f70232 in leveldb::(anonymous namespace)::PosixMmapFile::Append (this=0x10185a0, data=<value optimized out>) at util/env_posix.cc:227

* What version of the product are you using? On what operating system?
Version: 1.5.0 (versions 1.6 and 1.7 also have this problem, as the implementation of PosixMmapFile was not changed)
OS: Linux x86_64 2.6.32 (CentOS 6.2), FreeBSD 7.2

* Please provide any additional information below.
We have encountered this problem on our data storage server.

If you want to reproduce this bug, an easy way is to use tmpfs to create a 
small in-memory filesystem, and write the DB to it.

I understand why this happens: when ftruncate() is called, the filesystem does not actually allocate blocks for the file. After the file is mmapped into the process's address space, a read or write makes the OS try to load the nonexistent blocks. With no space left to allocate them, the OS finally treats the operation as an invalid memory access.
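
The first half of that explanation (ftruncate() changes the file size without reserving blocks) can be checked with a small sketch. This is not leveldb code: InspectAfterTruncate and SparseInfo are hypothetical names, the temporary file is assumed to live under /tmp, and the 512-byte unit for st_blocks is the Linux convention. On filesystems that support sparse files, the allocated byte count is typically far smaller than the logical size.

```cpp
#include <cassert>
#include <cstdlib>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper type: what fstat() reports right after ftruncate().
struct SparseInfo {
  bool ok;              // false if any system call failed
  off_t logical_size;   // st_size, i.e. the size set by ftruncate()
  long long allocated;  // st_blocks * 512 (assumes 512-byte units, as on Linux)
};

// Creates a temporary file, extends it to `size` bytes with ftruncate(),
// and reports how much of it is actually backed by disk blocks.
SparseInfo InspectAfterTruncate(off_t size) {
  assert(size > 0);
  SparseInfo info = {false, 0, 0};
  char path[] = "/tmp/sparse_demo_XXXXXX";  // assumed scratch location
  int fd = mkstemp(path);
  if (fd < 0) return info;
  unlink(path);  // the file is removed once fd is closed

  struct stat st;
  if (ftruncate(fd, size) == 0 && fstat(fd, &st) == 0) {
    info.ok = true;
    info.logical_size = st.st_size;
    info.allocated = static_cast<long long>(st.st_blocks) * 512;
  }
  close(fd);
  return info;
}
```

Calling InspectAfterTruncate(1 << 20) and comparing logical_size with allocated shows the gap that a later write through the mapping would have to fill.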

I'm wondering whether there is a good way to avoid this bug when using ftruncate+mmap.

Original issue reported on code.google.com by xjason...@gmail.com on 5 Feb 2013 at 6:21

GoogleCodeExporter commented 9 years ago
Can't we use http://man7.org/linux/man-pages/man2/fallocate.2.html on Linux? 
That will fail with ENOSPC. It is Linux-only, of course.

Original comment by carl.dha...@gmail.com on 15 Apr 2013 at 8:41

GoogleCodeExporter commented 9 years ago
fallocate does not work on ext3. 
http://lists.gnu.org/archive/html/bug-cpio/2010-11/msg00000.html

The solution will have to use posix_fallocate (which is super slow on ext3).
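
A minimal sketch of that idea, assuming the region is reserved with posix_fallocate() before it is mapped, so that a full disk shows up as an error code (such as ENOSPC) instead of a SIGBUS during a later memcpy. MapPreallocated is a hypothetical helper, not part of leveldb, and the sketch makes no claim about the performance cost on ext3 mentioned above.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reserve blocks for [offset, offset + length) and
// then map that range. Returns 0 on success, or an errno value (for
// example ENOSPC) on failure. `offset` must be page-aligned for mmap().
int MapPreallocated(int fd, off_t offset, size_t length, void** out) {
  assert(out != nullptr && length > 0);
  *out = nullptr;

  // posix_fallocate() returns the error number directly instead of
  // setting errno. It also extends the file if the range ends past EOF.
  int err = posix_fallocate(fd, offset, static_cast<off_t>(length));
  if (err != 0) return err;

  void* p = mmap(nullptr, length, PROT_READ | PROT_WRITE, MAP_SHARED,
                 fd, offset);
  if (p == MAP_FAILED) return errno;

  *out = p;
  return 0;
}
```

A caller in the spirit of Append() could turn a nonzero return value into an I/O error status rather than writing through a mapping whose blocks were never reserved.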

Original comment by abhishek...@gmail.com on 15 Apr 2013 at 9:05

GoogleCodeExporter commented 9 years ago
To reproduce this bug, you can use the simple program from this issue: 
https://code.google.com/p/leveldb/issues/detail?id=219

Original comment by feniksgo...@gmail.com on 3 Dec 2013 at 3:22