llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.45k stars 11.76k forks source link

libFuzzer fails to load corpus if filesystem does not provide d_type #26365

Open llvmbot opened 8 years ago

llvmbot commented 8 years ago
Bugzilla Link 25991
Version trunk
OS Linux
Blocks llvm/llvm-project#29260
Reporter LLVM Bugzilla Contributor

Extended Description

With latest (trunk) libFuzzer I ran into a problem fuzzing processes on my desktop machine. Running the fuzzer against a corpus directory, it would repeatedly show no units loaded even though it was saving finds to the same directory.

It turned out to be due to this loop in FuzzerIO.cpp

while (auto E = readdir(D)) { if (E->d_type == DT_REG || E->d_type == DT_LNK) V.push_back(E->d_name); }

The Linux man page says d_type is not set for some filesystems, this apparently includes XFS when running over dm-crypt. This caused this loop to appear to load the corpus but actually silently skip all the files provided.

I fixed it locally by adding to this loop (from memory here):

else if(E->d_type == DT_UNKNOWN && strcmp(E->d_name, ".") != 0 && strcmp(E->d_name, "..") != 0) V.push_back(E->d_name);

at which point I could stop and restart my fuzzers and everything seemed to work.

Let me know if there is any additional information I can provide, and thanks for a great piece of software.

sylvestre commented 2 years ago

mentioned in issue llvm/llvm-project#29260

kcc commented 8 years ago

I could submit a patch for stat'ing DT_UNKNOWN as outlined above if that seems good

Yes, let's do that. Please make the code look like

while (auto E = readdir(D)) if (IsRegularFile(E)) V.push_back(E->d_name);

llvmbot commented 8 years ago

Subdirs of the corpus directory hadn't occurred to me. It looks like the current code just ignores subdirs, and always writes to/reads from the top dir. Which makes sense since I assume it's relying on SHA-1 uniquness to sort out the dups, which breaks if you store the corpus in multiple dirs.

Probably the clean thing to do is properly stat the file

BTW my manpage describes DT_LNK as just meaning "a symbolic link" but not necessarily a symbolic link to a plain file

For testing I would propose adding to ListFilesInDir a bool parameter specifying if the d_type should be trusted. If true, then d_type is consulted and DT_REG files added immediately and DT_DIRs skipped. Otherwise (DT_UNKNOWN, DT_LNK, etc) the file is stat'ed. Then the test could try both variation and compare the result which should always be equal on any directory. The only difference is the additional stat overhead.

BTW re portability, I'm not sure if it matters but apparently d_type is only set on recent ext[234] and btrfs, and the man page states. "All applications must properly handle a return of DT_UNKNOWN." Running a program to dump d_types on various filesystems I found even unencrypted XFS in Linux 4.3.3 does not set d_type.

I could submit a patch for stat'ing DT_UNKNOWN as outlined above if that seems good

kcc commented 8 years ago

Hm... Interesting... Your fix makes sense, but does it mean that if the corpus dir contains a subdir, we will try to load it as a file and crash because we can't load it? OTOH, this will happen only if d_type == DT_UNKNOWN, i.e. only on those unusual FSs...

Also, I wonder, if a test for this change that would run on a regular FS is possible?

Would you like to contribute the patch yourself via http://llvm.org/docs/Phabricator.html?

llvmbot commented 8 years ago

assigned to @kcc