Closed ramosian-glider closed 2 months ago
0x40086602
is FS_IOC_SETFLAGS
:
sys/linux/fs_ioctl.txt.const:FS_IOC_SETFLAGS = 1074292226, 386:arm:1074030082, mips64le:ppc64le:2148034050
Related to #477 and https://github.com/dvyukov/syzkaller/commit/cd4daf14452ee65caaf268f92e3242e338647533
0x5cfe
maps to these attributes:
#define FS_UNRM_FL 0x00000002 /* Undelete */
#define FS_COMPR_FL 0x00000004 /* Compress file */
#define FS_SYNC_FL 0x00000008 /* Synchronous updates */
#define FS_IMMUTABLE_FL 0x00000010 /* Immutable file */
#define FS_APPEND_FL 0x00000020 /* writes to file may only append */
#define FS_NODUMP_FL 0x00000040 /* do not dump file */
#define FS_NOATIME_FL 0x00000080 /* do not update atime */
#define FS_NOCOMP_FL 0x00000400 /* Don't compress */
#define FS_ENCRYPT_FL 0x00000800 /* Encrypted file */
#define FS_BTREE_FL 0x00001000 /* btree format dir */
#define FS_INDEX_FL 0x00001000 /* hash-indexed directory */
#define FS_JOURNAL_DATA_FL 0x00004000 /* Reserved for ext3 */
Yeah, sorry, I mistyped the name. It is FS_IOC_SETFLAGS
that is harmful.
As far as I understand, it's still possible for the fuzzer to accidentally random ioctl constants that don't match the discrimination.
Shall I handle it in Neutralize()
in sys/targets/common.go
?
Neutralizing it is the last resort, since then we won't ever test it on any filesystem. And it seems to be doing lots of complex things that are fs specific.
We can try to attack this from different angles:
/
(which is shouldn't)We already prohibit a bunch of ioctls for similar reasons (FIFREEZE, EXT4_IOC_SHUTDOWN, EXT4_IOC_RESIZE_FS), so if we hide root fs, we could re-enable these as well.
I am trying to understand how it gets access to /
, but I can't reproduce it, I always get:
#0 [1304ms] -> ioctl$KVM_SET_DEVICE_ATTR(0xffffffffffffffff, 0x40086602, 0x20000040)
#0 [1304ms] <- ioctl$KVM_SET_DEVICE_ATTR=0xffffffffffffffff errno=9
Is it reliably reproducible for you? If so, please run with -debug flag to see what fd it gets.
Neutralizing it is the last resort, since then we won't ever test it on any filesystem. And it seems to be doing lots of complex things that are fs specific.
I was going to do something along the lines of:
+ if c.Args[1].(*prog.ConstArg).Val == arch.FS_IOC_SETFLAGS &&
+ c.Meta.Name != "ioctl$FS_IOC_SETFLAGS" {
+ c.Args[1].(*prog.ConstArg).Val = arch.FS_IOC_GETFLAGS
+ }
, i.e. only neutralize if the ioctl number doesn't match its discrimination.
This won't solve the problem of generating a perfectly valid program doing ioctl$FS_IOC_SETFLAGS()
though.
Is it reliably reproducible for you? If so, please run with -debug flag to see what fd it gets.
Yeah, I'm on it.
FWIW there's also a similar problem with FS_IOC_FSSETXATTR
This won't solve the problem of generating a perfectly valid program doing ioctl$FS_IOC_SETFLAGS() though.
Exactly, so it won't solve the problem. It's even easier for the fuzzer to do the same using ioctl$FS_IOC_SETFLAGS
.
Here's the log from my VM. I am trying to reduce the repro now, but it is extremely fragile. I suppose it also depends on one of the FDs opened by the executor.
Fun. This indeed works:
#include <fcntl.h>
#include <unistd.h>
int main() {
int fds[2];
pipe(fds);
openat(fds[1], "/dev/kvm", 0, 0);
sleep(1000);
}
What is this fd?
$ cat /proc/890477/fdinfo/5
pos: 0
flags: 0100000
mnt_id: 25
ino: 724
It looks like it's just "/dev/kvm".
But in today's episode of Fun With Fuzzers:
0x5cfe
returns EINVAL for this ioctl, but it looks like it passed memory mmaped from kvm vcpu to open openat and the ioctl :)
So we don't know the actual flags value, nor the opened file name. If the mmaped vcpu happened to have /\x00
at that location, then it could open root.
#0 [60126ms] -> mmap$KVM_VCPU(0x20000000, 0x930, 0x1000001, 0x11, 0x5, 0x0)
#0 [60137ms] <- mmap$KVM_VCPU=0x20000000
SIGSEGV on 0x20000040, skipping
#0 [60149ms] -> openat$kvm(0x0, 0x20000040, 0x0, 0x0)
#0 [60157ms] <- openat$kvm=0x6
...
#0 [60269ms] -> ioctl$KVM_SET_DEVICE_ATTR(0x6, 0x40086602, 0x20000040)
The mmaped memory most likely has 0s (at least among values that can possibly succeed when being using as a file name in openat), but opening an empty string from pipe dir fails, so that wasn't that :)
pipe2([3, 4], 0) = 0
openat(4, "", O_RDONLY) = -1 ENOENT (No such file or directory)
Note that mmap() calls are requesting 0x930 bytes, which the executor probably rounds up, but prog2c happily leaves as is (trying a C repro now)
Running a slightly modified C repro under strace:
...
mmap(0x1ffff000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x1ffff000
mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
mmap(0x21000000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x21000000
write(1, "executing program\n", 18executing program
) = 18
openat(0, "/dev/kvm", O_RDONLY) = 3
ioctl(3, KVM_CREATE_VM, 0) = 4
ioctl(4, KVM_CREATE_VCPU, 0) = 5
mmap(0x20009000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC|PROT_SEM|PROT_GROWSUP|0x800000, MAP_SHARED|MAP_FIXED, 5, 0) = 0x20009000
mmap(0x20000000, 4096, PROT_READ|PROT_GROWSDOWN, MAP_SHARED|MAP_FIXED, 5, 0) = 0x20000000
newfstatat(1, "", {st_mode=S_IFCHR|0600, st_rdev=makedev(0xcc, 0x40), ...}, AT_EMPTY_PATH) = 0
ioctl(1, TCGETS, {c_iflag=BRKINT|ICRNL|IXON|IMAXBEL, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|OPOST|ONLCR, c_cflag=B38400|CS8|CREAD|HUPCL|CLOCAL, c_lflag=ISIG|ICANON|ECHO|ECHOE|ECHOK|IEXTEN|ECHOCTL|ECHOKE, ...}) = 0
write(1, "path: '/' (0x1000000002f)\n", 26path: '/' (0x1000000002f)
) = 26
openat(AT_FDCWD, "/", O_RDONLY) = 6
mmap(0x20000000, 4096, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_ANONYMOUS|MAP_POPULATE|MAP_NONBLOCK|MAP_EXECUTABLE|MAP_HUGETLB|0x16002c0|25<<MAP_HUGE_SHIFT, -1, 0) = -1 ENOMEM (Cannot allocate memory)
mmap(0x20001000, 4096, PROT_READ|PROT_WRITE|PROT_GROWSUP, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_GROWSDOWN|MAP_DENYWRITE|MAP_STACK|MAP_FIXED_NOREPLACE|1<<MAP_HUGE_SHIFT, -1, 0) = 0x20001000
ioctl(-1, KVM_IRQFD, 0x20000140) = -1 EBADF (Bad file descriptor)
openat(AT_FDCWD, "/dev/kvm", O_RDONLY) = 7
ioctl(7, KVM_CREATE_VM, 0) = 8
ioctl(8, KVM_CREATE_DEVICE, 0x20000140) = 0
ioctl(6, FS_IOC_SETFLAGS, [FS_UNRM_FL|FS_COMPR_FL|FS_SYNC_FL|FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL|FS_NOATIME_FL|FS_NOCOMP_FL|FS_ENCRYPT_FL|FS_INDEX_FL|FS_JOURNAL_DATA_FL]) = 0
exit_group(0) = ?
+++ exited with 0 +++
so the mapped CPU data contains 0x1000000002f
, which is indeed treated as "/\0".
Essentially we're doing the following:
struct kvm_run
.struct kvm_run
at address 0x20000000
with PROT_READ
, so that it overlaps with syzlang's heap."/dev/kvm"
at address 0x20000040
and fail, because the memory is protected.openat(AT_FDCWD, "/", O_RDONLY)
instead of opening /dev/kvm
, because struct kvm_run
has 0x1000000002f
at address 0x20000040
.0x20000000
with PROT_WRITE
, so that program arguments can be stored in it again.0x20000140
, placing the totally random value 6
at address 0x20000144
. This value coincidentally happens to be the value returned by openat()
at step 4.ioctl(-1, KVM_IRQFD, 0x20000140)
, which does nothing interesting (we only wanted to write 6
to address 0x20000144
).struct kvm_device
at address 0x20000140
, so that the output kvm_device.fd
parameter is located at address 0x20000144
.ioctl(KVM_CREATE_DEVICE)
and pass it to ioctl(FS_IOC_SETFLAGS)
, breaking the file system.so the mapped CPU data contains 0x1000000002f, which is indeed treated as "/\0".
This is hilarious.
I don't think we can prevent all such cases, fuzzer came with other creative ways to break attempts to restrict it. It's more reliable to prevent these things on sandbox level. I think we should mount tmpfs as rootfs for sandbox=none and then re-enable all prohibited syscalls.
We could also do a couple more things to reduce the risk of such cases.
First, it makes little sense to call discriminated versions of openat() with random string parameters, so maybe we need to allocate them in a region that can't be mprotect()ed or mmap()ed by the program.
Second, we'd better not reuse addresses for structs allocated for different syscalls. Given the typical program size, we can afford placing them at a new address every time. In return we'll have more reproducible results that are easier to minimize (calls do not implicitly depend on each other). An alternative would be to explicitly zero out a struct before initializing its fields.
All of this will also reduce fuzzer's ability to find bugs (passing device mmaped memory as a file name looks like a good corner case to test), so it's a tough choice.
Does having overlapping structs really help much? To me it looks more like a loophole that allows the fuzzer to work around resource type checks by writing something into one struct as an int and then reading it back as an fd.
I think we try to avoid overlapping structs as much as possible, where it's reasonably easy to do.
My current intent is to factor out the ./syz-tmp
bits from namespace_sandbox_proc()
and make sure every sandbox uses them.
My current intent is to factor out the
./syz-tmp
bits fromnamespace_sandbox_proc()
and make sure every sandbox uses them.
This would be good for sandbox=none, but I am not sure about sandbox=setuid. sandbox=none can mount own filesystems of all types, so it can test them. sandbox=setuid can't so it will test just this tmpfs, and won't test the actual rootfs at all. But this looks important -- the reachable surface is already pretty small under sandbox=setuid, and executing open/read/write/ftruncate on the actual rootfs (e.g. ext4) is a significant part of that surface. If we do tmpfs in this case, we won't test what an unpriv user can do with the root fs. But sandbox=setuid also totally shouldn't be able to change rootfs attrs (if it manages to get same failures woth broken footfs, we already found a very serious bug).
Hmm, some tests start failing when I am trying to mount tmpfs in do_sandbox_none()
. I suppose this is because we don't run the tests under root.
I could do unshare(CLONE_NEWUSER) to pretend I am root, but I doubt I can elevate the capabilities for mounting tmpfs in the tests.
I could as well skip the tmpfs part for non-root users, but it will remain untested then, which is undesirable.
Oh wait, there's CAP_LINUX_IMMUTABLE
that we can drop in order to prevent setting FS_IMMUTABLE_FL
and FS_APPEND_FL
. This won't stop the fuzzer from messing up with other attributes, but at least it won't break the basic functionality like file/directory creation.
Yet another angle to tackle this (e.g. to make less harmful attributes do not persist after an executor run) would be to do chattr -R = /
on executor startup.
I still think we need to drop CAP_LINUX_IMMUTABLE
to make sure temporary files are deleted.
Yet another angle to tackle this (e.g. to make less harmful attributes do not persist after an executor run) would be to do chattr -R = / on executor startup.
What executor process do you mean? There are usually 4 hierarchical processes now.
I still think we need to drop CAP_LINUX_IMMUTABLE to make sure temporary files are deleted.
You mean entirely and don't test it on any filesystems?
What executor process do you mean? There are usually 4 hierarchical processes now.
I was going to suggest doing that around makeCommand() in pkg/ipc/ipc.go, but apparently it is gone now :)
We need to reset the attributes before preparing to every program execution, what would be the best place to do so?
You mean entirely and don't test it on any filesystems?
That was my intent, right. Given that we cannot reliably mount tmpfs (see above), this might be the lesser evil. It will still be possible to set other attributes, except for FS_IMMUTABLE_FL
and FS_APPEND_FL
.
We need to reset the attributes before preparing to every program execution, what would be the best place to do so?
This looks expensive. We would need to read every dir on disk and read attributes of every file? While a test can take just 40 ms.
Given that we cannot reliably mount tmpfs (see above)
Do you mean this one?
Hmm, some tests start failing when I am trying to mount tmpfs in do_sandbox_none(). I suppose this is because we don't run the tests under root.
What tests have failed? Were they for linux, or test OS? For which OSes have you changed sandbox setup?
What tests have failed? Were they for linux, or test OS? For which OSes have you changed sandbox setup?
I don't see where we run test programs for Linux OS in tests. We build executor for Linux in pkg/runtest/executor_test.go, but there we only run executor unit-tests "syz-executor test", it shouldn't do sandbox setup.
I think there were failing tests in pkg/ipc on Linux, but those are gone now. Let me update my setup, I am pretty sure even the original bug is going to manifest in a different way now (if at all)
If mount was failing, changing attribute of every file on the disk will fail too, right. Both things are not something that tests should be doing.
But I had memories that mounting tmpfs in unpriv operation nowadays.
As far as I understand, mounting tmpfs is not enough. We want to chroot into it, after which we'll also need to mount /dev and procfs under that new root.
Right. Then the question is what test runs executor for Linux OS.
The following program:
breaks fuzzing in a subtle way, setting a number of root filesystems attributes that prevent syz-executor from creating and deleting files:
This happens because the last call,
ioctl$KVM_SET_DEVICE_ATTR(r6, 0x40086602, &(0x7f0000000040)={0x5cfe, 0x0, 0x0, 0x0})
, has an incorrect ioctl number corresponding toFS_IOC_GETFLAGS
, and because it happens to be called on a file descriptor somehow pointing at the root filesystem.The latter part is not clear to me yet. The fd in question has a value of 6, and is supposed to be returned by KVM_CREATE_DEVICE, but could as well be some fd created by the executor.
I'll dig into this, but anyway I believe we need to disallow mutating ioctl numbers to value 0x40086602, unless that's
FS_IOC_GETFLAGS
.