google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0
5.35k stars 1.23k forks source link

pkg/bisect: detect incorrect bisections #1271

Closed dvyukov closed 4 years ago

dvyukov commented 5 years ago

Branched off #1051 We could detect incorrect bisection results based on some patch characteristics. The following checks were proposed:

  1. Comment-only change (proposed by Al Viro).
  2. Documentation-only change (proposed to Eric Biggers).
  3. Change to a non-current arch (e.g. only arch/arm/* changes on an x86 instance).
  4. Change only to syscall.*.tbl files (proposed to Eric Biggers).
  5. Changes to tests only (e.g. pointing to tools/testing/selftests/, filter out all of tools/?)
  6. Merges (proposed to Eric Biggers).

These seem to suggest 2 types of incorrect results:

  1. Completely incorrect results (e.g. bisected to comment-only change due to flaky crash).
  2. Correct results from the strict point of view, but not the commit ultimately introduced the bug (syscall.*.tbl changes, KASAN/LOCKDEP changes that detect more bugs).

Open question: do we want to handle them differently? E.g. a change bisected to syscall.*.tbl change is a relevant bit of information too.

Open question: what to do with merges? Usually we falsely point to merges because of flaky crashes or disabled configs. However, they can as well be the right commit. They can also belong to type 2. But they also can be the ultimately correct commit too (depending on what the actual merge did).

ebiggers commented 5 years ago

If the bisection result is not "certainly incorrect" but rather just "probably incorrect", IMO it needs to undergo human review before being mailed out. Otherwise syzbot will continue spamming too many wrong results, decreasing the percent of correct results that get looked into. So I suggest just not automatically mailing out "probably incorrect" results, but leaving them available on the syzbot dashboard and marked as such.

dvyukov commented 5 years ago

Makes sense, thanks.

dvyukov commented 5 years ago

Detecting comment-only changes is not possible looking at the patch alone. It would require getting full file copy before/after the patch and then doing some analysis. An interesting option is to compare kernel binaries. If binaries are the same before/after the patch, then definitely the patch does not affect anything and is a false result. This should reliably handle all of 1, 2, 3, 5 and more (even lots of no-op code changes like adding parens, swapping variable declarations, etc). It's not this simple because e.g. changes to debug info (changed lines when a comment is added) are not actually relevant. Kbuild also embeds build timestamps into binaries.

dvyukov commented 4 years ago

@zsm-oss FYI I am going to tackle this, just so that we don't step on each other. This is going to be useful for fix bisection too.

dvyukov commented 4 years ago

This should be mostly done. We don't do the "Correct results from the strict point of view, but not the commit ultimately introduced the bug (syscall.*.tbl changes, KASAN/LOCKDEP changes that detect more bugs)" part yet. But it's much less frequent and not as critical as completely bogus results.

dvyukov commented 4 years ago

One case of non-deterministic/non-hermetic kernel build: https://syzkaller.appspot.com/bug?extid=19b469021157c136116a https://groups.google.com/forum/#!msg/syzkaller-bugs/4FU25pFDMAo/1wME8ZdHBQAJ https://syzkaller.appspot.com/x/bisect.txt?x=143090dee00000

Bisecting: 107 revisions left to test after this (roughly 7 steps)
[910cd32e552ea09caa89cdbe328e468979b030dd] parisc: Fix and enable seccomp filter support
testing commit 910cd32e552ea09caa89cdbe328e468979b030dd with gcc (GCC) 5.5.0
kernel signature: 53254f14d716ef4e40b90890cbd5d216ccd0186d
...
# git bisect bad 910cd32e552ea09caa89cdbe328e468979b030dd
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[4f4acc9472e54ce702f1d85fc9e6d57767dec91f] parisc: Fix SIGSYS signals in compat case
testing commit 4f4acc9472e54ce702f1d85fc9e6d57767dec91f with gcc (GCC) 5.5.0
kernel signature: aeb302881aba93101d5445ecd72b75dd79be5e59
all runs: OK
# git bisect good 4f4acc9472e54ce702f1d85fc9e6d57767dec91f
910cd32e552ea09caa89cdbe328e468979b030dd is the first bad commit
commit 910cd32e552ea09caa89cdbe328e468979b030dd
Author: Helge Deller <deller@gmx.de>
Date:   Wed Mar 30 14:14:31 2016 +0200

Trying to reproduce this locally with:

KBUILD_BUILD_VERSION=0 KBUILD_BUILD_TIMESTAMP=now KBUILD_BUILD_USER=syzkaller KBUILD_BUILD_HOST=syzkaller CFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc CC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc make CC=/usr/local/google/home/dvyukov/syzkaller/bisect_bin/gcc-5.5.0/bin/gcc  CFLAGS="-w -Wno-error=attributes -Wno-error=restrict" HOSTCFLAGS="-w -Wno-error=attributes -Wno-error=restrict" KERNELVERSION=syzkaller LOCALVERSION=-syzkaller

building on commits 910cd32e552ea09caa89cdbe328e468979b030dd and 4f4acc9472e54ce702f1d85fc9e6d57767dec91f.

$ /tmp/binsig vmlinux.910cd32e552ea09caa89cdbe328e468979b030dd
.text   21639290    8f65e23ec3380384230719e8c3602630802c9200
__ex_table  15232   ca10750e9b0deae09977e7ab63709abb9b28f154
.rodata 7485623 9887d04cf8c795c067cc1ed295e25312e6659886
__bug_table 128316  16fae33bff376f930fd34bd25ab13b5c00fa7bc7
.pci_fixup  12720   eeface69f50fc3e9f2c5fd6d38fdc8938356807f
.builtin_fw 792 a0a6bc32bc802ef7b6b9b1a6140b805bdc4ac437
.tracedata  120 edae3c071d566e365e4b558e07d185322d62592a
__ksymtab   100208  4dd35545e7372c3a7158a2c58b109ca420e31587
__ksymtab_gpl   87392   29a544238591a311411abf691c5d6cf61e59be44
__ksymtab_strings   231192  f71fb6329ee25a029001c3d41a3480e0b81e5c45
__init_rodata   168 7caf4e2d0fcf8d34310ed119489b68a5ad510e38
__param 32240   de182728331e61beebe02b4e0480568e753d2798
__modver    2600    875523441672ba66ec2f715876ce8cebefcbda07
.data   2270784 7726b563f1dfdefcd6b4207b9251ae419f373343
_kprobe_blacklist   32  49da0faf510a49f2b37bff6d2219fbbf20b6f39e
.vvar   4096    1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
.data..percpu   100376  ad8045b0a9bc62691cddd6956fa1a28c2e2c334c
.init.text  594518  187ae776657c3cbfad91b567642f017d5804d072
.init.data  661216  9acfb550b6427eac3442b07dd882bd68c5983574
.x86_cpu_dev.init   24  95a4a1d1f72bb713f081a2e9dab97a6497a69f2a
.parainstructions   65420   18758b04e79e529e03f1ac0c3c368e5007092a24
.altinstructions    34320   5d56465475bf65d0a4a6ef4facab222880cc071d
.altinstr_replacement   8807    c78c8c3f198ec91caa2b29ed3b5a7996ec6909ee
.iommu_table    240 a65915a1db1134278ee57463cc6f02308e41edd5
.apicdrivers    40  8982fb06574f4932de44fd0070b28f0ca24bf696
.exit.text  31839   3d29137239e081f2a620328c5ad3c97d561346ae
.smp_locks  69632   a856b18c833bcc827f114cbf7d4644d44f1e43e5
.data_nosave    4096    1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
88159ffa1c5f5b380d250a7e5d95608d6ea645a5

$ /tmp/binsig vmlinux.4f4acc9472e54ce702f1d85fc9e6d57767dec91f
.text   21639290    8f65e23ec3380384230719e8c3602630802c9200
__ex_table  15232   ca10750e9b0deae09977e7ab63709abb9b28f154
.rodata 7485623 5377417846e39eb55691e7b9392f4150e8238167
__bug_table 128316  16fae33bff376f930fd34bd25ab13b5c00fa7bc7
.pci_fixup  12720   eeface69f50fc3e9f2c5fd6d38fdc8938356807f
.builtin_fw 792 a0a6bc32bc802ef7b6b9b1a6140b805bdc4ac437
.tracedata  120 edae3c071d566e365e4b558e07d185322d62592a
__ksymtab   100208  4dd35545e7372c3a7158a2c58b109ca420e31587
__ksymtab_gpl   87392   29a544238591a311411abf691c5d6cf61e59be44
__ksymtab_strings   231192  f71fb6329ee25a029001c3d41a3480e0b81e5c45
__init_rodata   168 7caf4e2d0fcf8d34310ed119489b68a5ad510e38
__param 32240   de182728331e61beebe02b4e0480568e753d2798
__modver    2600    875523441672ba66ec2f715876ce8cebefcbda07
.data   2270784 7726b563f1dfdefcd6b4207b9251ae419f373343
_kprobe_blacklist   32  49da0faf510a49f2b37bff6d2219fbbf20b6f39e
.vvar   4096    1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
.data..percpu   100376  ad8045b0a9bc62691cddd6956fa1a28c2e2c334c
.init.text  594518  187ae776657c3cbfad91b567642f017d5804d072
.init.data  661216  f48009c823b40f737b8c2680d125ddfcfd016bfa
.x86_cpu_dev.init   24  95a4a1d1f72bb713f081a2e9dab97a6497a69f2a
.parainstructions   65420   18758b04e79e529e03f1ac0c3c368e5007092a24
.altinstructions    34320   5d56465475bf65d0a4a6ef4facab222880cc071d
.altinstr_replacement   8807    c78c8c3f198ec91caa2b29ed3b5a7996ec6909ee
.iommu_table    240 a65915a1db1134278ee57463cc6f02308e41edd5
.apicdrivers    40  8982fb06574f4932de44fd0070b28f0ca24bf696
.exit.text  31839   3d29137239e081f2a620328c5ad3c97d561346ae
.smp_locks  69632   a856b18c833bcc827f114cbf7d4644d44f1e43e5
.data_nosave    4096    1ceaf73df40e531df3bfb26b4fb7cd95fb7bff1d
df4873e341de435636e894242dd5572aa96ebff5

I see small content diffs in .rodata and .init.data. Actually trying full clean build on just 910cd32e55, I see the same. Is deterministic kernel build a myth? Or we need more magic env vars?....

dvyukov commented 4 years ago

The .rodata diff is:

003C 8640: D6 86 17 83 FF FF FF FF  D8 86 17 83 FF FF FF FF  ........ ........ 
003C 8640: D7 86 17 83 FF FF FF FF  D8 86 17 83 FF FF FF FF  ........ ........

.init.data:

000A 1670: B8 B8 9A 5B 1A B9 A2 89  1B 43 69 43 1C FA 4C 60  ...[.... .CiC..L`
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B 8D C0 94 A1 05 0E  ...2.... d;......
000A 1690: AD 06 86 44 DA 69 8A C6  77 46 B2 53 3F 39 3F AF  ...D.i.. wF.S?9?.
000A 16A0: 38 3F 27 15 DD 6E 63 88  7F 9D 0D A8 EC 5F 98 5B  8?'..nc. ....._.[
000A 16B0: 0C 8A F2 F3 4B 50 EC 24  0C 88 55 87 0E 9C 60 8C  ....KP.$ ..U...`.
000A 16C0: 90 20 47 4F 1F D7 20 45  45 45 06 22 00 00 C2 CD  . GO.. E EE."....
000A 16D0: 46 4A 00 02 00 00 00 00  86 00 00 00 00 00 00 00  FJ...... ........
vs
000A 1670: B8 B8 9A 3B 1A 5B A0 89  1B 43 69 43 1C FA 4C 60  ...;.[.. .CiC..L`
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B 8D C0 94 A1 05 0E  ...2.... d;......
000A 1690: AD 06 86 44 DA 69 8A C6  77 46 B2 53 3F 39 3F AF  ...D.i.. wF.S?9?.
000A 16A0: 38 3F 27 15 DD 6E 63 88  7F 9D 0D A8 EC 5F 98 5B  8?'..nc. ....._.[
000A 16B0: 0C 8A F2 F3 4B 50 EC 24  0C 88 55 87 0E 9C 60 8C  ....KP.$ ..U...`.
000A 16C0: 90 20 47 4F 1F D7 20 45  45 45 06 22 00 00 73 55  . GO.. E EE."..sU
000A 16D0: 87 63 00 02 00 00 00 00  86 00 00 00 00 00 00 00  .c...... ........
vs
000A 1670: B8 B8 9A 3B 1B 19 A1 89  1B 43 69 43 1C FA 4C 60  ...;.... .CiC..L`  
000A 1680: 8C 94 D4 32 06 06 06 03  64 3B C1 66 19 19 5A E0  ...2.... d;.f..Z.  
000A 1690: D0 6A 60 48 A4 9D A6 68  7C 67 24 3B F5 93 F3 F3  .j`H...h |g$;....  
000A 16A0: 8A F3 73 52 D1 ED 36 86  F8 D7 D9 80 CA FE 85 B9  ..sR..6. ........  
000A 16B0: C5 A0 28 3F BF 04 C5 4E  C2 80 58 75 E8 C0 09 C6  ..(?...N ..Xu....  
000A 16C0: 08 09 72 F4 F4 71 0D 52  54 54 64 20 02 00 00 22  ..r..q.R TTd ..."  
000A 16D0: 18 D6 FC 00 02 00 00 00  87 00 00 00 00 00 00 00  ........ ........  
dvyukov commented 4 years ago

The .rodata diff points somewhere into kallsyms_addresses?

  [ 4] .rodata           PROGBITS         ffffffff82600000  01601000
       00000000007238b7  0000000000000000   A       0     0     256

ffffffff828ac968 R kallsyms_addresses
ffffffff829d7608 R kallsyms_num_syms
dvyukov commented 4 years ago

For .init.data I can't figure what it belongs to:

  [20] .init.data        PROGBITS         ffffffff830d7000  0208d000
       00000000000a16e0  0000000000000000  WA       0     0     4096

That's 0xffffffff830d7000 + 0xa1670 = 0xffffffff83178670 which contains some text?

ffffffff83178640 0000000000000008 t __initcall_apparmor_init
ffffffff83178648 0000000000000008 t __initcall_integrity_iintcache_init
ffffffff83178650 T __initramfs_start
ffffffff83178650 t __irf_start
ffffffff83178650 T __security_initcall_end
ffffffff831786d7 t __irf_end
ffffffff831786d8 T __initramfs_size
ffffffff831786e0 0000000000000008 r __cpu_dev_intel_cpu_dev
ffffffff831786e0 R __x86_cpu_dev_start
ffffffff831786e8 0000000000000008 r __cpu_dev_amd_cpu_dev
ffffffff831786f0 0000000000000008 r __cpu_dev_centaur_cpu_dev
ffffffff831786f8 R __parainstructions
ffffffff831786f8 R __x86_cpu_dev_end
ffffffff83188684 R __parainstructions_end
ffffffff83188688 R __alt_instructions
ffffffff83190c98 R __alt_instructions_end
dvyukov commented 4 years ago

Weird case or I don't understand something: https://groups.google.com/d/msg/syzkaller-bugs/8gr_KXp-SYc/O_KcmdN5AQAJ This is bisected to 2de50e9674fc4ca3c6174b04477f69eb26b4ee31 and it has this parent: $ git log -n 1 --format="%P" 2de50e9674fc4ca3c6174b04477f69eb26b4ee31 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 But the parent was never tested... how is this possible?

Another similar case: https://syzkaller.appspot.com/bug?extid=4455ca3b3291de891abc Points to 2de50e9674fc4ca3c6174b04477f69eb26b4ee31 $ git log -n 1 --format=%P 2de50e9674fc4ca3c6174b04477f69eb26b4ee31 388f7b1d6e8ca06762e2454d28d6c3c55ad0fe95 which was not tested...

dvyukov commented 4 years ago

Moved the git bisect parent commit issue to #1527 (too much context and info).

dvyukov commented 4 years ago

Another non-deterministic build: https://groups.google.com/g/syzkaller-bugs/c/XEbliACT0zA/m/LIKSSEccCgAJ

hogander-unikie commented 4 years ago

init.data part of this bug could be due to initramfs which is part of vmlinux. if CONFIG_BLK_DEV_INITRD is enabled in configuration and CONFIG_INITRAMFS_SOURCE is not -> small default initramfs is generated. This is in usr/Makefile. For some reason content of this generated initramfs in init.data is varying from build to build. This could be easily get rid of by disabling CONFIG_BLK_DEV_INITRD from syzbot configs?

dvyukov commented 4 years ago

Good find! Indeed I see diff in usr/initramfs_data.cpio after just removing old one and re-running make. We can drop CONFIG_BLK_DEV_INITRD as we don't use initramfs, but wonder if the non-determinism it fixable easily maybe.

dvyukov commented 4 years ago

I guess it's this guy in usr/gen_init_cpio.c

    default_mtime = time(NULL);

which means we can't fix it in the older kernels. I hoped for some env var or something similar.

dvyukov commented 4 years ago

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. This will auto-heal all bisections even with already stored configs with CONFIG_BLK_DEV_INITRD=y. Probably better to drop CONFIG_BLK_DEV_INITRD from current configs as well.

Jouni you previously provided a link to some page called along the lines of "Deterministic Kernel Builds". Can make sense to notify that group about the issue with CONFIG_BLK_DEV_INITRD.

hogander-unikie commented 4 years ago

Jouni you previously provided a link to some page called along the lines of "Deterministic Kernel Builds". Can make sense to notify that group about the issue with CONFIG_BLK_DEV_INITRD.

Yes, I will ping them or update the document by myself.

dvyukov commented 4 years ago

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. ...

@hogander-unikie will you send a PR for this? Want to understand if I need to keep track of this or I can wait for a PR.

hogander-unikie commented 4 years ago

I think we need to remove CONFIG_BLK_DEV_INITRD in pkg/vcs/linux.go:linuxAlterConfigs. ...

@hogander-unikie will you send a PR for this? Want to understand if I need to keep track of this or I can wait for a PR.

Yes, I will prepare pull request for this.