Open xairy opened 3 months ago
Regarding Issue 1.
We indeed don't pass the features list to `tools/syz-execprog`, and it would be correct to do so, but I don't think it would give any noticeable improvement when the reproductions are run by a `syz-manager` (even when not on syzbot). The features are enabled unconditionally (if the kernel turns out to support them), so e.g. the slow netdev setup functionality will always be enabled.
Also, during repro generation, we can only drop features at a very late stage: we must already have a program that reliably crashes the kernel, and getting there is the longest part. So most of the iterations would have to happen with all features enabled anyway.
Where it can surely make a difference is if we made `tools/syz-repro` accept the set of enabled/disabled features and someone then manually crafted a minimal feature list. That could really optimize the process, but it's a very special use case.
What we can do is extend the manager config to allow selectively disabling features. There's already the experimental `remote_cover` option that disables `flatrpc.FeatureExtraCoverage`. And both `syz-manager` and `syz-repro` take this config as an argument. But we will still need to teach `syz-repro` to respect these config options when spawning `syz-execprog`.
Does this sound acceptable?
There's a set of issues/behaviors in `pkg/repro` and related packages that together make the reproducing process take unnecessarily long and also cause most repros to have the `repeat` flag set.
When reproducing a bug, `pkg/repro` first uses `syz-execprog` to execute the program, to see whether it can trigger the bug and to decide on the appropriate timeout.
Issue 1. The first (minor) issue is that `pkg/instance/execprog.go` does not respect the enabled features when spawning `syz-execprog`: the tool is always spawned with all features enabled (i.e., no `-enable` or `-disable` flags are provided).
This can likely be easily fixed when `pkg/repro` and thus `pkg/instance/execprog.go` are used from the manager, as the information about enabled features is already passed to `RunSyzProg` via `opts`.
Fixing this when `pkg/repro` is used from `tools/syz-repro` seems to require more work: we would need to teach `syz-repro` to detect enabled features first and then pass those to `pkg/repro`.
Fixing this will likely make no difference to syzbot, as I believe its instances have all/most features enabled anyway. But it would reduce the time it takes to generate syz repros for people running custom instances (due to the start-up time of `syz-execprog`, see issue 2). It should also speed up generating C repros, as the simplification code would only need to go through the enabled features rather than all of them.
But let's assume we run a full-blown instance with all features enabled. This brings us to:
Issue 2. `pkg/repro` does not account for the fact that spawning `syz-execprog` with all features enabled takes a very long time.
The feature that takes particularly long to set up is `net_dev`. `swap` is also somewhat slow.
As `syz-execprog` takes a long time to set up, when spawned from `pkg/repro` it rarely gets to executing programs before the first reproducing timeout (`3 * cfg.Timeouts.Program` == 15 seconds) is over. Thus, `pkg/repro` often switches to the second timeout (`20 * cfg.Timeouts.Program` == 1 minute). This happens even for programs that take little time to trigger bugs.
As a result, reproducing a bug takes unnecessarily long.
I noticed this issue when reproducing a bug on my machine, but I suspect syzbot is affected as well (I couldn't check the logs due to the issue that #5011 should resolve). (On my machine, spawning `syz-execprog` in QEMU with KVM enabled takes ~23 seconds. However, for `pkg/repro`, I need to increase the first timeout to around twice that, probably because `pkg/repro` starts counting time even before the execution of `syz-execprog` starts.)
I think the proper solution here would be to start the timer only after `syz-execprog` starts executing programs, i.e., to ignore the time it takes to set up the features. But I'm not sure how difficult it would be to implement this.
Considerably speeding up the feature setup process should also work, if that's possible. But I suspect this won't be a lasting solution, as more features will likely get added at some point.
(I don't know if guilty commit bisection on syzbot takes the same timeout as was used for reproducing, but if so, this issue also makes bisection time out more often.)
Issue 3 (or, arguably, just a consequence of issue 2). As the reproducing process often decides to use the large 1 minute timeout, an attempt to remove `repeat` from the reproducer always fails on the `checkOpts` check in `pkg/repro`.
This is what causes most reproducers to have the `repeat` flag set.
I initially noticed this on the USB syzbot manager, and it didn't make sense, as I could reproduce most of the bugs without `repeat`. But it appears that this issue affects all syzbot instances.
While it's not a problem for syzbot's intended purpose by itself, it might confuse people looking at the reproducers. At least for me, seeing the `repeat` flag set makes me think that the bug is related to some timing/racing issue.
Resolving issue 2 with the approach I mentioned should resolve this one as well. If a different approach is taken, this issue needs to be addressed separately.