ranweiler opened this issue 3 years ago
@ranweiler, is this still an issue?
@mgreisen, for context, this is a Linux-only edge case that is currently mitigated by retries, but it is still valid.
For OneFuzz to hit this for a single input, a (Linux) task would repeatedly have to have its target tracee killed by an external process while the tracee is in a ptrace-stop state. In the context of OneFuzz, this should never happen.
The improvement we can make is: when `wait()`-ing on ptrace stops here, check whether the error variant is `TraceeDied`, and return `Ok(())` if so. Otherwise, propagate the error.
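A minimal sketch of that change, assuming a hypothetical `TraceError` enum and a closure standing in for `Ptracer::wait()`; neither is the actual `pete` or OneFuzz API, and only the `TraceeDied` name comes from this thread. The closure is assumed to yield `Ok(Some(pid))` per stopped tracee and `Ok(None)` once no tracees remain.

```rust
/// Hypothetical tracer error; only the `TraceeDied` name comes from the issue.
#[derive(Debug, PartialEq)]
enum TraceError {
    /// The tracee exited or was killed while we were tracing it.
    TraceeDied { pid: i32 },
    /// Any other tracing failure, which must still be reported.
    Other(String),
}

/// Drive a ptrace-stop loop. `wait` stands in for `Ptracer::wait()`.
fn trace_until_exit<F>(mut wait: F) -> Result<(), TraceError>
where
    F: FnMut() -> Result<Option<i32>, TraceError>,
{
    loop {
        match wait() {
            // A real tracer would inspect and resume the stopped tracee here.
            Ok(Some(_pid)) => continue,
            // No tracees left to wait on: tracing finished normally.
            Ok(None) => return Ok(()),
            // The tracee vanished while in a ptrace-stop: not a hard failure.
            Err(TraceError::TraceeDied { pid }) => {
                eprintln!("warning: tracee {pid} died during a ptrace-stop");
                return Ok(());
            }
            // Every other error is propagated unchanged.
            Err(err) => return Err(err),
        }
    }
}
```

Matching on the specific variant, rather than swallowing all errors, keeps genuinely broken tracing visible to the caller.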
I'll re-assign this to myself to implement that change.
The other half of this is specific to the OneFuzz task worker, and has been split out in #2926.
When recording coverage or input testing on Linux, we must always be prepared for any tracing operation to fail due to an unreported or not-yet-reported tracee task exit. Right now, we sometimes treat acceptable tracing errors as hard failures, or treat unacceptable task results as successful.
Current examples:
The call to `Ptracer::wait()` should change to look like the second case, because it may internally invoke `ptrace(2)` and fail with `ESRCH` due to tracee exit (it does more than just `wait(2)`).

However, in all the tasks which trace targets, we must take care to identify cases where tracing failed logically but the tracing functions did not literally return an error. This can be checked heuristically, e.g. by ensuring that recorded coverage is nonzero, or that at least one tracee process was created or some syscalls were invoked, etc.
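To illustrate why `ESRCH` matters here, a tracing wrapper could map that errno to a `TraceeDied`-style variant at the point where a `ptrace(2)`-backed call fails. This is a sketch only: the `TraceError` type and the `classify`/`traced` helpers are hypothetical, and `ESRCH` is hardcoded to its Linux value of 3 to keep the example free of the `libc` crate. Since `ptrace(2)` also returns `ESRCH` when the target is not traced by the caller or is not stopped, the mapping is a heuristic.

```rust
use std::io;

/// `ESRCH` ("No such process") as defined on Linux.
const ESRCH: i32 = 3;

/// Hypothetical tracer error type.
#[derive(Debug)]
enum TraceError {
    /// The tracee has exited, possibly before we reaped it with `wait(2)`.
    TraceeDied { pid: i32 },
    /// Any other OS error, which should still be treated as fatal.
    Os(io::Error),
}

/// Map the error from a `ptrace(2)`-backed operation on `pid` to a `TraceError`.
fn classify(pid: i32, err: io::Error) -> TraceError {
    match err.raw_os_error() {
        // ESRCH is also returned if the target is not our tracee or is not
        // stopped, so treating it as "tracee died" is a heuristic.
        Some(ESRCH) => TraceError::TraceeDied { pid },
        _ => TraceError::Os(err),
    }
}

/// Run a tracing operation on `pid`, classifying any OS error it returns.
fn traced<T>(pid: i32, op: impl FnOnce() -> io::Result<T>) -> Result<T, TraceError> {
    op().map_err(|err| classify(pid, err))
}
```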
Note that `pete` now returns a `TraceeDied` variant in exactly the cases where we only want to warn on error, then continue and validate that task results were nontrivial.

AB#35975
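A minimal sketch of the warn-then-validate flow, assuming a hypothetical `RunSummary` that records covered blocks and created tracees and a hypothetical `TraceError` enum; the real task worker's result types and nontriviality checks may differ.

```rust
/// Hypothetical tracer error; only the `TraceeDied` name comes from the issue.
#[derive(Debug)]
enum TraceError {
    TraceeDied { pid: i32 },
    Other(String),
}

/// Hypothetical summary of one traced run.
#[derive(Debug, Default)]
struct RunSummary {
    covered_blocks: usize,
    tracees_created: usize,
}

/// Warn on `TraceeDied`, fail on other errors, then check that the run did something.
fn validate_run(trace: Result<(), TraceError>, summary: &RunSummary) -> Result<(), String> {
    match trace {
        Ok(()) => {}
        // An expected race: warn, then fall through to validation.
        Err(TraceError::TraceeDied { pid }) => {
            eprintln!("warning: tracee {pid} died during tracing; validating results");
        }
        // Any other tracing error is a hard failure.
        Err(TraceError::Other(msg)) => return Err(format!("tracing failed: {msg}")),
    }
    // Tracing "succeeded", but the run may still be logically empty.
    if summary.tracees_created == 0 {
        return Err("no tracee process was created".to_string());
    }
    if summary.covered_blocks == 0 {
        return Err("recorded coverage is empty".to_string());
    }
    Ok(())
}
```

The point of the final checks is that an `Ok(())` from the tracer alone is not treated as proof that the task produced a meaningful result.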