Open euloh opened 3 years ago
The problem is that we do not properly handle dropped perf events, i.e. records that could not be written to the perf event output buffer because no space was available. Release 2.0.0-1.13 will include drop counter support, which includes a patch to resolve this issue.
When dt_consume_one() encounters an unrecognized hdr->type -- for example, PERF_RECORD_LOST -- it returns DTRACE_WORKSTATUS_ERROR without calling dt_set_errno(dtp, errno). dt_consume_cpu() then passes DTRACE_WORKSTATUS_ERROR up the call stack through dtrace_consume() and dtrace_work(). In main() in cmd/dtrace.c, the error return from dtrace_work() leads to the dfatal("processing aborted") call. Because dtp->dt_errno was never set, the "Success" string is appended to that message. The result is that processing is aborted with only the puzzling explanation that the status is "Success."
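The failure mode described above can be reproduced in miniature. The following is only a sketch with hypothetical stand-ins: struct handle, consume_one_buggy(), consume_one_sets_err(), abort_message(), and the WORK_* and REC_* names are invented for illustration and are not the libdtrace definitions. It also assumes the final message is formed by appending strerror() of the stored error code, and the choice of EINVAL in the second variant is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the libdtrace definitions. */
enum work_status { WORK_ERROR = -1, WORK_OKAY = 0 };

struct handle {
    int err;    /* plays the role of dtp->dt_errno; 0 means "never set" */
};

/* Illustrative record types (chosen to match the perf header values). */
#define REC_LOST   2    /* like PERF_RECORD_LOST */
#define REC_SAMPLE 9    /* like PERF_RECORD_SAMPLE */

/* Buggy pattern: report an error for an unrecognized record type
 * without recording a cause in the handle. */
static enum work_status consume_one_buggy(struct handle *h, unsigned type)
{
    (void)h;
    if (type != REC_SAMPLE)
        return WORK_ERROR;
    return WORK_OKAY;
}

/* Variant that stores a cause before returning the error
 * (EINVAL is an arbitrary choice for this sketch). */
static enum work_status consume_one_sets_err(struct handle *h, unsigned type)
{
    if (type != REC_SAMPLE) {
        h->err = EINVAL;
        return WORK_ERROR;
    }
    return WORK_OKAY;
}

/* Assumed message format: a fixed prefix plus strerror() of the
 * stored error code. */
static const char *abort_message(const struct handle *h, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "processing aborted: %s", strerror(h->err));
    return buf;
}
```

With the buggy variant, the caller sees WORK_ERROR while h.err is still 0, so abort_message() ends in whatever strerror(0) returns. On glibc that string is "Success", which matches the puzzling message reported here.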
Test suite runs occasionally produce such failures, which typically cannot be reproduced when the test is rerun.
Here is one way to demonstrate the problem:
dtrace -n 'ksys_write:entry { trace(1); }'
with output directed to the terminal (possibly within a 'script' session to capture the output). On my VM, as many as ten million lines of output might be shown over the course of as long as a minute, but then the run aborts with the telltale message.