leahneukirchen / extrace

trace exec() calls system-wide

Drops a load of events in pathological tests #3

Open FauxFaux opened 6 years ago

FauxFaux commented 6 years ago

extrace prints nothing for an event when it is too slow to pick it up, i.e. when the process has already exited by the time extrace looks it up in /proc. In a worst-case scenario, it drops 98%+ of events. I don't think this is particularly fixable; it's just a limitation of the way /proc works.
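The race is easy to reproduce in isolation. The sketch below is not extrace's code: `read_cmdline` and `cmdline_after_exit` are hypothetical helpers, and it assumes Linux with /proc mounted. It shows that a reader which arrives after the process has exited and been reaped gets nothing back.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: read /proc/PID/cmdline into buf.
 * Returns the number of bytes read, or -1 if the process is already
 * gone (open fails) or has nothing left to report (empty read). */
ssize_t read_cmdline(pid_t pid, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/cmdline", (long)pid);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, len);
    close(fd);
    return n > 0 ? n : -1;
}

/* Fork a child that exits immediately, reap it, then look it up.
 * This models a tracer that arrives too late. */
ssize_t cmdline_after_exit(void)
{
    char buf[256];
    pid_t pid = fork();
    if (pid < 0)
        return -2;          /* fork failed; not the case being shown */
    if (pid == 0)
        _exit(0);           /* child: exit as fast as possible */
    waitpid(pid, NULL, 0);  /* after this, /proc/PID is gone */
    return read_cmdline(pid, buf, sizeof buf);
}
```

Calling `read_cmdline(getpid(), ...)` on a live process returns a positive byte count, while `cmdline_after_exit()` returns -1. A real tracer cannot wait for the child like this, so it simply loses the lookup whenever the process finishes first.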

I wrote some code which others might be interested in, but I'm out of ideas for taking it further: https://github.com/FauxFaux/extrace/commits/dropping

Various aspects discussed below:


Warn that we dropped processes: https://github.com/FauxFaux/extrace/commit/b61406795c085b0debb88a996a9d5eec6358fa5a . I think something like this, perhaps with different messages, should be added.
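As a rough illustration of what such a warning could look like (this is not the content of the linked commit; the struct, the function names and the message text are all made up for this sketch), one could count failed lookups and report them:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counters; not taken from the linked commit. */
struct drop_stats {
    unsigned long seen;     /* exec events received */
    unsigned long dropped;  /* events whose /proc lookup failed */
};

/* Record one event; found is nonzero if the process could still be read. */
void drop_stats_record(struct drop_stats *s, int found)
{
    assert(s != NULL);
    s->seen++;
    if (!found)
        s->dropped++;
}

/* Format a one-line warning into buf if anything was dropped.
 * Returns what snprintf returns, or 0 if there is nothing to report.
 * The wording of the message is an assumption. */
int drop_stats_warning(const struct drop_stats *s, char *buf, size_t len)
{
    assert(s != NULL && buf != NULL);
    if (s->dropped == 0)
        return 0;
    return snprintf(buf, len,
                    "extrace: dropped %lu of %lu events (process exited too quickly)",
                    s->dropped, s->seen);
}
```

The caller would print the buffer to stderr, either per dropped event or as a summary on exit; which of the two is less annoying is a judgment call.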


https://github.com/FauxFaux/extrace/blob/dropping/yes-exec.c (+ Makefile changes, + maybe/ subfolder): A tool which launches a binary as quickly as possible, and a binary which exits incredibly quickly. Together they are 11x+ faster than a shell loop running /bin/true; maybe on its own is ~6x faster than /bin/true here.

% time ./yes-exec 4000 maybe/maybe
... 0.514 seconds total

% time (for i in {0..4000}; do /bin/true; done)
... 5.875 seconds total
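For readers who don't want to open the branch, a launcher of this kind can be approximated as follows. This is a sketch under assumptions, not the actual yes-exec.c: `spawn_n` is a hypothetical name, and using `posix_spawnp` (rather than whatever the real tool does) is my choice here.

```c
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical launcher: run prog count times back to back and
 * return how many runs exited with status 0. */
int spawn_n(const char *prog, int count)
{
    assert(prog != NULL && count >= 0);

    char *const argv[] = { (char *)prog, NULL };
    int ok = 0;

    for (int i = 0; i < count; i++) {
        pid_t pid;
        /* posix_spawnp searches PATH; no shell is involved. */
        if (posix_spawnp(&pid, prog, NULL, NULL, argv, environ) != 0)
            continue;
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Something like `spawn_n("maybe/maybe", 4000)` would roughly correspond to the first timing above. Skipping the shell removes its per-iteration overhead, which is where most of the difference to the `for` loop should come from.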

https://github.com/FauxFaux/extrace/commit/381105c16d00fa910c221b547d27a1877d56e70f and previous refactors: an attempt to offload some of the work to a different thread via a blocking queue.

I believe this causes the event-watching "main" thread to be better at picking up the messages (as it spends less time doing the rest of the work), but it doesn't seem to make a big difference in how many times we pick up the name in the maybe case.
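The general shape of such a blocking queue, for anyone unfamiliar with it, is sketched below with a pthread mutex and two condition variables. This is not the code from the commit: the names (`bqueue`, `bq_push`, `bq_pop`), the fixed capacity and the choice of `pid_t` as the payload are assumptions for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>

#define BQ_CAP 1024   /* assumed capacity, chosen arbitrarily */

struct bqueue {
    pid_t items[BQ_CAP];
    size_t head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
};

void bq_init(struct bqueue *q)
{
    assert(q != NULL);
    q->head = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Called by the event-watching thread; blocks only if the queue is full. */
void bq_push(struct bqueue *q, pid_t pid)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == BQ_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % BQ_CAP] = pid;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Called by the worker thread; blocks until an item is available. */
pid_t bq_pop(struct bqueue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    pid_t pid = q->items[q->head];
    q->head = (q->head + 1) % BQ_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return pid;
}
```

The main thread would call `bq_push` for every event and return to listening straight away, while a second thread loops on `bq_pop` and does the slower work. Note that this only helps if the time-critical /proc read stays ahead of the process exiting, which is the part a queue cannot speed up.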

I don't think the thread overhead is significant, but Every Nanosecond Counts. It might be possible to replace this with a single-producer-single-consumer queue ("spsc queue") and some other notification mechanism, to slightly increase performance? If it wasn't otherwise obvious, I don't really know what I'm doing.
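For reference, an spsc queue is usually a fixed-size ring with two indices, each written by only one side. The sketch below uses C11 atomics; the names and the capacity are assumptions, it has not been benchmarked against the blocking version, and it leaves the notification mechanism open.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define SPSC_CAP 1024  /* assumed capacity; must be a power of two */
_Static_assert((SPSC_CAP & (SPSC_CAP - 1)) == 0, "SPSC_CAP must be a power of two");

struct spsc {
    pid_t items[SPSC_CAP];
    _Atomic size_t head;   /* next slot to read; written only by the consumer */
    _Atomic size_t tail;   /* next slot to write; written only by the producer */
};

/* Producer side. Returns false instead of blocking when the ring is full. */
bool spsc_push(struct spsc *q, pid_t pid)
{
    assert(q != NULL);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SPSC_CAP)
        return false;
    q->items[tail & (SPSC_CAP - 1)] = pid;
    /* release: the item write above becomes visible before the new tail */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the ring is empty. */
bool spsc_pop(struct spsc *q, pid_t *out)
{
    assert(q != NULL && out != NULL);
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->items[head & (SPSC_CAP - 1)];
    /* release: the slot is free for the producer only after we have read it */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

A zero-initialized `struct spsc` is an empty queue. Neither side takes a lock, so the producer never stalls behind the consumer; the cost is that the consumer has to either spin on `spsc_pop` or be woken by something else when the queue goes from empty to non-empty.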


There's not much wasted time; most of it is opening files, unsurprisingly. The threading moves most of the writes onto the other thread, and there doesn't seem to be much contention on the thread queue (not super well tested):

[Image: Brendan Gregg-style flamegraph]

leahneukirchen commented 6 years ago

Yes, the interface is inherently racy. But I've been using extrace on systems with quite some load, and didn't see much dropping in practice... most processes spawned actually do something. :)

I think warning is good, but adding threads is overengineering.

leahneukirchen commented 6 years ago

Merged as f854665.