ncatlin / rgat

An instruction trace visualisation tool for dynamic program analysis
Apache License 2.0

Consider dropping common blocks #6

Closed by ncatlin 8 years ago

ncatlin commented 8 years ago

This is for cases where the target code generates more trace data than we can process.

Improving loop detection/compression would help, provided the overhead it adds outside of loops doesn't cause more problems than it solves.

We could also track how many times a block has been sent to the visualiser and stop sending it after a set number of identical executions.
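A rough sketch of that counting idea, to make the bookkeeping concrete. It is not drgat or rgat code: the class name, the `max_sends` cutoff and the use of a (block, target) pair to decide what counts as an "identical execution" are all assumptions for illustration.

```python
from collections import defaultdict


class BlockThrottle:
    """Hypothetical helper: caps how often an identical execution is forwarded."""

    def __init__(self, max_sends=1000):
        # Assumed cutoff; the issue does not specify a number.
        self.max_sends = max_sends
        # (block address, target address) -> executions forwarded so far
        self.sent = defaultdict(int)
        # (block address, target address) -> executions that were not forwarded
        self.dropped = defaultdict(int)

    def should_send(self, block_addr, target_addr):
        """Return True if this execution should still go to the visualiser."""
        key = (block_addr, target_addr)
        if self.sent[key] < self.max_sends:
            self.sent[key] += 1
            return True
        # Over the cap: remember the drop so the loss can be reported later.
        self.dropped[key] += 1
        return False
```

Keeping a dropped count per edge is one way to tell the visualiser how much was skipped, although the exact heatmap numbers are still gone.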

Example of a Cryptowall 1 loader sample:

[Image: cryptowall1badloop]

It doesn't do very much before it fills the trace buffers with a tight loop.

If we removed instrumentation from the worst offenders and re-enabled it when execution moved on to different blocks, we would get a lot closer to native performance.

Problems with this approach: We are sacrificing edge counts, so accurate numbers on the heatmap are lost. drgat can send a notification that those blocks were too hot to handle though.

Bigger problem: If you remove instrumentation from multiple blocks with call [eax] terminators and one of them breaks the loop, the integrity of the control flow graph is compromised.

A softer approach would be to maintain instrumentation of the blocks but not send their tags to rgat until their target changes. This won't make drgat output much faster but it will stop us having a 500000+ item backlog in rgat.
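As a minimal sketch of this softer approach: every block execution still reaches the filter (the instrumentation stays in place), but a tag is only forwarded while the block's target is new or has repeated fewer than some limit. The class, the method name and `repeat_limit` are hypothetical, not taken from drgat.

```python
class TargetChangeFilter:
    """Hypothetical filter: stops forwarding a block's tag while its target repeats."""

    def __init__(self, repeat_limit=3):
        # Assumed number of identical executions forwarded before going quiet.
        self.repeat_limit = repeat_limit
        # block address -> [last seen target, consecutive executions with that target]
        self.state = {}
        # Total executions that were seen but not forwarded.
        self.suppressed = 0

    def should_forward(self, block_addr, target_addr):
        """Called on every execution of a block; True means send the tag."""
        entry = self.state.get(block_addr)
        if entry is None or entry[0] != target_addr:
            # New block or changed target: always forward, restart the count.
            self.state[block_addr] = [target_addr, 1]
            return True
        entry[1] += 1
        if entry[1] <= self.repeat_limit:
            return True
        # Same target again and over the limit: keep it out of the backlog.
        self.suppressed += 1
        return False
```

Because the check runs on every execution, the block that finally breaks out of the loop is still seen and forwarded, which avoids the control flow graph problem that removing instrumentation would cause.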

ncatlin commented 8 years ago

Implemented the softer approach in 0.2 with really impressive results. There is still a reasonable amount of work done for each block (a clean call with a few reads and writes in the best case), which can be reduced in the future, but applications that constantly present fresh code are now far more of a performance problem than simple loops.