Basic block hit count - Githubissues

wizche commented 2 years ago

Hello, I am trying to collect coverage for a string comparison (strcmp). I expected the basic block that performs the character comparison to be hit multiple times for partially matching strings. For example, for strcmp("hello", "hell") the str_a[i] == str_b[i] comparison should happen at least four times. But with TinyInst, the resulting coverage doesn't contain any duplicate offsets for the character-comparing basic block.

Is TinyInst deduplicating basic blocks that were hit multiple times? If so, can we make TinyInst report basic block hit counts? Thanks a lot.

ifratric commented 2 years ago

Hi, TinyInst does not currently produce hit counts; it just tells you whether a block/edge was hit or not. As it stands, TinyInst takes a different design approach, with the benefit that it can tell you very quickly whether a run resulted in new coverage. Unfortunately, that's pretty much incompatible with hit counters, which require coverage collection to always be enabled for every basic block/edge. Hit counts are something that will be considered as an (optional) feature in a future version, but I don't have a roadmap for it yet (you are of course welcome to work on adding it if you wish).
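
To make the difference concrete, here is a minimal sketch (plain C++, not TinyInst code) of a strcmp-style loop that records both kinds of coverage. The `Trace` struct, `TracedCompare`, and the block id `kCompareBlock` are hypothetical names made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical id for the loop body that compares one pair of characters.
constexpr uint64_t kCompareBlock = 0x1000;

struct Trace {
  std::set<uint64_t> hit;               // hit / not hit: repeated hits collapse
  std::map<uint64_t, uint32_t> counts;  // per-block hit counts, for contrast
};

// strcmp-style loop that records every execution of the compare block.
int TracedCompare(const char* a, const char* b, Trace& t) {
  for (size_t i = 0;; ++i) {
    t.hit.insert(kCompareBlock);
    ++t.counts[kCompareBlock];
    if (a[i] != b[i]) {
      return static_cast<unsigned char>(a[i]) <
                     static_cast<unsigned char>(b[i]) ? -1 : 1;
    }
    if (a[i] == '\0') return 0;  // both strings ended at the same position
  }
}
```

For "hello" versus "hell", `hit` ends up with a single entry, while the counter reads 5: four matching characters plus the final comparison that finds the mismatch.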

I would also like to point out that it's questionable how much having hit counts helps for strcmp() and the like. They might help when you have a simple target with a single strcmp(), but note that as soon as you have multiple strcmp() calls from various places in your target, a single hit count would be calculated for all of them. As soon as the number of matching characters reaches 256 (the hit counters are typically 8-bit) you lose all useful information from subsequent comparisons. A better approach IMHO would be to add instrumentation specifically for strcmp/memcmp functions.
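
The two problems above can be sketched with a single shared 8-bit counter. This assumes a counter that simply wraps on overflow (some tools saturate or bucket instead); `ByteCounter` and `CountAfter` are hypothetical names, not part of TinyInst.

```cpp
#include <cstdint>

// One 8-bit counter shared by every caller of the compare block.
struct ByteCounter {
  uint8_t value = 0;
  void Hit() { ++value; }  // unsigned 8-bit arithmetic wraps: 255 + 1 == 0
};

// Simulates `calls` strcmp calls that each execute the compare block
// `hits_per_call` times, all feeding the same counter.
uint8_t CountAfter(unsigned calls, unsigned hits_per_call) {
  ByteCounter c;
  for (unsigned i = 0; i < calls * hits_per_call; ++i) c.Hit();
  return c.value;
}
```

One call with 4 hits and two calls with 2 hits each both leave the counter at 4, so the call sites cannot be told apart, and 64 calls with 4 hits each wrap the counter back to 0.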

wizche commented 2 years ago

Thanks a lot for the clarification @ifratric. I do think that hit counts could be useful in many situations, but I see why it is currently not a priority. Maybe I could give it a try...

Just another question: from an instrumented module address (e.g. the crashing exception address in an instrumented module), is there no direct way to get back to the corresponding original module address? What I do is compute the offset from the address to the base of the instrumented code. Then, with this offset, I look for the closest match in module->basic_blocks. From the matching tuple I can get the basic block address in the original module, and then I can "transpose" the offset to the original BB address. But I guess this can be off, depending on how big the basic block and the instrumentation code are... Is there a more precise/easier way?
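
The closest-match lookup described above could be sketched as follows. `BlockMap` is a hypothetical stand-in for module->basic_blocks (its real layout in TinyInst may differ), and `OriginalBlockFor` is an illustrative name; the sketch assumes C++17.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical table: offset of each translated block in the instrumented
// code -> offset of the same block in the original module.
using BlockMap = std::map<uint64_t, uint64_t>;

// Returns the original offset of the block whose translation starts at or
// before `instrumented_offset`, or nullopt if no such block exists.
std::optional<uint64_t> OriginalBlockFor(const BlockMap& blocks,
                                         uint64_t instrumented_offset) {
  auto it = blocks.upper_bound(instrumented_offset);  // first block after it
  if (it == blocks.begin()) return std::nullopt;      // precedes every block
  return std::prev(it)->second;
}
```

This only identifies the block. The distance into the block is not carried over, because instrumentation changes the size of the translated code.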

ifratric commented 2 years ago

Yes, that's currently the "best" way, with the addition that, if a crash occurs in instrumented code, the OnCrashed handler prints out the code around the crashing address (it's in hex, but you can disassemble it; I like https://defuse.ca/online-x86-assembler.htm#disassembly). You can use that to identify the crashing instruction once you have identified the basic block.

It would also be possible to create a map similar to module->basic_blocks, but for all instructions. This would, of course, be very memory inefficient, but it would be possible to implement it behind an optional flag for testing. A good place to fill such a map would be the InstrumentInstruction handler, https://github.com/googleprojectzero/TinyInst/blob/master/arch/x86/x86_litecov.cpp#L153. It gets the original address of the instruction being instrumented. The corresponding address in the instrumented code can be obtained via GetCurrentInstrumentedAddress(module). Alternatively, it can be done in tinyinst.cpp before InstrumentInstruction gets called, https://github.com/googleprojectzero/TinyInst/blob/master/tinyinst.cpp#L556. Note that one instruction in the original code can be rewritten using multiple instructions in the instrumented code.
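
A rough sketch of such a per-instruction map, as standalone C++ rather than actual TinyInst code: the `InstructionMap` class and its methods are hypothetical, and in practice `Record` would be fed the original instruction address together with the current instrumented address at the points mentioned above.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical per-instruction address map (illustration only).
class InstructionMap {
 public:
  // Call once per original instruction, with the address at which its
  // translation starts in the instrumented code.
  void Record(uint64_t instrumented_start, uint64_t original_address) {
    starts_[instrumented_start] = original_address;
  }

  // Resolves any instrumented address to the original instruction whose
  // translation starts at or before it, so a translation that spans
  // several instructions still maps back. Returns 0 if nothing matches.
  // Caveat: code inserted between two translations is attributed to the
  // preceding instruction.
  uint64_t Lookup(uint64_t instrumented_address) const {
    auto it = starts_.upper_bound(instrumented_address);
    if (it == starts_.begin()) return 0;
    return std::prev(it)->second;
  }

 private:
  std::map<uint64_t, uint64_t> starts_;
};
```

Storing only the start of each translation keeps one entry per original instruction, which is still far larger than a per-block table.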

ifratric commented 2 years ago

I went ahead and implemented that. See the -full_address_map flag.

wizche commented 2 years ago

Wow, that was quick, thanks @ifratric. I will check that out!

googleprojectzero / TinyInst

Basic block hit count #56