google / souper

A superoptimizer for LLVM IR
Apache License 2.0
2.11k stars 167 forks source link

QSynth evaluation #738

Open fvrmatteo opened 4 years ago

fvrmatteo commented 4 years ago

Hi, I was testing (together with @pgarba) the instruction synthesis of some obfuscated expressions obtained from the QSynth repository. As an example this expression has been converted to the Souper IR and souper-check has been used to synthesise it both with enumerative and component-based synthesis. Both methods quickly converge to the correct candidate (mul %1, %1, obtained with the bit-width reduced synthesis using i8/i16) but it seems they get stuck during the Z3 equivalence phase.

The enumerative synthesis opt file is the following:

; RUN: %souper-check -infer-rhs -souper-enumerative-synthesis -souper-enumerative-synthesis-debug-level=4 %solver %s > %t1
; RUN: %FileCheck %s < %t1
; CHECK: mul %1, %1

%0:i64 = var
%1:i64 = var
%2 = xor %1, -1:i64
%3 = or %0, %2
%4 = shl %3, 1:i64
%5 = add %0, -2:i64
%6 = sub %5, %1
%7 = sub %6, %4
%8 = xor %7, %0
%9 = and %8, %1
%10 = or %8, %1
%11 = mul %10, %9
%12 = and %8, %2
%13 = xor %8, -1:i64
%14 = and %1, %13
%15 = mul %12, %14
%16 = add %11, %15
infer %16

The feeling is that the Z3 equivalence phase takes so long just because proving the matching with the original MBA expression is hard. I noticed that other Souper test files are synthesising the mul instruction and the equivalence check is immediate, hence it seems to be related with the complexity of the obfuscated expression.

Is that a correct guess or something can be done to speedup that phase?

regehr commented 4 years ago

Hello, It's great that you're looking at QSynth in Souper! The full-width equivalence proof here is going to be challenging and I don't think we have any great ideas about how to speed it up. Some not-so-great ideas include:

Sorry, wish I had a better answer, but we're mainly compiler people who use solvers, as opposed to being actual solver people, and we basically just rely on the solver to just do the right thing.

fvrmatteo commented 4 years ago

Thanks for the answer!

I tried to specify other solvers while doing the synthesis with Souper (e.g. stp) but it looks like they are rejected and make the synthesis fail with the ; Failed to infer RHS message. Is this expected behaviour?

As of now the reduced-width equivalence is enough for some testing, but the soundness isn't the same. Considering SMT-Solvers are pretty much black magic to me, as I'm a simple user with no advanced knowledge of the internals, I'll probably try to ask the question to the Z3 GitHub repository.