Anti-Obfuscation: Arithmetic and casting folding

Col-E commented 3 years ago

This achieves the same effect as #210 but instead of bytecode manipulations, it instead transforms expression instances.

Examples (Using antiobf set to true)

// IXOR
public static int fibonacci(int n) {
    if (n <= (0x67DA ^ 0xFFFFE265 ^ 0x6184 ^ 0xFFFFE43A)) {
        return n;
    }
    return fibonacci(n - (0xBBB ^ 0xFFFFE60B ^ 0x65F2 ^ 0xFFFF8843)) + fibonacci(n - (0x7AD6 ^ 0xFFFFA6F2 ^ 0x66B2 ^ 0xFFFFBA94));
}
// ISUB + IADD
public static int fibonacci(int n) {
    if (n <= 36 - 40 + 26 - 15 - 6) {
        return n;
    }
    return fibonacci(n - (208 - 355 + 353 - 245 + 40)) + fibonacci(n - (223 - 412 + 193 - 2));
}
// I2L + LXOR + L2I
public static int fibonacci(int n) {
    if (n <= (int)((long)2108268340 ^ (long)2108268341)) {
        return n;
    }
    return fibonacci(n - (int)((long)864501408 ^ (long)864501409)) + fibonacci(n - (int)((long)1741113313 ^ (long)1741113315));
}

Cleaned output

public static int fibonacci(int n) {
    if (n <= 1) {
        return n;
    }
     return fibonacci(n - 1) + fibonacci(n - 2);
}

Samples:

leibnitz27 commented 3 years ago

copying comment from other PR: (haven't checked your PR yet, tomorrow!)

An expression transformer can be applied either at 3rd (graph) stage or 4th (fully structured) stage.

For 3rd, look at NOPSearchingExpressionRewriter For 4th, look at HexLiteralTidier

Col-E commented 3 years ago

These get applied after HexLiteralTidier

leibnitz27 commented 3 years ago

Thanks - Couple of things -

1) You don't need a separate rewriter for casts - just allow the one rewriter to hit everything.

    @Override
    public Expression rewriteExpression(Expression expression, SSAIdentifiers ssaIdentifiers, StatementContainer statementContainer, ExpressionRewriterFlags flags) {
        expression.applyExpressionRewriter(this, ssaIdentifiers, statementContainer, flags);
                // TODO: Probably only worth doing if type of expression is foldable, i.e. if it's a RawJavaType.
        Expression newMe = expression.getComputedLiteral(STATIC_READONLY_MAP);
        return newMe == null ? expression : newMe;
    }

2) The map is used to allow known local variables to be used in folding elsewhere (getComputedLiteral is mainly used in determining if branch tests are tautologies) - since you're never passing anything in, you can avoid the alloc on every call. (my bad for not making that clear in original example!)

3) Also - can you please rename ConstInlinerRewriter -> ConstantFoldingRewriter?

4) I'm not convinced about assuming all arithmetic is equal - I think it's liable to be better to implement arithmetic over all types explicitly, I'm afraid. (This is where c++ templates would be handy, natch.)

Consider: FoldTest.txt

this is definitely wrong, but there are more subtle ones around shorts etc.

Cheers!

Col-E commented 3 years ago

Wasn't sure what would be preferred, assumed separation of duties
Ah, is there a way to pass in local variables that are used similar to temporary constants? I'd like to see if being hyper aggressive with folding could fold synthetic variables. If not, then yeah making it a constant.
👍
Agreed. I'll make some more test cases. Any other edge cases to consider?

leibnitz27 commented 3 years ago

Wasn't sure what would be preferred, assumed separation of duties

Yeah, fair enough, but I think both of these really are the same thing, just viewed from a slightly different angle - feels like it's not worth splitting out.

Ah, is there a way to pass in local variables that are used similar to temporary constants? I'd like to see if being hyper aggressive with folding could fold synthetic variables. If not, then yeah making it a constant.

TBH I'd really not consider that now - you need to guarantee state of variables on entry, and it's a giant pain. Certainly wouldn't consider for this PR ;) (Also, it's much easier to do it at the Op03 stage, where you have a graph rather than structured code).

Agreed. I'll make some more test cases. Any other edge cases to consider?

Can't think of anything specific, but would definitely worry about edge of representational range cases, or cases where float/double rounding error differs in interesting ways.

for float/double, for example, you might look at (the unfolded versions of:)

        System.out.println(1.23f + 3.45f);
        System.out.println(1.23 + 3.45);

A ridiculous-but-technically-possible edge case is where you fold doubles under non strictfp conditions, and the method is marked as strictfp. I've never (not that I've particularly tried too hard) found an example where this makes a difference, (JVM spec doesn't allow something like FMA, which is the usual candidate for intermediate representational differences). (if you have such an example, I'd love to see it).

Col-E commented 3 years ago

I'm not convinced about assuming all arithmetic is equal

Yeah, went ahead and did that. Regarding your folding example it will now print 4.6800003f

A ridiculous-but-technically-possible edge case is where you fold doubles under non strictfp conditions, and the method is marked as strictfp

In the literal computation for arithmetic, how difficult would it be to pass to an expression being evaluated that it is inside a strictfp method? You could then flip the computation between a strictfp copy of the float/double methods. Duplicate code sucks but its the only idea that popped in my head.

Also, it's much easier to do it at the Op03 stage, where you have a graph rather than structured code

Something about stages I noticed, the expression folding breaks array resugaring. For example:

// was
return Objects.hash(this.x, this.y);
// becomes
Object[] objectArray = new Object[2];
objectArray[0] = this.x;
objectArray[1] = this.y;
return Objects.hash(objectArray);

To fix this, would the transformer have to be updated to run in a prior stage? If so, got any references/pointers I could look at?

Lanchon commented 3 years ago

IMHO there is no reason not to always constant-fold with strictfp.

Col-E commented 3 years ago

IMHO there is no reason not to always constant-fold with strictfp.

So would it be more sensible to be consistent than conditionally accurate?

Lanchon commented 3 years ago

IMHO there is no reason not to always constant-fold with strictfp.

So would it be more sensible to be consistent than conditionally accurate?

absolutely. there is no reason to inject math inaccuracies of the runtime running CFR into the resulting source code (that might later be ran on a different paltform). it is customary to do constant folding as precisely as possible.

Lanchon commented 3 years ago

not to mention that CFR runs would become not repeatable (bad for testing).

leibnitz27 commented 3 years ago

@Col-E - pls don't take the volume of comments personally, it's just how I roll ;)

If you like, I can tinker with your branch to show you what I mean... (or will leave with you if you prefer)

leibnitz27 commented 3 years ago

I know I'm the one who uncorked the bottle of the strictfp genie, but I'd really like to know if there are examples (source for earlier image?) where it actually makes a difference? TBH - I would never enable this sort of optional behaviour by default.... so I'm not HUGELY worried, just..... a bit obsessive..... ;)

Col-E commented 3 years ago

Is there any particular reason you're checking specific operations? You could compress all of...

It expresses the train of thought more accurately since these few items are the only items within scope. But its correct that this is not necessary.

TBH - I would never enable this sort of optional behaviour by default.... so I'm not HUGELY worried, just..... a bit obsessive..... ;)

I'm the type who would and would like to implement it in my tools if merged :) And I totally understand that obsession.

pls don't take the volume of comments personally, it's just how I roll ;)

👍

If you like, I can tinker with your branch to show you what I mean... (or will leave with you if you prefer)

I'd love some examples if you have anything handy.

it is customary to do constant folding as precisely as possible.

Understandable, I think you meant consistent though. AFAIK StrictFP loses some precision vs system-dependant fp operations.

leibnitz27 commented 3 years ago

I've pushed https://github.com/leibnitz27/cfr/tree/Col-E-inlining which contains most of the tweaks I mention above - might be useful.

CastTest3.txt

This class file also contains a few tests

leibnitz27 commented 3 years ago

Ah, didn't see your latest push when I wrote that. ;)

Col-E commented 3 years ago

Oh but moving the literal computation is a nice idea.

leibnitz27 commented 3 years ago

Let me know when you're done (sounds like you want to do a bit from last comment)

Col-E commented 3 years ago

I'm away for a bit (3-4 hours or so) so you can review the current changes until I can get back. I marked items as resolved as I went but you can double check.

I see in the example push you separated the arith for narrow primitives, and I've not yet done that.

leibnitz27 commented 3 years ago

no hurry!

Col-E commented 3 years ago

CastTest3.txt

// before
CastTest3.foo((byte)(127 + 127));
CastTest3.foo((float)((double)1.23f + (double)3.45f));
CastTest3.foo((double)1.23f + (double)3.45f);
CastTest3.foo(1.23f + 3.45f);
CastTest3.foo((byte)(127 / 0));
// after
CastTest3.foo(-2);
CastTest3.foo(4.6800003f);
CastTest3.foo(4.680000066757202);
CastTest3.foo(4.6800003f);
CastTest3.foo((byte)(127 / 0));

Tested by pasting the expressions from before into IJ's evaluator and got the exact matches of the after so these ought to be correct.

leibnitz27 commented 3 years ago

Cool!

Col-E commented 3 years ago

🥳

leibnitz27 / cfr

Anti-Obfuscation: Arithmetic and casting folding #211