soot-oss / soot

Soot - A Java optimization framework
GNU Lesser General Public License v2.1
2.85k stars 706 forks source link

jb.sils interferes with jb:use-original-names #1641

Open michael-emmi opened 3 years ago

michael-emmi commented 3 years ago

Describe the bug

With SharedInitializationLocalSplitter enabled, local variables no longer seem to carry their original names. We had been using this option:

Options.v().setPhaseOption("jb", "use-original-names:true");

And seeing locals with names from the source program. However, new code #35c5bcb seems to interfere with this; now we just see many $stack… variables.

The original names appear again if we disable this pass as mentioned already:

Options.v().setPhaseOption("jb.sils", "enabled:false");

@linghuiluo has identified this as a defect: https://github.com/soot-oss/soot/issues/1614#issuecomment-853287543

To reproduce Run soot with the jb:use-original-names option enabled:

Options.v().setPhaseOption("jb", "use-original-names:true");

Then compare the names of Jimple local variables with jb.sils disabled:

Options.v().setPhaseOption("jb.sils", "enabled:false");

Expected behavior Jimple local variables are identified with their original source-code names.

Stacktrace N/A

Input file Any

Additional context Preserving the names of local variables has been quite useful for debugging intermediate analysis results. Having to disable jb.sils to maintain this functionality is unfortunate.

linghuiluo commented 3 years ago

I just tested the develop branch. Now we got sth really nasty. It seems that even disabling the transformer Options.v().setPhaseOption("jb.sils", "enabled:false"); will not help. The original names don't show up at all.

Might related to this commit https://github.com/soot-oss/soot/commit/7abeb18d935c31b7bc0ea3524450c218b4e786cb

tim-hoffman commented 3 years ago

@linghuiluo I just did a quick test of compiling soot both with and without that commit you mentioned and ran with "-p jb use-original-names:true". There was no change in the output between the 2 compiles which shows that this commit is not the cause of this issue. I did observe that the original names are not appearing in the output class file with either version but the original names do show up if you use Jimple output format. I also tried with Baf output and again the original names are missing (even without that commit you mentioned) so it seems like there was some change affecting the Jimple-to-Baf translation that causes the names to be lost. UPDATE: I retract the final sentence because I had forgotten that original local names are not visible if you just print Baf code because they are stored in a separate field within the BafLocal class that is only used to generate an output class file when the -write-local-annotations option is used in Soot. I had also failed to mention that in testing I had disabled "jb.sils" because I was only trying to show that my commit (which was not related to jb.sils at all) was not the cause of the issue.

linghuiluo commented 3 years ago

@tim-hoffman you also merged many PRs in the last days. Not just that commit to that file.

tim-hoffman commented 3 years ago

I searched just a little bit further and I see if you add the "-write-local-annotations" option, then the local variable tables will be included in the class file and thus the classes will contain original names. I wonder if that option used to be enabled automatically when the "-p jb use-original-names:true" option was enabled? Have you tried working backwards to see what commit first introduced this problem?

linghuiluo commented 3 years ago

@tim-hoffman which commit you meant with the "-write-local-annotations" option. i certainly didn't introduce this.

tim-hoffman commented 3 years ago

I'm not accusing you of anything. I'm simply stating that running Soot with options -write-local-annotations -p jb use-original-names:true produces a local variable table in the class file with fully preserved names but running Soot with only -p jb use-original-names:true produces a class file with no local variable table and thus local names are not preserved. In fact, upon reviewing the soot options https://soot-build.cs.uni-paderborn.de/public/origin/develop/soot/soot-develop/options/soot_options.htm, the behavior I'm seeing is consistent with the "use-original-names" documentation because the original names are preserved in Jimple. It says nothing about preserving them in the output class file.

tim-hoffman commented 3 years ago

What I'm trying to say is, I cannot reproduce this bug with the latest version of develop. Everything seems to work as expected so far as I can tell.

@michael-emmi Can you update your copy of Soot to the latest version of develop and include:

  1. The full command line you use to run Soot
  2. The input file you used on Soot
  3. The output file generated by Soot and what output you expected instead?
linghuiluo commented 3 years ago

@tim-hoffman Before there was no need to set up -write-local-annotations. Enabling this option still doesn't produce Jimple with original names.

ericbodden commented 3 years ago

I am also confused. Who introduced an option -write-local-annotations and why? Soot always used to preserve local variable attributes out of the box, and by all means that should be the default.

@tim-hoffman we welcome your contributions but in the future please do not simply merge semantics-altering changes such as 7abeb18 without discussing these here. We have many people relying on Soot, and on local variable names in particular, and we don't want to break things for them.

@linghuiluo I wonder whether we can introduce a sort of "staging branch" which we can merge such commits into such that there they become tested, also using workloads such as the one at AWS, before we eventually merge them into develop.

tim-hoffman commented 3 years ago

I too depend on local variable names which is the reason I introduced that commit in the first place, because there was a bug where local variable names were not properly preserved! I apologize for the confusion. I did not say that anyone introduced either of the options I mentioned. They have both been in Soot for a long time (proof). I was trying to show @linghuiluo that this issue was in fact only related to the jb.sils pack (and not to my commit that she referenced) by describing various scenarios that I ran using the latest version of 'develop' and disabling the jb.sils pack. These scenarios behaved as expected, i.e. the original local names appeared in the Jimple code and, if the -write-local-annotations option is also used when running Soot, the original local names appeared in the LocalVariableTable in the generated class file as well. As proof, I have added PR #1643 with a test case that demonstrates the preservation of original local names in Jimple code when the jb.sils phase is disabled. This makes me believe that the original issue posed exists because of jb.sils and not because of my bug fix.

StevenArzt commented 3 years ago

Note that the new phase jb.sils was developed to tackle cases where the same local is used with incompatible types. See the PR https://github.com/soot-oss/soot/pull/1569 for code examples from real-world apps. If we split locals, we cannot give both locals the same original name. We therefore append a suffix, e.g., myvar_0 and myvar_1 if we split the local myvar into two locals. We do not create locals with totally different names.

I just had a brief look at the new test case. In the test target soot.asm.LocalNaming, the local epsilon disappears, because it only contains a constant, which gets inlined, same for zeta. Local iota is a simple calculation on constants that the optimizations also remove and inline. That's expected.

eta gets replaced by $stack17. That's definitely strange. jb.sils uses the CopyPropagator. If we kick out the CopyPropagator, we don't have that problem anymore. If we take a deeper look, we find the following Jimple code before copy propagation:

        $stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
        eta = $stack16;
        $stack17 = virtualinvoke eta.<java.lang.Long: long longValue()>();

Note that eta is a useless intermediate here. The copy propagator just takes $stack16 instead:

        $stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
        eta = $stack16;
        $stack17 = virtualinvoke $stack16.<java.lang.Long: long longValue()>();

Now we have a dead assignment in the middle, which gets kicked out. There you are, no eta left.

So what's the bug? I'd say the original Jimple code is already wrong, because it should directly use eta as a local instead of first introducing a stack local and then assigning its value. The issue is a direct consequence of this strange stack local being introduced. In my opinion, the code should look like this:

        eta = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
        $stack17 = virtualinvoke eta.<java.lang.Long: long longValue()>();

Now someone needs to look into the AsmMethodSource and see why we get the intermediate stack local.

tim-hoffman commented 3 years ago

@StevenArzt That sounds to me like the same underlying issue as #1614, where additional local aliases are being created for some reason.

StevenArzt commented 3 years ago

@tim-hoffman Not quite. Not jb.sils introduces the additional local in this case. This local already exists before jb.sils runs. It is rendundant, and there are two ways to clean up the redundancy: either remove eta or remove stack$16. From a code perspective, both options are equally correct, so the transformer is actually correct. We need to fix its input.

Note that the redundant local might have been there forever, and out of luck, it was never a problem - until we introduced code cleanup as part of the splitting process.

linghuiluo commented 3 years ago

@StevenArzt

 String[] args={"-process-dir",
                Paths.get("src", "test", "resources", "SimpleClass").toFile().getAbsolutePath(),
                "-src-prec", "c",
                "-output-format", "jimple",
                "-p", "jb", "use-original-names:true",
                "-p", "jb.sils", "enabled:false"
        };
   soot.Main.main(args);

I simply tried this on the Example.class from https://github.com/soot-oss/soot/tree/develop/src/test/resources/SimpleClass The java code:

    public void bar(int a, int b){
        int c = a*b;
        int d = 0;
        System.out.println(c);
    }

The output jimple code with jb.sils disabled:

 public void bar(int, int)
    {
        java.io.PrintStream $stack5;
        Example l0;
        int l1, l2, l3;
        boolean l4;

        l0 := @this: Example;

        l1 := @parameter0: int;

        l2 := @parameter1: int;

        l3 = l1 * l2;

        l4 = 0;

        $stack5 = <java.lang.System: java.io.PrintStream out>;

        virtualinvoke $stack5.<java.io.PrintStream: void println(int)>(l3);

        return;
    }

with jb.sils enabled:

  public void bar(int, int)
    {
        java.io.PrintStream $stack5;
        Example l0;
        int l1, l2, l3;

        l0 := @this: Example;

        l1 := @parameter0: int;

        l2 := @parameter1: int;

        l3 = l1 * l2;

        $stack5 = <java.lang.System: java.io.PrintStream out>;

        virtualinvoke $stack5.<java.io.PrintStream: void println(int)>(l3);

        return;
    } 

In both jimple outputs, I don't see the original names "a, b, c". Interestingly, the case @tim-hoffman created I see partial original names as the jimple code below shows:

 public void localNaming(java.lang.String, java.lang.Integer, byte[], java.lang.StringBuilder)
    {
        java.io.PrintStream omega;
        byte[] gamma;
        long $stack17;
        java.lang.Integer beta;
        java.lang.Long $stack16;
        LocalNaming this;
        java.lang.StringBuilder delta;
        java.lang.String alpha;

        this := @this: LocalNaming;

        alpha := @parameter0: java.lang.String;

        beta := @parameter1: java.lang.Integer;

        gamma := @parameter2: byte[];

        delta := @parameter3: java.lang.StringBuilder;

        gamma[0] = 23;

        virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(alpha);

        virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.Object)>(beta);

        $stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(180L);

        $stack17 = virtualinvoke $stack16.<java.lang.Long: long longValue()>();

        virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(long)>(90L);

        virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(long)>($stack17);

        omega = <java.lang.System: java.io.PrintStream out>;

        virtualinvoke omega.<java.io.PrintStream: void println(java.lang.Object)>(delta);

        return;
    }
StevenArzt commented 3 years ago

@linghuiluo I was referring to the test case that @tim-hoffman created and explained why some locals are correctly removed there by jb.sils, and that there is still an issue with local eta / $stack16. It is interesting that you did not get any proper local names at all when running Soot from the command line. If that happens even without jb.sils, the bug must be something else.

linghuiluo commented 3 years ago

@StevenArzt Strange. I just created a new example

    public void test(){
        int d = Config.getD();
        int f = Config.getF();
        int[] arr = new int[2];
        arr[0] = d;
        arr[1] = f;
        System.out.println(arr);
    }

if i have jb.sils enabled I got the following Jimple:

  public void test()
    {
        java.io.PrintStream $stack6;
        int[] arr;
        int $stack4, $stack5;
        Array1 this;

        this := @this: Array1;

        $stack4 = staticinvoke <Array1$Config: int getD()>();

        $stack5 = staticinvoke <Array1$Config: int getF()>();

        arr = newarray (int)[2];

        arr[0] = $stack4;

        arr[1] = $stack5;

        $stack6 = <java.lang.System: java.io.PrintStream out>;

        virtualinvoke $stack6.<java.io.PrintStream: void println(java.lang.Object)>(arr);

        return;
    }

with it disabled I see:

  public void test()
    {
        java.io.PrintStream $stack6;
        int[] arr;
        int d, f;
        Array1 this;

        this := @this: Array1;

        d = staticinvoke <Array1$Config: int getD()>();

        f = staticinvoke <Array1$Config: int getF()>();

        arr = newarray (int)[2];

        arr[0] = d;

        arr[1] = f;

        $stack6 = <java.lang.System: java.io.PrintStream out>;

        virtualinvoke $stack6.<java.io.PrintStream: void println(java.lang.Object)>(arr);

        return;
    }

which is what I would expect. I don't think your explanation above applies for this case, correct me if i am wrong.

StevenArzt commented 3 years ago

@linghuiluo That indeed looks different. Can you commit the full code as part of the LocalNaming test target class? Just mark the test case as @Ignore for now so that we don't break the build. Having a proper test case makes it easier to look into it, because your example seems to depend on other classes such as Config. Furthermore, we directly have a test case for later to test our fix.

linghuiluo commented 3 years ago

@StevenArzt sure will do.

tim-hoffman commented 3 years ago

I verified that @StevenArzt is right that the redundant local has been there for a while. I did a checkout of release 4.0.0 (5d989a3) from Dec 27, 2019 and then ran the LocalNaming test that @linghuiluo added and I see the $stack* temporary variables being added so that the initial jimple code is as follows:

...
$stack4 = staticinvoke <soot.asm.LocalNaming$Config: int getD()>();
d = $stack4;
$stack5 = staticinvoke <soot.asm.LocalNaming$Config: int getF()>();
f = $stack5;
...
tim-hoffman commented 3 years ago

It looks like it's the responsibility of the Aggregator (jb.a phase) to remove those stack temporaries. However, jb.sils runs before jb.a and its use of the CopyPropagator actually replaces all uses of the d and f variables with those stack temporaries. It seems there are a few options then, but @MarcMil may need to give input as the author of SharedInitializationLocalSplitter:

StevenArzt commented 3 years ago

There is a structure for a Jimple code example in #1569. Maybe you can create a couple of tests case from it, @tim-hoffman? That would definitely help. I don't think we should add the original apps, because this would just add megabytes of test targets to the repository, while being hard to debug in case something breaks. I'd prefer smaller Jimple code snippets without too many dependencies on other stuff from the respective app.

tim-hoffman commented 3 years ago

@StevenArzt, I wrote a test case to generate Jimple from the code example you mentioned as follows:

public void testMethod(boolean) 
{
    unknown b, s, u;
    b := @parameter0: boolean;
    s = <java.lang.System: java.io.PrintStream out>;
    u = 0;
    if b == 0 goto label1;
    virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u);
    goto label2;

 label1:
    virtualinvoke s.<java.io.PrintStream: void println(int)>(u);

 label2:
    return;
}

When I run the SharedInitializationLocalSplitter it just moves the constant 0 into the invoke expressions and does not introduce any new locals. I assumed this happens due to one of the ConstantPropagator transforms at the start of the SharedInitializationLocalSplitter so I tried removing all of those transformer uses from SharedInitializationLocalSplitter#internalTransform and then it does introduce another local as expected. So this test case alone does not show why all of those transformers are needed. Are there any other examples you could direct me to?

MarcMil commented 3 years ago

So, I've revisited this issue. I'm 99% sure this problem occurred with the Bringmeister app.

@tim-hoffman A better code which shows this "problem" is:

 public static void testMethod(boolean) 
{
    unknown b, s, u;
    b := @parameter0: boolean;
    s = <java.lang.System: java.io.PrintStream out>;
    if b == 0 goto label1;
    u = 0;
    goto label2;
 label1:
    u = 1;
 label2:
    if b == 0 goto label3;
    virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u);
    goto label4;
 label3:
    virtualinvoke s.<java.io.PrintStream: void println(int)>(u);
 label4:
    return;
}

Without SILS:

    public static void testMethod(boolean)
    {
        java.io.PrintStream s;
        boolean b, u#1;
        b := @parameter0: boolean;
        s = <java.lang.System: java.io.PrintStream out>;
        if b == 0 goto label1;
        u#1 = 0;
        goto label2;
     label1:
        u#1 = 1;
     label2:
        if b == 0 goto label3;
        virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u#1);
        goto label4;
     label3:
        virtualinvoke s.<java.io.PrintStream: void println(int)>(u#1); //<-- implicit cast to int
     label4:
        return;
    }

With SILS:

    public static void testMethod(boolean)
    {
        unknown b, s, u#1, u#1_1, u#1_2;
        b := @parameter0: boolean;
        s = <java.lang.System: java.io.PrintStream out>;
        if b == 0 goto label1;
        u#1 = 0;
        u#1_2 = 0;
        u#1_1 = 0;
        goto label2;
     label1:
        u#1 = 1;
        u#1_2 = 1;
        u#1_1 = 1;
     label2:
        if b == 0 goto label3;
        virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u#1_1);
        goto label4;
     label3:
        virtualinvoke s.<java.io.PrintStream: void println(int)>(u#1_2);
     label4:
        return;
    }

So, without SILS, we have a cast between int and boolean, which is not valid in pure Java. Thus, one could argue that this should not be valid in Jimple either. However, the semantics of Jimple in this regard are not completely clear to me. As it is right now, there is no dedicated BooleanConstant, instead, we have IntConstants with 1 or 0 for booleans. As such, it'd be OK for me to perform that cast. Maybe these semantics should be specified somewhere? The problem was that back then the TypeAssigner had massive problems coming up with any typing. I'm not sure what changed in the meantime, but now, Soot finds a typing. In the case of the Bringmeister app, this involves casting ints to booleans.

So, I'd suggest to disable or remove sils altogether. Our priority was that the TypeAssigner does not break and this seems to be the case right now. As such, we could do the simplest approach and disable this workaround altogether. This should solve all problems associated with it. We could take this test case to assure that the TypeAssigner does not break hard again.