Open michael-emmi opened 3 years ago
I just tested the develop branch. Now we got sth really nasty. It seems that even disabling the transformer Options.v().setPhaseOption("jb.sils", "enabled:false"); will not help. The original names don't show up at all.
Might related to this commit https://github.com/soot-oss/soot/commit/7abeb18d935c31b7bc0ea3524450c218b4e786cb
@linghuiluo I just did a quick test of compiling soot both with and without that commit you mentioned and ran with "-p jb use-original-names:true". There was no change in the output between the 2 compiles which shows that this commit is not the cause of this issue. I did observe that the original names are not appearing in the output class file with either version but the original names do show up if you use Jimple output format. I also tried with Baf output and again the original names are missing (even without that commit you mentioned) so it seems like there was some change affecting the Jimple-to-Baf translation that causes the names to be lost.
UPDATE: I retract the final sentence because I had forgotten that original local names are not visible if you just print Baf code because they are stored in a separate field within the BafLocal class that is only used to generate an output class file when the -write-local-annotations
option is used in Soot. I had also failed to mention that in testing I had disabled "jb.sils" because I was only trying to show that my commit (which was not related to jb.sils at all) was not the cause of the issue.
@tim-hoffman you also merged many PRs in the last days. Not just that commit to that file.
I searched just a little bit further and I see if you add the "-write-local-annotations" option, then the local variable tables will be included in the class file and thus the classes will contain original names. I wonder if that option used to be enabled automatically when the "-p jb use-original-names:true" option was enabled? Have you tried working backwards to see what commit first introduced this problem?
@tim-hoffman which commit you meant with the "-write-local-annotations" option. i certainly didn't introduce this.
I'm not accusing you of anything. I'm simply stating that running Soot with options -write-local-annotations -p jb use-original-names:true
produces a local variable table in the class file with fully preserved names but running Soot with only -p jb use-original-names:true
produces a class file with no local variable table and thus local names are not preserved.
In fact, upon reviewing the soot options https://soot-build.cs.uni-paderborn.de/public/origin/develop/soot/soot-develop/options/soot_options.htm, the behavior I'm seeing is consistent with the "use-original-names" documentation because the original names are preserved in Jimple. It says nothing about preserving them in the output class file.
What I'm trying to say is, I cannot reproduce this bug with the latest version of develop
. Everything seems to work as expected so far as I can tell.
@michael-emmi Can you update your copy of Soot to the latest version of develop
and include:
@tim-hoffman Before there was no need to set up -write-local-annotations
. Enabling this option still doesn't produce Jimple with original names.
I am also confused. Who introduced an option -write-local-annotations
and why? Soot always used to preserve local variable attributes out of the box, and by all means that should be the default.
@tim-hoffman we welcome your contributions but in the future please do not simply merge semantics-altering changes such as 7abeb18 without discussing these here. We have many people relying on Soot, and on local variable names in particular, and we don't want to break things for them.
@linghuiluo I wonder whether we can introduce a sort of "staging branch" which we can merge such commits into such that there they become tested, also using workloads such as the one at AWS, before we eventually merge them into develop.
I too depend on local variable names which is the reason I introduced that commit in the first place, because there was a bug where local variable names were not properly preserved!
I apologize for the confusion. I did not say that anyone introduced either of the options I mentioned. They have both been in Soot for a long time (proof). I was trying to show @linghuiluo that this issue was in fact only related to the jb.sils
pack (and not to my commit that she referenced) by describing various scenarios that I ran using the latest version of 'develop' and disabling the jb.sils
pack. These scenarios behaved as expected, i.e. the original local names appeared in the Jimple code and, if the -write-local-annotations
option is also used when running Soot, the original local names appeared in the LocalVariableTable in the generated class file as well. As proof, I have added PR #1643 with a test case that demonstrates the preservation of original local names in Jimple code when the jb.sils
phase is disabled. This makes me believe that the original issue posed exists because of jb.sils
and not because of my bug fix.
Note that the new phase jb.sils
was developed to tackle cases where the same local is used with incompatible types. See the PR https://github.com/soot-oss/soot/pull/1569 for code examples from real-world apps. If we split locals, we cannot give both locals the same original name. We therefore append a suffix, e.g., myvar_0
and myvar_1
if we split the local myvar
into two locals. We do not create locals with totally different names.
I just had a brief look at the new test case. In the test target soot.asm.LocalNaming
, the local epsilon
disappears, because it only contains a constant, which gets inlined, same for zeta
. Local iota
is a simple calculation on constants that the optimizations also remove and inline. That's expected.
eta
gets replaced by $stack17
. That's definitely strange. jb.sils
uses the CopyPropagator
. If we kick out the CopyPropagator
, we don't have that problem anymore. If we take a deeper look, we find the following Jimple code before copy propagation:
$stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
eta = $stack16;
$stack17 = virtualinvoke eta.<java.lang.Long: long longValue()>();
Note that eta
is a useless intermediate here. The copy propagator just takes $stack16
instead:
$stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
eta = $stack16;
$stack17 = virtualinvoke $stack16.<java.lang.Long: long longValue()>();
Now we have a dead assignment in the middle, which gets kicked out. There you are, no eta
left.
So what's the bug? I'd say the original Jimple code is already wrong, because it should directly use eta
as a local instead of first introducing a stack local and then assigning its value. The issue is a direct consequence of this strange stack local being introduced. In my opinion, the code should look like this:
eta = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(iota);
$stack17 = virtualinvoke eta.<java.lang.Long: long longValue()>();
Now someone needs to look into the AsmMethodSource
and see why we get the intermediate stack local.
@StevenArzt That sounds to me like the same underlying issue as #1614, where additional local aliases are being created for some reason.
@tim-hoffman Not quite. Not jb.sils
introduces the additional local in this case. This local already exists before jb.sils
runs. It is rendundant, and there are two ways to clean up the redundancy: either remove eta
or remove stack$16
. From a code perspective, both options are equally correct, so the transformer is actually correct. We need to fix its input.
Note that the redundant local might have been there forever, and out of luck, it was never a problem - until we introduced code cleanup as part of the splitting process.
@StevenArzt
String[] args={"-process-dir",
Paths.get("src", "test", "resources", "SimpleClass").toFile().getAbsolutePath(),
"-src-prec", "c",
"-output-format", "jimple",
"-p", "jb", "use-original-names:true",
"-p", "jb.sils", "enabled:false"
};
soot.Main.main(args);
I simply tried this on the Example.class from https://github.com/soot-oss/soot/tree/develop/src/test/resources/SimpleClass The java code:
public void bar(int a, int b){
int c = a*b;
int d = 0;
System.out.println(c);
}
The output jimple code with jb.sils disabled:
public void bar(int, int)
{
java.io.PrintStream $stack5;
Example l0;
int l1, l2, l3;
boolean l4;
l0 := @this: Example;
l1 := @parameter0: int;
l2 := @parameter1: int;
l3 = l1 * l2;
l4 = 0;
$stack5 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $stack5.<java.io.PrintStream: void println(int)>(l3);
return;
}
with jb.sils enabled:
public void bar(int, int)
{
java.io.PrintStream $stack5;
Example l0;
int l1, l2, l3;
l0 := @this: Example;
l1 := @parameter0: int;
l2 := @parameter1: int;
l3 = l1 * l2;
$stack5 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $stack5.<java.io.PrintStream: void println(int)>(l3);
return;
}
In both jimple outputs, I don't see the original names "a, b, c". Interestingly, the case @tim-hoffman created I see partial original names as the jimple code below shows:
public void localNaming(java.lang.String, java.lang.Integer, byte[], java.lang.StringBuilder)
{
java.io.PrintStream omega;
byte[] gamma;
long $stack17;
java.lang.Integer beta;
java.lang.Long $stack16;
LocalNaming this;
java.lang.StringBuilder delta;
java.lang.String alpha;
this := @this: LocalNaming;
alpha := @parameter0: java.lang.String;
beta := @parameter1: java.lang.Integer;
gamma := @parameter2: byte[];
delta := @parameter3: java.lang.StringBuilder;
gamma[0] = 23;
virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(alpha);
virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.Object)>(beta);
$stack16 = staticinvoke <java.lang.Long: java.lang.Long valueOf(long)>(180L);
$stack17 = virtualinvoke $stack16.<java.lang.Long: long longValue()>();
virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(long)>(90L);
virtualinvoke delta.<java.lang.StringBuilder: java.lang.StringBuilder append(long)>($stack17);
omega = <java.lang.System: java.io.PrintStream out>;
virtualinvoke omega.<java.io.PrintStream: void println(java.lang.Object)>(delta);
return;
}
@linghuiluo I was referring to the test case that @tim-hoffman created and explained why some locals are correctly removed there by jb.sils
, and that there is still an issue with local eta
/ $stack16
. It is interesting that you did not get any proper local names at all when running Soot from the command line. If that happens even without jb.sils
, the bug must be something else.
@StevenArzt Strange. I just created a new example
public void test(){
int d = Config.getD();
int f = Config.getF();
int[] arr = new int[2];
arr[0] = d;
arr[1] = f;
System.out.println(arr);
}
if i have jb.sils enabled I got the following Jimple:
public void test()
{
java.io.PrintStream $stack6;
int[] arr;
int $stack4, $stack5;
Array1 this;
this := @this: Array1;
$stack4 = staticinvoke <Array1$Config: int getD()>();
$stack5 = staticinvoke <Array1$Config: int getF()>();
arr = newarray (int)[2];
arr[0] = $stack4;
arr[1] = $stack5;
$stack6 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $stack6.<java.io.PrintStream: void println(java.lang.Object)>(arr);
return;
}
with it disabled I see:
public void test()
{
java.io.PrintStream $stack6;
int[] arr;
int d, f;
Array1 this;
this := @this: Array1;
d = staticinvoke <Array1$Config: int getD()>();
f = staticinvoke <Array1$Config: int getF()>();
arr = newarray (int)[2];
arr[0] = d;
arr[1] = f;
$stack6 = <java.lang.System: java.io.PrintStream out>;
virtualinvoke $stack6.<java.io.PrintStream: void println(java.lang.Object)>(arr);
return;
}
which is what I would expect. I don't think your explanation above applies for this case, correct me if i am wrong.
@linghuiluo That indeed looks different. Can you commit the full code as part of the LocalNaming
test target class? Just mark the test case as @Ignore
for now so that we don't break the build. Having a proper test case makes it easier to look into it, because your example seems to depend on other classes such as Config
. Furthermore, we directly have a test case for later to test our fix.
@StevenArzt sure will do.
I verified that @StevenArzt is right that the redundant local has been there for a while.
I did a checkout of release 4.0.0 (5d989a3) from Dec 27, 2019 and then ran the LocalNaming
test that @linghuiluo added and I see the $stack*
temporary variables being added so that the initial jimple code is as follows:
...
$stack4 = staticinvoke <soot.asm.LocalNaming$Config: int getD()>();
d = $stack4;
$stack5 = staticinvoke <soot.asm.LocalNaming$Config: int getF()>();
f = $stack5;
...
It looks like it's the responsibility of the Aggregator (jb.a
phase) to remove those stack temporaries. However, jb.sils
runs before jb.a
and its use of the CopyPropagator
actually replaces all uses of the d
and f
variables with those stack temporaries.
It seems there are a few options then, but @MarcMil may need to give input as the author of SharedInitializationLocalSplitter
:
jb.sils
run after jb.a
and still be effective?jb.sils
actually need to use the CopyPropagator
? Could it use the Aggregator
instead to achieve a similar result?SharedInitializationLocalSplitter
need to use so many other transformations before running, including repeating them several times?SharedInitializationLocalSplitter
is necessary? There weren't any added with 35c5bcb. I think it would be good to add some tests to show why it is necessary and that it actually works. Those would then be helpful to ensure that fixing the current issue does not create any new issues.There is a structure for a Jimple code example in #1569. Maybe you can create a couple of tests case from it, @tim-hoffman? That would definitely help. I don't think we should add the original apps, because this would just add megabytes of test targets to the repository, while being hard to debug in case something breaks. I'd prefer smaller Jimple code snippets without too many dependencies on other stuff from the respective app.
@StevenArzt, I wrote a test case to generate Jimple from the code example you mentioned as follows:
public void testMethod(boolean)
{
unknown b, s, u;
b := @parameter0: boolean;
s = <java.lang.System: java.io.PrintStream out>;
u = 0;
if b == 0 goto label1;
virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u);
goto label2;
label1:
virtualinvoke s.<java.io.PrintStream: void println(int)>(u);
label2:
return;
}
When I run the SharedInitializationLocalSplitter
it just moves the constant 0
into the invoke expressions and does not introduce any new locals. I assumed this happens due to one of the ConstantPropagator transforms at the start of the SharedInitializationLocalSplitter
so I tried removing all of those transformer uses from SharedInitializationLocalSplitter#internalTransform
and then it does introduce another local as expected. So this test case alone does not show why all of those transformers are needed. Are there any other examples you could direct me to?
So, I've revisited this issue. I'm 99% sure this problem occurred with the Bringmeister app.
@tim-hoffman A better code which shows this "problem" is:
public static void testMethod(boolean)
{
unknown b, s, u;
b := @parameter0: boolean;
s = <java.lang.System: java.io.PrintStream out>;
if b == 0 goto label1;
u = 0;
goto label2;
label1:
u = 1;
label2:
if b == 0 goto label3;
virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u);
goto label4;
label3:
virtualinvoke s.<java.io.PrintStream: void println(int)>(u);
label4:
return;
}
Without SILS:
public static void testMethod(boolean)
{
java.io.PrintStream s;
boolean b, u#1;
b := @parameter0: boolean;
s = <java.lang.System: java.io.PrintStream out>;
if b == 0 goto label1;
u#1 = 0;
goto label2;
label1:
u#1 = 1;
label2:
if b == 0 goto label3;
virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u#1);
goto label4;
label3:
virtualinvoke s.<java.io.PrintStream: void println(int)>(u#1); //<-- implicit cast to int
label4:
return;
}
With SILS:
public static void testMethod(boolean)
{
unknown b, s, u#1, u#1_1, u#1_2;
b := @parameter0: boolean;
s = <java.lang.System: java.io.PrintStream out>;
if b == 0 goto label1;
u#1 = 0;
u#1_2 = 0;
u#1_1 = 0;
goto label2;
label1:
u#1 = 1;
u#1_2 = 1;
u#1_1 = 1;
label2:
if b == 0 goto label3;
virtualinvoke s.<java.io.PrintStream: void println(boolean)>(u#1_1);
goto label4;
label3:
virtualinvoke s.<java.io.PrintStream: void println(int)>(u#1_2);
label4:
return;
}
So, without SILS, we have a cast between int and boolean, which is not valid in pure Java. Thus, one could argue that this should not be valid in Jimple either. However, the semantics of Jimple in this regard are not completely clear to me. As it is right now, there is no dedicated BooleanConstant, instead, we have IntConstants with 1 or 0 for booleans. As such, it'd be OK for me to perform that cast. Maybe these semantics should be specified somewhere? The problem was that back then the TypeAssigner had massive problems coming up with any typing. I'm not sure what changed in the meantime, but now, Soot finds a typing. In the case of the Bringmeister app, this involves casting ints to booleans.
So, I'd suggest to disable or remove sils altogether. Our priority was that the TypeAssigner does not break and this seems to be the case right now. As such, we could do the simplest approach and disable this workaround altogether. This should solve all problems associated with it. We could take this test case to assure that the TypeAssigner does not break hard again.
Describe the bug
With SharedInitializationLocalSplitter enabled, local variables no longer seem to carry their original names. We had been using this option:
And seeing locals with names from the source program. However, new code #35c5bcb seems to interfere with this; now we just see many
$stack…
variables.The original names appear again if we disable this pass as mentioned already:
@linghuiluo has identified this as a defect: https://github.com/soot-oss/soot/issues/1614#issuecomment-853287543
To reproduce Run soot with the
jb:use-original-names
option enabled:Then compare the names of Jimple local variables with
jb.sils
disabled:Expected behavior Jimple local variables are identified with their original source-code names.
Stacktrace N/A
Input file Any
Additional context Preserving the names of local variables has been quite useful for debugging intermediate analysis results. Having to disable
jb.sils
to maintain this functionality is unfortunate.