wala / WALA

T.J. Watson Libraries for Analysis, with frontends for Java, Android, and JavaScript, and may common static program analyses
http://github.com/wala/WALA
Eclipse Public License 2.0
764 stars 223 forks source link

Why can't I find information about statements on local variables in WALA's IR? #1099

Open Chaoshuai-Li opened 2 years ago

Chaoshuai-Li commented 2 years ago

Recently I was starting to study WALA and was intrigued by his pointer analysis. However, it has recently dawned on me that there is information in his IR that I cannot understand. Here is the source code for the IR I am trying to generate.

public void nonDistributiveTest() {
        int a;
        int b;
        if (Math.random() > 0.5) {
            a = 1;
            b = 9;
        } else {
            a = 9;
            b = 1;
        }
        int c = a + b;
        String s = "str";
        String str = s + "str"
    }

Using the Jimple IR as an example, the intermediate code can be represented as follows:

    public void nonDistributiveTest()
    {
        double $stack6;
        byte $stack7, a, b;
        int c;
        java.lang.String s;
        java.lang.StringBuilder $stack8, $stack9, $stack10;
        ass1.IntraConstantPropagationTest this;
        this := @this: ass1.IntraConstantPropagationTest;
        $stack6 = staticinvoke <java.lang.Math: double random()>();
        $stack7 = $stack6 cmpl 0.5;
        if $stack7 <= 0 goto label1;
        a = 1;
        b = 9;
        goto label2;
     label1:
        a = 9;
        b = 1;
     label2:
        c = a + b;
        s = "str";
        $stack8 = new java.lang.StringBuilder;
        specialinvoke $stack8.<java.lang.StringBuilder: void <init>()>();
        $stack9 = virtualinvoke $stack8.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(s);
        $stack10 = virtualinvoke $stack9.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("str");
        virtualinvoke $stack10.<java.lang.StringBuilder: java.lang.String toString()>();
        return;
    }

But in WALA IR and CFG, when I traverse Instructions I can't find any procedure for the local variables 'a' and 'b'. I don't understand if this has been removed by optimization or if there is something wrong with the way I have built it?

< Application, Lanalysis/wala/IntraConstantPropagationTest, nonDistributiveTest()V >
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..0]
    -> BB2
    -> BB11
BB2[1..4]
    -> BB4
    -> BB3
BB3[5..9]
    -> BB5
BB4[10..13]
    -> BB5
BB5[14..20]
    -> BB6
    -> BB11
BB6[21..23]
    -> BB7
    -> BB11
BB7[24..24]
    -> BB8
    -> BB11
BB8[25..26]
    -> BB9
    -> BB11
BB9[27..27]
    -> BB10
    -> BB11
BB10[28..29]
    -> BB11
BB11[-1..-2]
Instructions:
BB0
BB1
0   v4 = invokestatic < Application, Ljava/lang/Math, random()D > @0 exception:v3(line 11)
BB2
2   v6 = compare v4,v5:#0.5 opcode=cmpl      (line 11)
4   conditional branch(le, to iindex=10) v6,v7:#0(line 11)
BB3
9   goto (from iindex= 9 to iindex = 14)     (line 14)
BB4
BB5
           v10 = phi  v9:#1,v8:#9
           v11 = phi  v8:#9,v9:#1
16   v12 = binaryop(add) v10 , v11           (line 18) [10=[a]11=[b]]
20   v14 = new <Application,Ljava/lang/StringBuilder>@31(line 21)
BB6
23   v16 = invokestatic < Application, Ljava/lang/String, valueOf(Ljava/lang/Object;)Ljava/lang/String; > v13:#str @37 exception:v15(line 21) [13=[s]]
BB7
24   invokespecial < Application, Ljava/lang/StringBuilder, <init>(Ljava/lang/String;)V > v14,v16 @40 exception:v17(line 21)
BB8
26   v19 = invokevirtual < Application, Ljava/lang/StringBuilder, append(Ljava/lang/String;)Ljava/lang/StringBuilder; > v14,v13:#str @45 exception:v18(line 21) [13=[s]]
BB9
27   v21 = invokevirtual < Application, Ljava/lang/StringBuilder, toString()Ljava/lang/String; > v19 @48 exception:v20(line 21)
BB10
29   return                                  (line 23)
BB11

Very much looking forward to your response, although I'm not sure if this is an inexperienced question to ask, thanks.

        scope = AnalysisScopeReader.readJavaScope("primordial.txt",
                new FileProvider().getFile("Java60RegressionExclusions.txt"), javaLoader);
        ClassLoaderReference clr = scope.getLoader(Atom.findOrCreateUnicodeAtom("Application"));

        File file = new FileProvider().getFile("E:\\EclipseEnterprise2020\\refactorExample\\TestInstanceof",
                javaLoader);
        scope.addToScope(clr, new BinaryDirectoryTreeModule(file));
        cha = ClassHierarchyFactory.make(scope);
        Iterable<Entrypoint> entrypoints = Util.makeMainEntrypoints(scope, cha);
        entrypoints = new AllApplicationEntrypoints(scope, cha);
        AnalysisOptions options = new AnalysisOptions(scope, entrypoints);
        CallGraphBuilder<InstanceKey> builder = Util.makeZeroCFABuilder(Language.JAVA, options, new AnalysisCacheImpl(), cha, scope);
msridhar commented 2 years ago

@Chaoshuai-Li thanks for the question. You're right that there is a difference in the IRs. WALA's IR is in SSA form and copy propagation is formed during SSA construction. So, copies of local variables and writing of constants to locals is not preserved in the IR. This is unfortunately not documented in a single place but is partially described in a couple of places:

https://github.com/wala/WALA/wiki/Slicer#warning-exclusion-of-copy-statements-from-slice https://github.com/wala/WALA/wiki/Mapping-to-source-code#mapping-value-numbers-back-to-local-names

In retrospect, it's not clear using SSA form was the right decision for a static analysis framework like WALA, but it would be a hard decision to undo now. In the WALA IR you will mostly find instructions involving statements like method calls, pointer dereferences, and control flow, but not copies of values of primitive type.

I hope that helps; let me know if you have further questions.

Chaoshuai-Li commented 2 years ago

@Chaoshuai-Li thanks for the question. You're right that there is a difference in the IRs. WALA's IR is in SSA form and copy propagation is formed during SSA construction. So, copies of local variables and writing of constants to locals is not preserved in the IR. This is unfortunately not documented in a single place but is partially described in a couple of places:

https://github.com/wala/WALA/wiki/Slicer#warning-exclusion-of-copy-statements-from-slice https://github.com/wala/WALA/wiki/Mapping-to-source-code#mapping-value-numbers-back-to-local-names

In retrospect, it's not clear using SSA form was the right decision for a static analysis framework like WALA, but it would be a hard decision to undo now. In the WALA IR you will mostly find instructions involving statements like method calls, pointer dereferences, and control flow, but not copies of values of primitive type.

I hope that helps; let me know if you have further questions.

Thank you for your prompt and patient response, I am currently refactoring my software to accommodate the Java API update, so these local variables are really important to me as well. Perhaps this requires me to change my thinking about refactoring. But it is true that WALA has been a great help to me in my previous refactorings with the aim of optimizing programs.

msridhar commented 2 years ago

@Chaoshuai-Li if you describe more what facts you are trying to compute about local variables, possibly I can help in figuring out if you can do what you need with WALA. Just let me know.

Chaoshuai-Li commented 2 years ago

@msridhar I appreciate your taking such an interest in my problem, it is truly an honor.

As you know the recent Java18 introduced a lot of new uses for multi-branch statements (such as if...else... and switch), and I am trying to provide some automatic refactoring tools for this. In the process, we have found that some of the actual programs have unreasonable multi-branch statement judgment structures, and some even unintentionally increase the depth of judgment. I am trying to construct the control flow automaton for that multi-branch statement corresponding to the contents of the AST. Based on this automaton I will try to optimize the judgment structure of the multi-branch statement and even reduce its unnecessary circle complexity.

Although the idea is not prudent enough, I still hope that static analysis will help me to construct this well-established control flow automaton, and not just through AST traversal. The automaton will guide the refactoring tool to complete the automatic refactoring of the source program. My idea is still in the validation phase as well as I am trying to figure out how to map the control flow information with the AST content.

msridhar commented 2 years ago

@Chaoshuai-Li if you only need to do intra-procedural dataflow analysis, I suggest taking a look at the Checker Dataflow library:

https://checkerframework.org/manual/checker-framework-dataflow-manual.pdf

It works on the ASTs computed by javac itself. So to use it, you could write an annotation processor that computes control-flow graphs for methods as they are being compiled and then writes out whatever information you need regarding the control flow.

Chaoshuai-Li commented 2 years ago

@msridhar I received your suggestion and will read it now, I think it will be very helpful to me.

Dear Professor, do you have some mentoring projects to share? Although I am very interested in static program analysis, the lack of guides and partners has made it really slow to learn.

I'm already following you on GitHub and really hope to have the opportunity to consult you again.