pascal-lab / Tai-e

An easy-to-learn/use static analysis framework for Java
https://tai-e.pascal-lab.net/docs/index.html
GNU Lesser General Public License v3.0
1.45k stars, 175 forks

Interface functions of the actual running class Not Displayed in Call Graph #77

Closed fangyuan00 closed 10 months ago

fangyuan00 commented 10 months ago

Description

Hello, I would like to build a call graph using Tai-e's context-insensitive pointer analysis. However, I observed that some interface functions of the actual running class are not displayed in the call graph.

Tai-e Configuration

pascal.taie.Main.main("-acp", classesDir,
      "-java", "8", "-ap",
      "-a", "pta=cs:ci;only-app:true;implicit-entries:true;distinguish-string-constants:reflection;" +
      "handle-invokedynamic:true;dump-ci:true;dump-yaml:true;reflection-inference:solar;" +
      "reflection-log:java-benchmarks/dacapo-2006/antlr-refl.log;",
      "-a", "cg=algorithm:pta;dump:true;dump-methods:true;dump-call-edges:true",
      "-scope", "REACHABLE");

The Decompiled Function Source Code

Interface: Lifecycle, Contained and Valve

..../catalina_6.0.33/org/apache/catalina/core/StandardPipeline.class

public void setBasic(Valve valve) {
    Valve oldBasic = this.basic;
    if (oldBasic != valve) {
        if (oldBasic != null) {
            if (this.started && oldBasic instanceof Lifecycle) {
                try {
                    ((Lifecycle)oldBasic).stop();
                } catch (LifecycleException var6) {
                    log.error("StandardPipeline.setBasic: stop", var6);
                }
            }
            if (oldBasic instanceof Contained) {
                try {
                    ((Contained)oldBasic).setContainer((Container)null);
                } catch (Throwable var5) {
                }
            }
        }
        if (valve != null) {
            if (valve instanceof Contained) {
                ((Contained)valve).setContainer(this.container);
            }
            if (valve instanceof Lifecycle) {
                try {
                    ((Lifecycle)valve).start();
                } catch (LifecycleException var4) {
                    log.error("StandardPipeline.setBasic: start", var4);
                    return;
                }
            }
            for(Valve current = this.first; current != null; current = current.getNext()) {
                if (current.getNext() == oldBasic) {
                    current.setNext(valve);
                    break;
                }
            }
            this.basic = valve;
        }
    }
}

The Generated Call Graph by PTA algorithm

"org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)": [
      "org.apache.catalina.Contained:void setContainer(org.apache.catalina.Container)",
      "org.apache.catalina.Valve:org.apache.catalina.Valve getNext()",
      "org.apache.catalina.Lifecycle:void stop()",
      "org.apache.catalina.Lifecycle:void start()",
      "org.apache.catalina.Valve:void setNext(org.apache.catalina.Valve)",
      "org.apache.juli.logging.Log:void error(java.lang.Object,java.lang.Throwable)"
    ]

Question

The call graph generates six callees of the function StandardPipeline:void setBasic(org.apache.catalina.Valve). However, the first five generated callees are all from interfaces, not the actual running classes. I am interested in understanding how to configure Tai-e to accurately capture the actual calls. Is this achievable through static analysis?

Additionally, when I perform call graph analysis using Soot's RTA algorithm, I obtain functions from all subclasses related to the interfaces. However, this approach introduces a significant number of false positives, which is not desirable for my purposes.

Environment

macOS
Tai-e-0.5.1-snapshot
IDEA

Additionally, what is the difference between CallGraph.callSitesIn(..) and CallGraph.getCalleesOfM(..) in Tai-e? Why is the result of CallGraph.getCalleesOfM(..) missing three callees?

I would greatly appreciate your assistance. Thank you so much.

zhangt2333 commented 10 months ago

The call graph generates six callees of the function StandardPipeline:void setBasic(org.apache.catalina.Valve). However, the first five generated callees are all from interfaces, not the actual running classes. I am interested in understanding how to configure Tai-e to accurately capture the actual calls. Is this achievable through static analysis?

Because there is no reproducible example (see an example of how to write a reproducible case), I cannot reproduce your problem.

I tried our case Dispatch, and the output was as expected.


Additionally, what is the difference between CallGraph.callSitesIn(..) and CallGraph.getCalleesOfM(..) in Tai-e? Why is the result of CallGraph.getCalleesOfM(..) missing three callees?

Please check the return types of the two methods carefully; they are different.

https://github.com/pascal-lab/Tai-e/blob/ca8e18010a0a542216a32b34e1c84bcbae4dbbb3/src/main/java/pascal/taie/analysis/graph/callgraph/CallGraph.java#L65-L70

https://github.com/pascal-lab/Tai-e/blob/ca8e18010a0a542216a32b34e1c84bcbae4dbbb3/src/main/java/pascal/taie/analysis/graph/callgraph/CallGraph.java#L50-L53
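To make the difference in return types concrete, here is a minimal sketch in plain Java that does not depend on Tai-e. ToyCallGraph, CallSite, calleesOfMethod, and the shortened method names are hypothetical stand-ins. The only assumption taken from this thread is that callSitesIn yields invoke statements while getCalleesOfM yields resolved methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToyCallGraph {
    // Hypothetical stand-in for an invoke statement: it stores only the
    // method reference written at the call site, plus the caller and an
    // index so that two calls to the same method stay distinct.
    public record CallSite(String caller, int index, String declaredTarget) {}

    private final Map<String, List<CallSite>> sitesByCaller = new LinkedHashMap<>();
    private final Map<CallSite, Set<String>> calleesBySite = new LinkedHashMap<>();

    // Registers a call site in `caller` together with the callees that an
    // analysis resolved for it (possibly none).
    public CallSite addCallSite(String caller, String declaredTarget, String... resolved) {
        List<CallSite> sites = sitesByCaller.computeIfAbsent(caller, k -> new ArrayList<>());
        CallSite site = new CallSite(caller, sites.size(), declaredTarget);
        sites.add(site);
        calleesBySite.put(site, new LinkedHashSet<>(List.of(resolved)));
        return site;
    }

    // Statement-level query: every call site in the caller, resolved or not.
    public List<CallSite> callSitesIn(String caller) {
        return sitesByCaller.getOrDefault(caller, List.of());
    }

    // Method-level query: the union of resolved callees over all call sites.
    public Set<String> calleesOfMethod(String caller) {
        Set<String> result = new LinkedHashSet<>();
        for (CallSite site : callSitesIn(caller)) {
            result.addAll(calleesBySite.get(site));
        }
        return result;
    }

    // Builds a small graph loosely modeled on the setBasic example.
    public static ToyCallGraph example() {
        ToyCallGraph cg = new ToyCallGraph();
        cg.addCallSite("setBasic", "Lifecycle.stop"); // no callee resolved
        cg.addCallSite("setBasic", "Contained.setContainer",
                "ValveBase.setContainer", "StandardContextValve.setContainer");
        cg.addCallSite("setBasic", "Valve.getNext", "ValveBase.getNext");
        cg.addCallSite("setBasic", "Valve.getNext", "ValveBase.getNext");
        return cg;
    }

    public static void main(String[] args) {
        ToyCallGraph cg = example();
        System.out.println(cg.callSitesIn("setBasic").size() + " call sites");
        System.out.println("callees: " + cg.calleesOfMethod("setBasic"));
    }
}
```

In this toy model the two queries can differ in both count and content: a call site with no resolved callee still shows up in callSitesIn, and two call sites that resolve to the same method contribute a single entry to calleesOfMethod.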

fangyuan00 commented 10 months ago

@zhangt2333 Hi, apologies for not providing a reproducible example previously. I have now included it.

Test Code

My test code is as follows.

package org.example;
import pascal.taie.analysis.graph.callgraph.CallGraph;
import pascal.taie.analysis.graph.callgraph.CallGraphBuilder;
import pascal.taie.config.AnalysisConfig;
import pascal.taie.ir.stmt.Invoke;
import pascal.taie.language.classes.JMethod;
import java.util.*;

public class App {
    static LinkedList<String> excludeList;
    public static LinkedList<String> excludeList() {
        if(excludeList==null) {
            excludeList = new LinkedList<String> ();

            excludeList.add("java.");
            excludeList.add("javax.");
            excludeList.add("sun.");
            excludeList.add("sunw.");
            excludeList.add("com.sun.");
            excludeList.add("com.ibm.");
            excludeList.add("com.apple.");
            excludeList.add("apple.awt.");

        }
        return excludeList;
    }

    public App(String... options){
        pascal.taie.Main.main(options);
    }

    public String processMethodName(String methodName){
        if (methodName.startsWith("<"))
            methodName = methodName.substring(1);
        if (methodName.endsWith(">"))
            methodName = methodName.substring(0, methodName.length()-1);
        String[] tmp = methodName.split(":");
        return tmp[0].strip()+":"+tmp[1].strip();
    }

    public CallGraph<Invoke, JMethod> useCGanalysis(){
        AnalysisConfig Config = AnalysisConfig.of("cg","dump", true, "dump-methods", true,
                "dump-call-edges", true, "algorithm", "pta");
        CallGraphBuilder builder = new CallGraphBuilder(Config);
        CallGraph<Invoke, JMethod> res = builder.analyze();
        return res;
    }

    public void testCallGraph(){
        CallGraph<Invoke, JMethod> callgraph = this.useCGanalysis();
        Queue<JMethod> queue = new LinkedList<>();
        HashSet<JMethod> visited = new HashSet<>();
        for (JMethod entry: callgraph.entryMethods().toList()){
            boolean isIgnore = false;
            for (String exclude: excludeList()){
                if (entry.getDeclaringClass().toString().startsWith(exclude)) {
                    isIgnore = true;
                    break;
                }
            }
            if (isIgnore)
                continue;
            queue.add(entry);
        }
        while (queue.size() > 0){
            JMethod caller = queue.poll();
            String callerName = this.processMethodName(caller.getSignature());
            visited.add(caller);
            List<JMethod> callees = callgraph.getCalleesOfM(caller).stream().toList();
            if (callerName.equals("org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)")){
                System.out.println("Reaching the target function.");
                List<Invoke> calleeInvokes = callgraph.callSitesIn(caller).toList();
                System.out.println("****** Invoke ******");
                System.out.println("[caller]: "+callerName);
                for (Invoke invoke: calleeInvokes){
                    String calleeName = this.processMethodName(invoke.getInvokeExp().getMethodRef().toString());
                    System.out.println("callee: "+calleeName);
                }
                System.out.println("********************");
                System.out.println("****** JMethod *****");
                System.out.println("[caller]: "+callerName);
                for (JMethod callee: callees){
                    String calleeName = this.processMethodName(callee.getSignature());
                    System.out.println("callee: "+calleeName);
                }
                System.out.println("********************");
                break;
            }
            for (JMethod callee: callees){
                boolean isIgnore = false;
                for (String exclude: excludeList()){
                    if (callee.getDeclaringClass().toString().startsWith(exclude)) {
                        isIgnore = true;
                        break;
                    }
                }
                if (isIgnore)
                    continue;
                if (!visited.contains(callee))
                    queue.add(callee);
            }
        }
    }
    public static void main(String[] args){
        App main = new App("-acp", "src/main/resources/catalina_6.0.33",
                "-java", "8", "-ap",
                "-a", "pta=cs:ci;only-app:true;implicit-entries:false",
                "-a", "cg=algorithm:pta;dump:true;dump-methods:true;dump-call-edges:true",
                "-scope", "REACHABLE");
        main.testCallGraph();
    }
}

Minor Changes

Because I need a comprehensive call graph, I added multiple entry points and excluded functions from basic Java classes. To do this, I made minor modifications to Tai-e in Tai-e/src/main/java/pascal/taie/analysis/pta/plugin/EntryPointHandler.java:

public void onStart() {
        // process program main method
        JMethod main = World.get().getMainMethod();
        if (main != null) {
            solver.addEntryPoint(new EntryPoint(main,
                    new DeclaredParamProvider(main, solver.getHeapModel(), 1)));
        }
        // process implicit entries
        if (solver.getOptions().getBoolean("implicit-entries")) {
            for (JMethod entry : World.get().getImplicitEntries()) {
                solver.addEntryPoint(new EntryPoint(entry, EmptyParamProvider.get()));
            }
        }

        // fy: add all entries
        HeapModel heapModel = solver.getHeapModel();
        //get all application classes
        Stream<JClass> appClasses = World.get().getClassHierarchy().applicationClasses();
        Iterator<JClass> itr = appClasses.iterator();
        Collection<JMethod> entryPoints = new ArrayList<JMethod>();
        while (itr.hasNext()) {
            JClass appClass = itr.next();
            if (appClass.isInterface())
                continue;
            if (appClass.isAbstract())
                continue;
            Collection<JMethod> methods = appClass.getDeclaredMethods();
            entryPoints.addAll(methods);
        }
        for (JMethod entryPoint : entryPoints) {
            Type appType = entryPoint.getDeclaringClass().getType();
            List<String> appTmp = Arrays.asList(appType.toString().split("\\."));
            String appAlloc = "<" + appTmp.get(appTmp.size() - 1) + ">";
            Obj thisObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, appAlloc, appType);
            List<Type> paramTypes = entryPoint.getParamTypes();
            List<Obj> mockObjs = new ArrayList<Obj>();
            for (Type type : paramTypes) {
                List<String> tmp = Arrays.asList(type.toString().split("\\."));
                String alloc = "<" + tmp.get(tmp.size() - 1) + ">";
                Obj mockObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, alloc, type, entryPoint);
                mockObjs.add(mockObj);
            }
            SpecifiedParamProvider.Builder builder = new SpecifiedParamProvider.Builder(entryPoint)
                    .addThisObj(thisObj);
            for (int i = 0; i < mockObjs.size(); i++) {
                builder.addParamObj(i, mockObjs.get(i));
            }
            SpecifiedParamProvider paramProvider = builder.build();
            solver.addEntryPoint(new EntryPoint(entryPoint, paramProvider));
        }
}

Target Function

Target function: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)

The callee results of callgraph.callSitesIn(caller) and callgraph.getCalleesOfM(caller) differ:

Reaching the target function.
****** Invoke ****** (`callgraph.callSitesIn`)
[caller]: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)
callee: org.apache.catalina.Lifecycle:void stop()
callee: org.apache.juli.logging.Log:void error(java.lang.Object,java.lang.Throwable)
callee: org.apache.catalina.Contained:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.Contained:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.Lifecycle:void start()
callee: org.apache.juli.logging.Log:void error(java.lang.Object,java.lang.Throwable)
callee: org.apache.catalina.Valve:org.apache.catalina.Valve getNext()
callee: org.apache.catalina.Valve:void setNext(org.apache.catalina.Valve)
callee: org.apache.catalina.Valve:org.apache.catalina.Valve getNext()
********************
****** JMethod ***** (`callgraph.getCalleesOfM`)
[caller]: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)
callee: org.apache.catalina.valves.ValveBase:org.apache.catalina.Valve getNext()
callee: org.apache.catalina.valves.ValveBase:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.valves.ValveBase:void setNext(org.apache.catalina.Valve)
callee: org.apache.catalina.core.StandardContextValve:void setContainer(org.apache.catalina.Container)
********************

In addition, the callees in testTaieCG/output/call-edges.txt are the same as the result of callgraph.getCalleesOfM(caller), which misses the callees Lifecycle: void start(), Lifecycle: void stop(), and Log: void error(..). The result of callgraph.callSitesIn(caller) is complete, but the callees' classes are interfaces (Contained, Lifecycle, Valve), not the actual runtime subclasses.

Test Package

The test package is org.apache.tomcat:catalina:6.0.33 (attachment: catalina_6.0.33.zip).

I am looking forward to your reply. Thank you so much!

zhangt2333 commented 10 months ago

Thank you for extracting and editing and sending this detailed information! ❤️

It may be more convenient for you to just package the code and upload it (in the issue comment editor or, more professionally, in a GitHub repo).

I'll take a look.

fangyuan00 commented 10 months ago

@zhangt2333 Thank you for the reminder. The test code repository is https://github.com/fangyuan00/testTaieCG. The test package is located in src/main/resources/catalina_6.0.33, and the compiled Tai-e library is in lib/.

zhangt2333 commented 10 months ago

That's really helpful!! It brings me immense power and joy to contribute to open source with ❤️, especially when working to address problems for individuals like yourself.


In addition, the callees in testTaieCG/output/call-edges.txt are the same as the result of callgraph.getCalleesOfM(caller), which misses the callees Lifecycle: void start(), Lifecycle: void stop(), and Log: void error(..). The result of callgraph.callSitesIn(caller) is complete, but the callees' classes are interfaces (Contained, Lifecycle, Valve), not the actual runtime subclasses.

The essence of pascal.taie.ir.stmt.Invoke.

A pascal.taie.ir.stmt.Invoke, which is one of the IR elements in Tai-e, only represents information at the code level, not the result of running or analyzing the program. So invoke.getInvokeExp().getMethodRef() only returns the method reference written at the call site, e.g., Lifecycle: void start().
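The same distinction exists in plain Java and can be sketched with reflection. This is only an illustration under stated assumptions, not Tai-e code: the Lifecycle interface and the StandardValve class below are hypothetical stand-ins. Looking a method up on the static type gives the interface method, while looking it up on the receiver's runtime class gives the implementation that would actually run.

```java
import java.lang.reflect.Method;

public class DeclaredVsActual {
    // Hypothetical interface and implementation, for illustration only.
    public interface Lifecycle {
        void start();
    }

    public static class StandardValve implements Lifecycle {
        @Override
        public void start() {
            // intentionally empty
        }
    }

    // The target named at the call site: looked up on the static type.
    public static String declaredTarget() {
        return describe(Lifecycle.class);
    }

    // The target that would run: looked up on the receiver's runtime class.
    public static String actualTarget(Lifecycle receiver) {
        return describe(receiver.getClass());
    }

    // Finds the public method start() visible on `type` and reports the
    // class that declares it.
    private static String describe(Class<?> type) {
        try {
            Method m = type.getMethod("start");
            return m.getDeclaringClass().getSimpleName() + "." + m.getName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("declared: " + declaredTarget());
        System.out.println("actual:   " + actualTarget(new StandardValve()));
    }
}
```

A static analysis cannot inspect a live receiver as this sketch does, so it has to approximate the set of possible receiver objects instead.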

How to get the actual callee?

Invoke will be dispatched to its real callee(s) based on its base/receiver object during pointer analysis.

You just need to make a few changes to your code:

    public static void main(String[] args){
        App main = ...
        ...
+        World world = World.get();
+        PointerAnalysisResult ptaResult = world.getResult(PointerAnalysis.ID);
+        CallGraph<Invoke, JMethod> cg = world.getResult(CallGraphBuilder.ID);
+        JMethod method = world.getClassHierarchy().getMethod(
+                "<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>");
+        cg.getCallSitesIn(method).forEach(invoke -> {
+            Set<JMethod> callees = cg.getCalleesOf(invoke);
+            System.out.println(invoke + "\n   calls to " + callees);
+            if (invoke.getInvokeExp() instanceof InvokeInstanceExp invokeInstanceExp) {
+                Var base = invokeInstanceExp.getBase();
+                Set<Obj> pointsToSet = ptaResult.getPointsToSet(base);
+                System.out.println("  based on " + base + " pointing to " + pointsToSet);
+            }
+            System.out.println();
+        });

Then it output:

<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>[10@L369] invokeinterface $r8.stop()
  calls to []
  based on $r8 pointing to []

...

<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>[19@L376] invokeinterface $r7.setContainer(%nullconst)
  calls to [<org.apache.catalina.valves.ValveBase: void setContainer(org.apache.catalina.Container)>, <org.apache.catalina.core.StandardContextValve: void setContainer(org.apache.catalina.Container)>]
  based on $r7 pointing to [NewObj{<org.apache.catalina.core.StandardEngine: void <init>()>[11@L79] new org.apache.catalina.core.StandardEngineValve}, NewObj{<org.apache.catalina.core.StandardContext: void <init>()>[171@L132] new org.apache.catalina.core.StandardContextValve}, NewObj{<org.apache.catalina.core.StandardHost: void <init>()>[31@L73] new org.apache.catalina.core.StandardHostValve}]

...

Because the base/receiver object $r8 of invokeinterface $r8.stop() points to nothing, the call resolves to no callee, so Lifecycle: void stop() does not appear in the result of getCalleesOfM.
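The resolution rule described above can be sketched in plain Java. This is a simplified illustration, not Tai-e's implementation: the classes are hypothetical stand-ins that mirror only the shape of the output above, and a points-to set is modeled as a plain list of classes. Each type in the set is dispatched separately by walking up its superclass chain, and an empty set yields no callees.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PointsToDispatch {
    // Hypothetical hierarchy: only StandardContextValve overrides setContainer.
    public static class Container {}

    public static class ValveBase {
        public void setContainer(Container c) {}
    }

    public static class StandardEngineValve extends ValveBase {}

    public static class StandardHostValve extends ValveBase {}

    public static class StandardContextValve extends ValveBase {
        @Override
        public void setContainer(Container c) {}
    }

    // Walks up from `type` to the first class that declares the method.
    static String dispatch(Class<?> type, String name, Class<?>... params) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(name, params);
                return c.getSimpleName() + "." + m.getName();
            } catch (NoSuchMethodException e) {
                // Not declared here; keep looking in the superclass.
            }
        }
        throw new IllegalArgumentException("no such method: " + name);
    }

    // Resolves one call site: one dispatch per type in the receiver's
    // (modeled) points-to set. An empty set produces an empty result.
    public static Set<String> resolve(List<? extends Class<?>> pointsTo,
                                      String name, Class<?>... params) {
        Set<String> callees = new LinkedHashSet<>();
        for (Class<?> type : pointsTo) {
            callees.add(dispatch(type, name, params));
        }
        return callees;
    }

    public static void main(String[] args) {
        List<Class<?>> r7 = List.of(StandardEngineValve.class,
                StandardContextValve.class, StandardHostValve.class);
        // Three receiver types collapse into two distinct callees.
        System.out.println(resolve(r7, "setContainer", Container.class));
        // A receiver that points to nothing resolves to no callee.
        System.out.println(resolve(List.of(), "stop"));
    }
}
```

In this sketch the three receiver types yield two callees because two of them inherit setContainer from ValveBase, which matches the shape of the "calls to" list printed for $r7.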

fangyuan00 commented 10 months ago

Thank you for resolving the problems that puzzled me for two days!!