pascal-lab / Tai-e

An easy-to-learn/use static analysis framework for Java
https://tai-e.pascal-lab.net/docs/index.html
GNU Lesser General Public License v3.0

In the call graph, the call edges related to dynamic proxy are missing. #123

Open YunFy26 opened 1 week ago

YunFy26 commented 1 week ago

📝 Overall Description

### For the following demo

`Service.java`

```java
public interface Service {
    void doSomething();
}
```

`ServiceImpl.java`

```java
public class ServiceImpl implements Service {
    @Override
    public void doSomething() {
        System.out.println("Performing task in ServiceImpl...");
    }
}
```

`MyInvocationHandler.java`

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class MyInvocationHandler implements InvocationHandler {
    private final Object target;

    public MyInvocationHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        System.out.println("before method call...");
        // method invoke
        Object result = method.invoke(target, args);
        System.out.println("after method call...");
        return result;
    }

    public static Object getProxy(Object target) {
        return Proxy.newProxyInstance(
                target.getClass().getClassLoader(),
                target.getClass().getInterfaces(),
                new MyInvocationHandler(target)
        );
    }
}
```

`Main.java`

```java
public class Main {
    public static void main(String[] args) {
        ServiceImpl service = new ServiceImpl();
        Service proxy = (Service) MyInvocationHandler.getProxy(service);
        proxy.doSomething();
    }
}
```

`IR` of `Main.java`

```java
public static void main(java.lang.String[] r3) {
    org.example.proxy.ServiceImpl $r0;
    java.lang.Object $r1;
    org.example.proxy.Service r2;
    [0@L10] $r0 = new org.example.proxy.ServiceImpl;
    [1@L10] invokespecial $r0.<org.example.proxy.ServiceImpl: void <init>()>();
    [2@L11] $r1 = invokestatic <org.example.proxy.MyInvocationHandler: java.lang.Object getProxy(java.lang.Object)>($r0);
    [3@L11] r2 = (org.example.proxy.Service) $r1;
    [4@L12] invokeinterface r2.<org.example.proxy.Service: void doSomething()>();
    [5@L13] return;
}
```

The call graph is as follows:

digraph G {
  node [color=".3 .2 1.0",shape=box,style=filled];
  edge [];
  "0" [label="<java.lang.Class: java.lang.Class[] getInterfaces()>",];
  "1" [label="<java.lang.Class: java.lang.ClassLoader getClassLoader()>",];
  "2" [label="<org.example.proxy.MyInvocationHandler: java.lang.Object getProxy(java.lang.Object)>",];
  "3" [label="<java.lang.Object: java.lang.Class getClass()>",];
  "4" [label="<org.example.proxy.MyInvocationHandler: void <init>(java.lang.Object)>",];
  "5" [label="<org.example.proxy.ServiceImpl: void <init>()>",];
  "6" [label="<java.lang.Object: void <init>()>",];
  "7" [label="<org.example.Main: void main(java.lang.String[])>",];
  "8" [label="<java.lang.reflect.Proxy: java.lang.Object newProxyInstance(java.lang.ClassLoader,java.lang.Class[],java.lang.reflect.InvocationHandler)>",];
  "2" -> "8" [label="[6@L27] $r6 = invokestatic <java.lang.reflect.Proxy: java.lang.Object newProxyInstance(java.lang.ClassLoader,java.lang.Class[],java.lang.reflect.InvocationHandler)>($r2, $r4, $r5);",];
  "2" -> "4" [label="[5@L29] invokespecial $r5.<org.example.proxy.MyInvocationHandler: void <init>(java.lang.Object)>(r0);",];
  "2" -> "3" [label="[0@L28] $r1 = invokevirtual r0.<java.lang.Object: java.lang.Class getClass()>();",];
  "2" -> "1" [label="[1@L28] $r2 = invokevirtual $r1.<java.lang.Class: java.lang.ClassLoader getClassLoader()>();",];
  "2" -> "0" [label="[3@L29] $r4 = invokevirtual $r3.<java.lang.Class: java.lang.Class[] getInterfaces()>();",];
  "2" -> "3" [label="[2@L29] $r3 = invokevirtual r0.<java.lang.Object: java.lang.Class getClass()>();",];
  "4" -> "6" [label="[0@L12] invokespecial %this.<java.lang.Object: void <init>()>();",];
  "5" -> "6" [label="[0@L3] invokespecial %this.<java.lang.Object: void <init>()>();",];
  "7" -> "2" [label="[2@L11] $r1 = invokestatic <org.example.proxy.MyInvocationHandler: java.lang.Object getProxy(java.lang.Object)>($r0);",];
  "7" -> "5" [label="[1@L10] invokespecial $r0.<org.example.proxy.ServiceImpl: void <init>()>();",];
}

The call edge main → doSomething is missing.

In the actual runtime call sequence, the invoke method of MyInvocationHandler is called before doSomething, and doSomething is then called through reflection inside invoke.
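The runtime call order described above can be observed with a small standalone program. This is only a sketch that uses the standard `java.lang.reflect.Proxy` API; the class `ProxyCallTrace`, the `TRACE` list, and `TracingHandler` are hypothetical names introduced for illustration and are not part of the demo or of Tai-e.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; all names here are illustrative only.
class ProxyCallTrace {

    interface Service {
        void doSomething();
    }

    // Records the order in which the methods of interest are entered.
    static final List<String> TRACE = new ArrayList<>();

    static class ServiceImpl implements Service {
        @Override
        public void doSomething() {
            TRACE.add("ServiceImpl.doSomething");
        }
    }

    static class TracingHandler implements InvocationHandler {
        private final Object target;

        TracingHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            TRACE.add("TracingHandler.invoke");
            // Reflective call that finally reaches the real implementation.
            return method.invoke(target, args);
        }
    }

    // Performs one proxied call and returns a copy of the recorded order.
    static List<String> run() {
        TRACE.clear();
        Service proxy = (Service) Proxy.newProxyInstance(
                ServiceImpl.class.getClassLoader(),
                new Class<?>[] {Service.class},
                new TracingHandler(new ServiceImpl()));
        TRACE.add("main");
        proxy.doSomething();
        return new ArrayList<>(TRACE);
    }

    public static void main(String[] args) {
        // Prints: main -> TracingHandler.invoke -> ServiceImpl.doSomething
        System.out.println(String.join(" -> ", run()));
    }
}
```

The generated proxy method sits between `main` and `invoke` at runtime, but it is not recorded here because its body is produced by the JDK rather than written in source.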

After completing the pointer analysis, I reviewed the results of the analysis.

solver.csManager.callSites includes:

<org.example.Main: void main(java.lang.String[])>[4@L12] invokeinterface r2.doSomething()

solver.csManager.ptrManager.vars.map includes var r2, but the pointsToSet of r2 is null, as shown in Figure-1.

At runtime, the type of r2 is jdk.proxy1.$Proxy0

public static void main(String[] args) {
        ServiceImpl service = new ServiceImpl();
        Service proxy = (Service) MyInvocationHandler.getProxy(service);
        System.out.println(proxy.getClass());   //class jdk.proxy1.$Proxy0
        proxy.doSomething();
    }

Since $Proxy0 is generated at runtime, Tai-e cannot identify an allocation site for this object, so no object is mocked for it, which results in the missing call edge. Is my understanding correct?
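That the proxy class exists only at runtime can be checked with plain JDK reflection. The following is a minimal sketch under the assumption of a standard JDK; `ProxyClassInspection` and its nested `Service` interface are hypothetical names, and the code says nothing about how Tai-e itself treats such classes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical demo class; names are illustrative only.
class ProxyClassInspection {

    interface Service {
        void doSomething();
    }

    // Creates a proxy whose handler ignores every call.
    static Object newProxy() {
        InvocationHandler handler = (proxy, method, args) -> null;
        return Proxy.newProxyInstance(
                ProxyClassInspection.class.getClassLoader(),
                new Class<?>[] {Service.class},
                handler);
    }

    public static void main(String[] args) {
        Object proxy = newProxy();
        Class<?> cls = proxy.getClass();
        // The concrete class name is chosen by the JDK at runtime.
        System.out.println("class:              " + cls.getName());
        System.out.println("is proxy class:     " + Proxy.isProxyClass(cls));
        System.out.println("superclass:         " + cls.getSuperclass().getName());
        System.out.println("implements Service: " + (proxy instanceof Service));
    }
}
```

No `new` expression for the proxy class appears anywhere in the source: the only visible creation point is the call to `Proxy.newProxyInstance`.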

According to #114:

Regarding mocking IR, Tai-e currently supports mocking IR within method at the statement level but does not support mocking an entire class. We will take this into consideration in the future.

Does this imply that Tai-e does not yet natively support method calls in dynamic proxy? If Tai-e supports handling method calls within proxy classes, what configurations should I modify?

Moreover, I have observed that solver.csManager.objManager.objMap contains (as shown in Figure-2):

{ConstantObj@5877} "ConstantObj{java.lang.Class: org.example.proxy.ServiceImpl.class}" -> {HybridHashMap@5878}  size = 1

Why is org.example.proxy.ServiceImpl.class considered a ConstantObj?



Additionally, in tai-e-analyses.yml, I set the value of handle-invokedynamic to true. Tai-e output the IR of $Proxy:

public final class jdk.proxy1.$Proxy0 extends java.lang.reflect.Proxy implements org.example.proxy.Service {

    ...

    public final void doSomething() {
        java.lang.reflect.InvocationHandler $r2;
        java.lang.reflect.Method $r1;
        null-type %nullconst;
        java.lang.Throwable $r5, $r3;
        java.lang.reflect.UndeclaredThrowableException $r4;
        [0@L-1] $r2 = %this.<java.lang.reflect.Proxy: java.lang.reflect.InvocationHandler h>;
        [1@L-1] $r1 = <jdk.proxy1.$Proxy0: java.lang.reflect.Method m3>;
        [2@L-1] invokeinterface $r2.<java.lang.reflect.InvocationHandler: java.lang.Object invoke(java.lang.Object,java.lang.reflect.Method,java.lang.Object[])>(%this, $r1, %nullconst);
        [3@L-1] return;
        [4@L-1] catch $r5;
        [5@L-1] throw $r5;
        [6@L-1] catch $r3;
        [7@L-1] $r4 = new java.lang.reflect.UndeclaredThrowableException;
        [8@L-1] invokespecial $r4.<java.lang.reflect.UndeclaredThrowableException: void <init>(java.lang.Throwable)>($r3);
        [9@L-1] throw $r4;

        try [0, 4), catch java.lang.Error at 4
        try [0, 4), catch java.lang.RuntimeException at 4
        try [0, 4), catch java.lang.Throwable at 6
    }

    ...

}

I have a few questions regarding this IR. Could you explain why the line number is shown as -1?

🎯 Expected Behavior

None

🐛 Current Behavior

None

🔄 Reproducible Example

No response

⚙️ Tai-e Arguments

🔍 Click here to see Tai-e Options

```yaml
optionsFile: null
printHelp: false
classPath:
- ../Tai-e_Test/build/classes/java/main
appClassPath:
- ../Tai-e_Test/build/classes/java/main
mainClass: org.example.Main
inputClasses: []
javaVersion: 17
prependJVM: true
allowPhantom: true
worldBuilderClass: pascal.taie.frontend.soot.SootWorldBuilder
outputDir: output
preBuildIR: false
worldCacheMode: false
scope: APP
nativeModel: true
planFile: null
analyses:
  ir-dumper: ""
  cg: ""
  cfg: ""
  pta: "plugins:[pascal.taie.analysis.pta.plugin.CustomEntryPointPlugin]"
onlyGenPlan: false
keepResult:
- $KEEP-ALL
```

🔍 Click here to see Tai-e Analysis Plan

```yaml
- id: ir-dumper
  options: {}
- id: pta
  options:
    cs: 1-obj
    only-app: true
    implicit-entries: false
    distinguish-string-constants: reflection
    merge-string-objects: true
    merge-string-builders: true
    merge-exception-objects: true
    handle-invokedynamic: true
    propagate-types:
    - reference
    advanced: null
    dump: false
    dump-ci: false
    dump-yaml: false
    expected-file: null
    reflection-inference: string-constant
    reflection-log: null
    taint-config: null
    taint-config-providers: []
    taint-interactive-mode: false
    plugins:
    - pascal.taie.analysis.pta.plugin.CustomEntryPointPlugin
    time-limit: -1
- id: cg
  options:
    algorithm: pta
    dump: true
    dump-methods: true
    dump-call-edges: true
- id: throw
  options:
    exception: explicit
    algorithm: intra
- id: cfg
  options:
    exception: explicit
    dump: true
```

📜 Tai-e Log

🔍 Click here to see Tai-e Log

```
Writing log to /Users/yuntsy/My/Projects/Java/Tai-e/output/tai-e.log
java.version: 17.0.11
java.version.date: 2024-04-16
java.runtime.version: 17.0.11+7-LTS-207
java.vendor: Oracle Corporation
java.vendor.version: null
os.name: Mac OS X
os.version: 15.0.1
os.arch: aarch64
Tai-e Version: 0.5.1-SNAPSHOT
Tai-e Commit: 46448829b6c19ae414caea7b43bd7fb8792ac0a5
Writing analysis plan to /Users/yuntsy/My/Projects/Java/Tai-e/output/tai-e-plan.yml
WorldBuilder starts ...
10085 classes with 99482 methods in the world
WorldBuilder finishes, elapsed time: 1.62s
ir-dumper starts ...
Dumping IR in /Users/yuntsy/My/Projects/Java/Tai-e/output/tir
5 classes in scope (APP) of class analyses
ir-dumper finishes, elapsed time: 0.03s
pta starts ...
[Pointer analysis] elapsed time: 0.01s
-------------- Pointer analysis statistics: --------------
#var pointers: 12 (insens) / 12 (sens)
#objects: 5 (insens) / 5 (sens)
#var points-to: 9 (insens) / 9 (sens)
#static field points-to: 0 (sens)
#instance field points-to: 1 (sens)
#array points-to: 1 (sens)
#reachable methods: 9 (insens) / 10 (sens)
#call graph edges: 10 (insens) / 10 (sens)
----------------------------------------
pta finishes, elapsed time: 0.11s
cg starts ...
Call graph has 9 reachable methods and 10 edges
Dumping call graph to /Users/yuntsy/My/Projects/Java/Tai-e/output/call-graph.dot
Dumping reachable methods to /Users/yuntsy/My/Projects/Java/Tai-e/output/reachable-methods.txt
Dumping call edges to /Users/yuntsy/My/Projects/Java/Tai-e/output/call-edges.txt
cg finishes, elapsed time: 0.01s
throw starts ...
14 methods in scope (APP) of method analyses
throw finishes, elapsed time: 0.00s
cfg starts ...
Dumping CFGs in /Users/yuntsy/My/Projects/Java/Tai-e/output/cfg
cfg finishes, elapsed time: 0.01s
Tai-e finishes, elapsed time: 1.88s
```

ℹ️ Additional Information

No response

zhangt2333 commented 1 week ago

Thank you for taking the time to provide such detailed information. This seems to be a rather important issue; we will look into it as soon as we have time.

Before we investigate this issue further, we would like to conduct a user study to understand your experience with our GitHub Issue Template. Specifically, we want to determine if there are any organizational, descriptive or structural aspects of the template that make it difficult/undesirable for you to follow when submitting an issue.

YunFy26 commented 1 week ago

I apologize for not strictly adhering to the issue template format when submitting my issue. I'd like to explain the reason behind this.

When describing my example in the Overall Description, whether it's for this issue or previous ones, I find it difficult to separate the Expected Behavior and Current Behavior from the Overall Description. When describing the issue, I always feel that placing Expected Behavior and Current Behavior as separate headings after the Overall Description creates a sense of "disconnection." It feels like it disrupts the flow of the explanation.

Taking this submission as an example, I want to analyze the function calls related to dynamic proxies. I first provided a brief description in the title: "call edges related to dynamic proxy are missing." Then, in the Overall Description, I started by offering a demo as a sample for analysis.

① Demo

Afterward, I presented the resulting call graph and explained the outcome of this analysis.

② The call edge main → doSomething is missing.

Next, I described the actual runtime call sequence:

③ In the actual runtime call sequence, before doSomething is called, the method invoke of MyInvocationHandler will be called, and then doSomething is called through reflection within the invoke method.

In this process:

① is the Reproducible Example

② is the Current Behavior

③ is the Expected Behavior. (Perhaps I didn't describe it clearly enough; I should have included a call chain like main -> invoke -> doSomething as the Expected Behavior.)

If I strictly followed the template, the structure would probably look like this: I would first describe the issue in the Overall Description, then follow with either a ③②① or ②③① format.

Personally, I believe that describing the entire process directly in the Overall Description makes it easier to follow and understand. Therefore, I placed everything in the Description section. In this case, if I were to follow the template strictly, it would result in redundant content. That's why I filled in "None" for both Expected Behavior and Current Behavior.

In fact, to ensure that others could understand more easily, I revised the content and format multiple times before submitting. (However, looking at it again now, it seems I should have used symbols like "·" or ">" to better organize the structure.)

Regarding the issue template, I personally believe that Expected Behavior and Current Behavior could be subheadings under the Overall Description, but this is just my personal opinion. You may want to gather feedback from other users to make a more informed decision.

BryanHeBY commented 1 day ago

Hi YunFy26, I set the value of handle-invokedynamic to true, but I still can't find the IR for $Proxy. Could you please provide me with an environment where this IR output can be reproduced, including the JDK environment, tai-e configuration options, etc.? I noticed that you enabled a custom plugin, pascal.taie.analysis.pta.plugin.CustomEntryPointPlugin. Would this plugin affect the result?

As for the question "Why is org.example.proxy.ServiceImpl.class considered a ConstantObj?": it is a class literal, that is, a constant object of type java.lang.Class, not the class itself.
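The distinction between a class and its class literal can be seen in plain Java. This sketch only illustrates the language-level behavior; `ClassLiteralDemo` and its methods are hypothetical names, and the code makes no claim about how Tai-e creates its ConstantObj instances internally.

```java
// Hypothetical demo class; names are illustrative only.
class ClassLiteralDemo {

    static class ServiceImpl {
    }

    // A class literal evaluates to the same Class object that getClass() returns.
    static boolean literalMatchesRuntimeClass() {
        Class<?> literal = ServiceImpl.class;
        Class<?> viaInstance = new ServiceImpl().getClass();
        return literal == viaInstance;
    }

    // The literal is itself an object whose type is java.lang.Class.
    static boolean literalIsClassObject() {
        Object literal = ServiceImpl.class;
        return literal.getClass() == Class.class;
    }

    public static void main(String[] args) {
        System.out.println("literal == getClass():        " + literalMatchesRuntimeClass());
        System.out.println("literal is a java.lang.Class: " + literalIsClassObject());
    }
}
```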

YunFy26 commented 23 hours ago

@BryanHeBY Apologies for mistakenly assuming that the value of handle-invokedynamic affected the IR output of $Proxy0.

In this repo, after running ./gradlew build, I navigated to build/classes/java/main and executed:

java -Djdk.proxy.ProxyGenerator.saveGeneratedFiles=true -cp . org.example.Main

This caused the bytecode file of the dynamic proxy class to be saved in build/classes/java/main/jdk/proxy1/$Proxy0.class, leading it to be recognized as an application class and subsequently loaded into Tai-e World. As a result, when executing ir-dumper, the IR for $Proxy0 is output as well.
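One way to catch this situation before running an analysis is to scan the class output directory for saved proxy class files. The helper below is a hypothetical sketch, not a Tai-e feature; it assumes that saved proxy classes keep the `$Proxy` name prefix seen above, and the default directory is simply the path mentioned in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of Tai-e.
class GeneratedProxyScan {

    // Returns all regular files under root named like "$Proxy<N>.class".
    static List<Path> findSavedProxyClasses(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.startsWith("$Proxy") && name.endsWith(".class");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the Gradle class output directory used in this thread.
        Path root = Path.of(args.length > 0 ? args[0] : "build/classes/java/main");
        if (!Files.isDirectory(root)) {
            System.err.println("Not a directory: " + root);
            return;
        }
        for (Path p : findSavedProxyClasses(root)) {
            System.out.println("saved proxy class: " + p);
        }
    }
}
```

Any file it reports would be picked up as an application class if that directory is on the analyzed class path.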

This is unrelated to the missing call edges in method invocations within dynamic proxy classes.

I apologize for my limited expertise, which may have caused inconvenience to the Tai-e team members. I also sincerely appreciate the Tai-e team for addressing my questions.