fangyuan00 closed this issue 10 months ago.
> The call graph contains six callees of the function `StandardPipeline:void setBasic(org.apache.catalina.Valve)`. However, the first five generated callees are all from interfaces, not the actual runtime classes. I am interested in understanding how to configure Tai-e to accurately capture the actual calls. Is this achievable through static analysis?
Because there is no reproducible example (see an example of how to write a reproducible case), I cannot reproduce your problem.
I tried our test case Dispatch, and it produced the expected output.
> Additionally, I wonder what the differences are between `CallGraph.callSitesIn(..)` and `CallGraph.getCalleesOfM(..)` in Tai-e. Why is the result of `CallGraph.getCalleesOfM(..)` missing three callees?
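Please check the return types of the two methods carefully; they are different. `callSitesIn(..)` gives you the call sites (`Invoke` statements) in the method, while `getCalleesOfM(..)` gives you the set of methods those call sites resolve to.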
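To make the difference concrete, here is a toy model in plain Java. `CallSitesVsCallees`, `CallSite`, and both helper methods are hypothetical stand-ins, not Tai-e's `CallGraph` API. The sketch assumes, as the later replies in this thread show, that each call site keeps the method reference written in the code plus the callees resolved for it: a call site whose receiver resolves to nothing still counts as a call site but contributes no callee, and repeated call sites collapse into a single callee.

```java
import java.util.*;

// Toy model, NOT Tai-e's CallGraph: each call site keeps the method
// reference written in the code and the callees resolved for it.
public class CallSitesVsCallees {
    public record CallSite(String methodRef, Set<String> resolvedCallees) {}

    // Analogue of callSitesIn(m): one entry per call site, repeats kept.
    public static List<CallSite> callSitesIn(List<CallSite> body) {
        return List.copyOf(body);
    }

    // Analogue of getCalleesOfM(m): union of resolved callees, deduplicated.
    public static Set<String> calleesOfM(List<CallSite> body) {
        Set<String> result = new LinkedHashSet<>();
        for (CallSite cs : body) {
            result.addAll(cs.resolvedCallees());
        }
        return result;
    }

    public static void main(String[] args) {
        List<CallSite> body = List.of(
            new CallSite("Lifecycle.stop()", Set.of()), // receiver points to nothing
            new CallSite("Contained.setContainer(..)", Set.of("ValveBase.setContainer(..)")),
            new CallSite("Contained.setContainer(..)", Set.of("ValveBase.setContainer(..)")),
            new CallSite("Valve.getNext()", Set.of("ValveBase.getNext()")));
        System.out.println(callSitesIn(body).size()); // 4
        System.out.println(calleesOfM(body));         // [ValveBase.setContainer(..), ValveBase.getNext()]
    }
}
```

Four call sites yield only two distinct callees here, which is the same kind of mismatch seen between the two printouts later in this issue.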
@zhangt2333 Hi, apologies for not providing a reproducible example previously. I have now included it.
My test code is as follows.
```java
package org.example;

import pascal.taie.analysis.graph.callgraph.CallGraph;
import pascal.taie.analysis.graph.callgraph.CallGraphBuilder;
import pascal.taie.config.AnalysisConfig;
import pascal.taie.ir.stmt.Invoke;
import pascal.taie.language.classes.JMethod;

import java.util.*;

public class App {
    static LinkedList<String> excludeList;

    public static LinkedList<String> excludeList() {
        if (excludeList == null) {
            excludeList = new LinkedList<String>();
            excludeList.add("java.");
            excludeList.add("javax.");
            excludeList.add("sun.");
            excludeList.add("sunw.");
            excludeList.add("com.sun.");
            excludeList.add("com.ibm.");
            excludeList.add("com.apple.");
            excludeList.add("apple.awt.");
        }
        return excludeList;
    }

    public App(String... options) {
        pascal.taie.Main.main(options);
    }

    public String processMethodName(String methodName) {
        if (methodName.startsWith("<"))
            methodName = methodName.substring(1);
        if (methodName.endsWith(">"))
            methodName = methodName.substring(0, methodName.length() - 1);
        String[] tmp = methodName.split(":");
        return tmp[0].strip() + ":" + tmp[1].strip();
    }

    public CallGraph<Invoke, JMethod> useCGanalysis() {
        AnalysisConfig Config = AnalysisConfig.of("cg", "dump", true, "dump-methods", true,
                "dump-call-edges", true, "algorithm", "pta");
        CallGraphBuilder builder = new CallGraphBuilder(Config);
        CallGraph<Invoke, JMethod> res = builder.analyze();
        return res;
    }

    public void testCallGraph() {
        CallGraph<Invoke, JMethod> callgraph = this.useCGanalysis();
        Queue<JMethod> queue = new LinkedList<>();
        HashSet<JMethod> visited = new HashSet<>();
        for (JMethod entry : callgraph.entryMethods().toList()) {
            boolean isIgnore = false;
            for (String exclude : excludeList()) {
                if (entry.getDeclaringClass().toString().startsWith(exclude)) {
                    isIgnore = true;
                    break;
                }
            }
            if (isIgnore)
                continue;
            queue.add(entry);
        }
        while (queue.size() > 0) {
            JMethod caller = queue.poll();
            String callerName = this.processMethodName(caller.getSignature());
            visited.add(caller);
            List<JMethod> callees = callgraph.getCalleesOfM(caller).stream().toList();
            if (callerName.equals("org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)")) {
                System.out.println("Reaching the target function.");
                List<Invoke> calleeInvokes = callgraph.callSitesIn(caller).toList();
                System.out.println("****** Invoke ******");
                System.out.println("[caller]: " + callerName);
                for (Invoke invoke : calleeInvokes) {
                    String calleeName = this.processMethodName(invoke.getInvokeExp().getMethodRef().toString());
                    System.out.println("callee: " + calleeName);
                }
                System.out.println("********************");
                System.out.println("****** JMethod *****");
                System.out.println("[caller]: " + callerName);
                for (JMethod callee : callees) {
                    String calleeName = this.processMethodName(callee.getSignature());
                    System.out.println("callee: " + calleeName);
                }
                System.out.println("********************");
                break;
            }
            for (JMethod callee : callees) {
                boolean isIgnore = false;
                for (String exclude : excludeList()) {
                    if (callee.getDeclaringClass().toString().startsWith(exclude)) {
                        isIgnore = true;
                        break;
                    }
                }
                if (isIgnore)
                    continue;
                if (!visited.contains(callee))
                    queue.add(callee);
            }
        }
    }

    public static void main(String[] args) {
        App main = new App("-acp", "src/main/resources/catalina_6.0.33",
                "-java", "8", "-ap",
                "-a", "pta=cs:ci;only-app:true;implicit-entries:false",
                "-a", "cg=algorithm:pta;dump:true;dump-methods:true;dump-call-edges:true",
                "-scope", "REACHABLE");
        main.testCallGraph();
    }
}
```
Because I need a comprehensive call graph, I made a minor modification to Tai-e that adds every method of every concrete (non-interface, non-abstract) application class as an entry point. My test code above additionally excludes methods from basic Java classes.
Tai-e/src/main/java/pascal/taie/analysis/pta/plugin/EntryPointHandler.java
```java
public void onStart() {
    // process program main method
    JMethod main = World.get().getMainMethod();
    if (main != null) {
        solver.addEntryPoint(new EntryPoint(main,
                new DeclaredParamProvider(main, solver.getHeapModel(), 1)));
    }
    // process implicit entries
    if (solver.getOptions().getBoolean("implicit-entries")) {
        for (JMethod entry : World.get().getImplicitEntries()) {
            solver.addEntryPoint(new EntryPoint(entry, EmptyParamProvider.get()));
        }
    }
    // fy: add all entries
    HeapModel heapModel = solver.getHeapModel();
    // get all application classes
    Stream<JClass> appClasses = World.get().getClassHierarchy().applicationClasses();
    Iterator<JClass> itr = appClasses.iterator();
    Collection<JMethod> entryPoints = new ArrayList<JMethod>();
    while (itr.hasNext()) {
        JClass appClass = itr.next();
        if (appClass.isInterface())
            continue;
        if (appClass.isAbstract())
            continue;
        Collection<JMethod> methods = appClass.getDeclaredMethods();
        entryPoints.addAll(methods);
    }
    for (JMethod entryPoint : entryPoints) {
        Type appType = entryPoint.getDeclaringClass().getType();
        List<String> appTmp = Arrays.asList(appType.toString().split("\\."));
        String appAlloc = "<" + appTmp.get(appTmp.size() - 1) + ">";
        Obj thisObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, appAlloc, appType);
        List<Type> paramTypes = entryPoint.getParamTypes();
        List<Obj> mockObjs = new ArrayList<Obj>();
        for (Type type : paramTypes) {
            List<String> tmp = Arrays.asList(type.toString().split("\\."));
            String alloc = "<" + tmp.get(tmp.size() - 1) + ">";
            Obj mockObj = heapModel.getMockObj(Descriptor.ENTRY_DESC, alloc, type, entryPoint);
            mockObjs.add(mockObj);
        }
        SpecifiedParamProvider.Builder builder = new SpecifiedParamProvider.Builder(entryPoint)
                .addThisObj(thisObj);
        for (int i = 0; i < mockObjs.size(); i++) {
            builder.addParamObj(i, mockObjs.get(i));
        }
        SpecifiedParamProvider paramProvider = builder.build();
        solver.addEntryPoint(new EntryPoint(entryPoint, paramProvider));
    }
}
```
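The class filter in this patch can be mirrored with plain Java reflection as a rough sketch. `EntryCandidates` and its nested types are hypothetical, and unlike Tai-e's class hierarchy this only sees already-loaded `Class` objects. Like the patch, it skips interfaces and abstract classes and keeps every declared method of the remaining concrete classes.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.*;

// Hypothetical reflection-based analogue of the patch's entry-point filter;
// not Tai-e API.
public class EntryCandidates {
    public interface Svc { void run(); }

    public abstract static class AbstractSvc implements Svc {
        abstract void helper();
    }

    public static class ConcreteSvc extends AbstractSvc {
        public void run() {}
        void helper() {}
    }

    // Collect "Class.method" names for every declared method of every
    // concrete class. Note: Class.getDeclaredMethods() returns no
    // constructors, so <init> methods are not considered here.
    public static List<String> entryMethods(List<Class<?>> classes) {
        List<String> result = new ArrayList<>();
        for (Class<?> c : classes) {
            if (c.isInterface()) continue;                       // like appClass.isInterface()
            if (Modifier.isAbstract(c.getModifiers())) continue; // like appClass.isAbstract()
            for (Method m : c.getDeclaredMethods()) {
                result.add(c.getSimpleName() + "." + m.getName());
            }
        }
        Collections.sort(result); // reflection order is unspecified
        return result;
    }

    public static void main(String[] args) {
        System.out.println(entryMethods(List.of(Svc.class, AbstractSvc.class, ConcreteSvc.class)));
        // [ConcreteSvc.helper, ConcreteSvc.run]
    }
}
```

Every surviving method becomes an entry, including helpers that real callers would never reach directly, which is why this approach tends to enlarge the call graph.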
Target function: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)
The callee results of `callgraph.callSitesIn(caller)` and `callgraph.getCalleesOfM(caller)` differ:
```text
Reaching the target function.
****** Invoke ****** (callgraph.callSitesIn)
[caller]: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)
callee: org.apache.catalina.Lifecycle:void stop()
callee: org.apache.juli.logging.Log:void error(java.lang.Object,java.lang.Throwable)
callee: org.apache.catalina.Contained:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.Contained:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.Lifecycle:void start()
callee: org.apache.juli.logging.Log:void error(java.lang.Object,java.lang.Throwable)
callee: org.apache.catalina.Valve:org.apache.catalina.Valve getNext()
callee: org.apache.catalina.Valve:void setNext(org.apache.catalina.Valve)
callee: org.apache.catalina.Valve:org.apache.catalina.Valve getNext()
********************
****** JMethod ***** (callgraph.getCalleesOfM)
[caller]: org.apache.catalina.core.StandardPipeline:void setBasic(org.apache.catalina.Valve)
callee: org.apache.catalina.valves.ValveBase:org.apache.catalina.Valve getNext()
callee: org.apache.catalina.valves.ValveBase:void setContainer(org.apache.catalina.Container)
callee: org.apache.catalina.valves.ValveBase:void setNext(org.apache.catalina.Valve)
callee: org.apache.catalina.core.StandardContextValve:void setContainer(org.apache.catalina.Container)
********************
```
Besides, the callees in `testTaieCG/output/call-edges.txt` are the same as the result of `callgraph.getCalleesOfM(caller)`, which misses the callees `Lifecycle: void start()`, `Lifecycle: void stop()`, and `Log: void error(..)`.
Although the result of `callgraph.callSitesIn(caller)` is complete, the callees' classes are interfaces (`Contained`, `Lifecycle`, `Valve`), not the actual runtime subclasses.
The test package is `org.apache.tomcat:catalina:6.0.33` (attached as catalina_6.0.33.zip).
I am looking forward to your reply. Thank you so much!
Thank you for extracting, editing, and sending this detailed information! ❤️
Maybe it's more convenient for you to just package the code and upload it (in the Issue Comment Editor, or more professionally, in the GitHub Repo)~
I'll take a look.
@zhangt2333 Thank you for the reminder.
The test code repository is https://github.com/fangyuan00/testTaieCG.
The test package is located in `src/main/resources/catalina_6.0.33`, and the compiled Tai-e library is in `lib/`.
That's really helpful!! It gives me immense power and joy to contribute to open source ❤️, especially when working to solve problems for people like you.
> Besides, the callees in the `testTaieCG/output/call-edges.txt` are the same as the result of `callgraph.getCalleesOfM(caller)`, which missed the callees `Lifecycle: void start()`, `Lifecycle: void stop()`, and `Log: void error(..)`. Although the result of `callee.callSitesIn(caller)` is complete and comprehensive, but the callees' classes are interfaces (Interfaces: `Contained`, `Lifecycle`, `Valve`) not the actual run subclasses.
The essence of `pascal.taie.ir.stmt.Invoke`
A `pascal.taie.ir.stmt.Invoke`, one of the IR elements in Tai-e, only represents information at the code level, not what happens when the program runs or is analyzed. So `invoke.getInvokeExp().getMethodRef()` only tells you which method the code refers to, e.g., `Lifecycle: void start()`.
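A minimal plain-Java illustration of the gap between the code-level reference and the runtime target, assuming hypothetical types `Lifecycle` and `SomeValve` (stand-ins, not the Tomcat classes). Reflection on the interface reports only the method the call site names, while the actual call runs the override chosen by the receiver object's class.

```java
import java.lang.reflect.Method;

// Hypothetical stand-ins for the Tomcat types; only the shape matters.
public class DeclaredVsDispatched {
    public interface Lifecycle { String stop(); }

    public static class SomeValve implements Lifecycle {
        @Override
        public String stop() { return "SomeValve.stop"; }
    }

    // Code-level view: the call site only names Lifecycle.stop(), much like
    // the method reference of an Invoke.
    public static String declaredTarget() {
        try {
            Method ref = Lifecycle.class.getMethod("stop");
            return ref.getDeclaringClass().getSimpleName() + "." + ref.getName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runtime view: virtual dispatch runs the override chosen by the
    // receiver object's class.
    public static String dispatchedTarget(Lifecycle receiver) {
        return receiver.stop();
    }

    public static void main(String[] args) {
        System.out.println(declaredTarget());                  // Lifecycle.stop
        System.out.println(dispatchedTarget(new SomeValve())); // SomeValve.stop
    }
}
```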
How to get the actual callee?
During pointer analysis, an `Invoke` is dispatched to its actual callee(s) based on the objects its base (receiver) variable points to.
You just need to make a few changes to your code:
```diff
+import pascal.taie.World;
+import pascal.taie.analysis.pta.PointerAnalysis;
+import pascal.taie.analysis.pta.PointerAnalysisResult;
+import pascal.taie.analysis.pta.core.heap.Obj;
+import pascal.taie.ir.exp.InvokeInstanceExp;
+import pascal.taie.ir.exp.Var;
 ...
 public static void main(String[] args){
     App main = ...
     ...
+    World world = World.get();
+    PointerAnalysisResult ptaResult = world.getResult(PointerAnalysis.ID);
+    CallGraph<Invoke, JMethod> cg = world.getResult(CallGraphBuilder.ID);
+    JMethod method = world.getClassHierarchy().getMethod(
+            "<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>");
+    cg.getCallSitesIn(method).forEach(invoke -> {
+        Set<JMethod> callees = cg.getCalleesOf(invoke);
+        System.out.println(invoke + "\n calls to " + callees);
+        if (invoke.getInvokeExp() instanceof InvokeInstanceExp invokeInstanceExp) {
+            Var base = invokeInstanceExp.getBase();
+            Set<Obj> pointsToSet = ptaResult.getPointsToSet(base);
+            System.out.println(" based on " + base + " pointing to " + pointsToSet);
+        }
+        System.out.println();
+    });
```
Running it printed:
```text
<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>[10@L369] invokeinterface $r8.stop()
 calls to []
 based on $r8 pointing to []
...
<org.apache.catalina.core.StandardPipeline: void setBasic(org.apache.catalina.Valve)>[19@L376] invokeinterface $r7.setContainer(%nullconst)
 calls to [<org.apache.catalina.valves.ValveBase: void setContainer(org.apache.catalina.Container)>, <org.apache.catalina.core.StandardContextValve: void setContainer(org.apache.catalina.Container)>]
 based on $r7 pointing to [NewObj{<org.apache.catalina.core.StandardEngine: void <init>()>[11@L79] new org.apache.catalina.core.StandardEngineValve}, NewObj{<org.apache.catalina.core.StandardContext: void <init>()>[171@L132] new org.apache.catalina.core.StandardContextValve}, NewObj{<org.apache.catalina.core.StandardHost: void <init>()>[31@L73] new org.apache.catalina.core.StandardHostValve}]
...
```
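Because the base (receiver) variable `$r8` of `invokeinterface $r8.stop()` points to nothing, this call site invokes nothing. That is why `Lifecycle: void stop()` appears among the call sites but not among the callees.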
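The resolution step described above can be sketched as a toy model: a call site's callees are the union, over every object in the receiver's points-to set, of the method found by walking up the superclass chain from that object's class. `ToyDispatch` is hypothetical, not Tai-e's implementation; the class hierarchy is a simplified assumption modeled on the output above, method names are shortened, and a real analysis must also handle details such as interface default methods.

```java
import java.util.*;

// Toy model (hypothetical, not Tai-e's implementation) of resolving a
// virtual call site from the receiver's points-to set.
public class ToyDispatch {
    // class -> superclass (null for the root of this toy hierarchy)
    static final Map<String, String> SUPER = new HashMap<>();
    // class -> simplified method names it declares
    static final Map<String, Set<String>> DECLARES = new HashMap<>();

    static void addClass(String cls, String sup, String... methods) {
        SUPER.put(cls, sup);
        DECLARES.put(cls, new HashSet<>(Arrays.asList(methods)));
    }

    static {
        // Assumed, simplified hierarchy modeled on the output above.
        addClass("ValveBase", null, "setContainer", "getNext", "setNext");
        addClass("StandardEngineValve", "ValveBase");
        addClass("StandardHostValve", "ValveBase");
        addClass("StandardContextValve", "ValveBase", "setContainer");
    }

    // Walk up from the object's class until a declaration is found.
    static Optional<String> dispatch(String cls, String method) {
        for (String c = cls; c != null; c = SUPER.get(c)) {
            if (DECLARES.getOrDefault(c, Set.of()).contains(method)) {
                return Optional.of(c + "." + method);
            }
        }
        return Optional.empty();
    }

    // Callees = dispatch on the class of every object the receiver may point
    // to; an empty points-to set therefore yields no callees at all.
    public static Set<String> resolve(Collection<String> pointsToClasses, String method) {
        Set<String> callees = new TreeSet<>();
        for (String cls : pointsToClasses) {
            dispatch(cls, method).ifPresent(callees::add);
        }
        return callees;
    }

    public static void main(String[] args) {
        System.out.println(resolve(
            List.of("StandardEngineValve", "StandardContextValve", "StandardHostValve"),
            "setContainer"));
        // [StandardContextValve.setContainer, ValveBase.setContainer]
        System.out.println(resolve(List.of(), "stop")); // []
    }
}
```

Three receiver objects of different classes yield only two distinct callees, and an empty points-to set, as for `$r8` above, yields none.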
Thank you for resolving the problems that puzzled me for two days!!
Description
Hello, I would like to build a call graph using Tai-e's context-insensitive pointer analysis. However, I observed that some interface methods implemented by the actual runtime classes do not appear in the call graph.
Tai-e Configuration
The Decompiled Function Source Code
Interfaces: `Lifecycle`, `Contained`, and `Valve`
..../catalina_6.0.33/org/apache/catalina/core/StandardPipeline.class
The Generated Call Graph by PTA algorithm
Question
The call graph contains six callees of the function `StandardPipeline:void setBasic(org.apache.catalina.Valve)`. However, the first five generated callees are all from interfaces, not the actual runtime classes. I am interested in understanding how to configure Tai-e to accurately capture the actual calls. Is this achievable through static analysis? Additionally, when I perform call graph analysis using Soot's RTA algorithm, I obtain methods from all subclasses related to the interfaces. However, this approach introduces a significant number of false positives, which is not desirable for my purposes.
Environment
Additionally, I wonder what the differences are between `CallGraph.callSitesIn(..)` and `CallGraph.getCalleesOfM(..)` in Tai-e. Why is the result of `CallGraph.getCalleesOfM(..)` missing three callees? I would greatly appreciate your assistance. Thank you so much.