pascal-lab / Tai-e

An easy-to-learn/use static analysis framework for Java
https://tai-e.pascal-lab.net/docs/index.html
GNU Lesser General Public License v3.0

How to get object names #126

Open aascorreia opened 3 days ago

aascorreia commented 3 days ago

๐Ÿ“ Overall Description

Hello!

I'm currently performing Tai-e's PTA over a test class Counter with the following options:

-pp -cp bin -m classes.Counter -a pta=cs:1-call;implicit-entries:false;only-app:true;distinguish-string-constants:reflection;time-limit:-1;

The class itself looks like this:

public class Counter {
    public static void main(String... args) {
        Counter c1 = new Counter();
        Counter c2 = new Counter();
        increment(c1);
        increment(c2);
    }
    private int counter;
    static void increment(Counter c) { c.counter++; }
}

The goal in mind is to gather information regarding the objects invoking the various class methods, identifying the fields involved in read and/or write operations.

By iterating through LoadField and StoreField statements for each variable provided by PointerAnalysisResultImpl.getVars(), I am able to access the field references that are subject to read and write operations, respectively. This information is then used to populate a map whose keys correspond to said field references' names, and values store objects representing the variable's access information (method name and access type).

When running Tai-e's PTA over Counter, I get the following information:

{counter=[increment{READ}, increment{WRITE}, increment{READ}, increment{WRITE}]}

Since both c1 and c2 call increment, it makes sense that two instances of read and write operations are captured. The issue is that it would be ideal to separate these two instances into distinct map keys such that:

{c1.counter=[increment{READ}, increment{WRITE}], c2.counter=[increment{READ}, increment{WRITE}]}

While it is possible to obtain the points-to set of a given variable using PointerAnalysisResultImpl.getPointsToSet(Var), I cannot seem to find a way to "resolve" the retrieved Obj objects (example shown below) to get the corresponding object names that are present in the code (c1 and c2 in this case).

NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}

Is it feasible to obtain this information, or does the framework not allow it? Should I be using other analysis options or plugins?

Additionally, if Counter's field was instead a reference to another class which contains the counter that is modified in increment, would I have to call getPointsToSet recursively to get to c1.ref.counter, for example?

Thank you for your time.

🎯 Expected Behavior

Printing my map should output to the console:

{c1.counter=[increment{READ}, increment{WRITE}], c2.counter=[increment{READ}, increment{WRITE}]}

๐Ÿ› Current Behavior

Because I cannot obtain the actual object names currently, the console displays:

{counter=[increment{READ}, increment{WRITE}, increment{READ}, increment{WRITE}]}

🔄 Reproducible Example

No response

โš™๏ธ Tai-e Arguments

No response

📜 Tai-e Log

No response

โ„น๏ธ Additional Information

No response

jjppp commented 2 days ago

Hi @aascorreia.

Conceptually, method increment performs READ and WRITE operations on objects rather than variables. Here c1 and c2 are variables, while NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter} and NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter} are objects. Since objects can be passed around and assigned to variables with different names, the two Counter objects' names aren't really c1 and c2.

Tai-e labels abstract objects with their allocation site (i.e., the method where the allocation happens, plus the position of the new statement within it, such as [0@L5]). As you can see, those two Counter objects are allocated in the method void main(String[]).

A possible solution to your problem would be roughly like this:

  1. Perform pointer analysis on the program to be analyzed.
  2. For each Load instruction x = y.f in method increment, record a READ operation on the objects in pts(y) (possibly using a map with Obj as its key), where pts(y) is the points-to set of the variable y.
  3. For each Store instruction x.f = y, record a WRITE operation on the objects in pts(x).
  4. For each variable of interest (e.g., c1 and c2 in method void main(String[])), scan the variable's points-to set, look up which operations are performed on each object in it, and store those operations in a map whose keys are variables.
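
The four steps above can be sketched in plain Java. This is a minimal, self-contained illustration rather than Tai-e code: the string object labels (o@L5, o@L6), the Access record, and the hand-written points-to map are all hypothetical stand-ins for what the pointer analysis result would provide.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FieldAccessSketch {

    // Hypothetical stand-in for a field access "base.field" found in a method body.
    record Access(String base, String field, String method, String type) {}

    // Steps 2 and 3: attribute every access to each object its base variable may point to.
    static Map<String, List<Access>> byObject(Map<String, Set<String>> pts,
                                              List<Access> accesses) {
        Map<String, List<Access>> result = new LinkedHashMap<>();
        for (Access access : accesses) {
            for (String obj : pts.getOrDefault(access.base(), Set.of())) {
                result.computeIfAbsent(obj, k -> new ArrayList<>()).add(access);
            }
        }
        return result;
    }

    // Step 4: project the per-object records back onto the variables of interest.
    static Map<String, List<String>> byVariable(Map<String, Set<String>> pts,
                                                List<String> variables,
                                                Map<String, List<Access>> objectAccesses) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String variable : variables) {
            for (String obj : pts.getOrDefault(variable, Set.of())) {
                for (Access access : objectAccesses.getOrDefault(obj, List.of())) {
                    result.computeIfAbsent(variable + "." + access.field(),
                                    k -> new ArrayList<>())
                            .add(access.method() + "{" + access.type() + "}");
                }
            }
        }
        return result;
    }

    // Step 1 is assumed to have produced this points-to map (hand-written here).
    static Map<String, List<String>> demo() {
        Map<String, Set<String>> pts = Map.of(
                "c1", Set.of("o@L5"),
                "c2", Set.of("o@L6"),
                "c", Set.of("o@L5", "o@L6"));
        // c.counter++ in increment is one load followed by one store on c.
        List<Access> accesses = List.of(
                new Access("c", "counter", "increment", "READ"),
                new Access("c", "counter", "increment", "WRITE"));
        return byVariable(pts, List.of("c1", "c2"), byObject(pts, accesses));
    }

    public static void main(String[] args) {
        // Prints {c1.counter=[increment{READ}, increment{WRITE}], c2.counter=[increment{READ}, increment{WRITE}]}
        System.out.println(demo());
    }
}
```

The split into c1.counter and c2.counter works here only because c1 and c2 each point to a single, distinct object. A variable whose points-to set contained both objects would inherit the accesses recorded for both.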

aascorreia commented 1 day ago

Thank you for shedding light on the difference between objects and variables. I did not consider that aspect of objects at first and can now understand why Obj does not store the name of either c1 or c2.

However, I am a bit confused by Step 4. For reference, here is the code that runs after the analysis is done; I believe it already accomplishes Steps 2 and 3 to some extent.

PointerAnalysisResultImpl result = World.get().getResult("pta");
Collection<CSVar> csVars = result.getCSVars();
FieldAccessMap ptaInfo = new FieldAccessMap();
if (!csVars.isEmpty())
    for (CSVar var : csVars) {
        for (Obj obj : result.getPointsToSet(var.getVar()))
            System.out.println(var.getVar().getName() + "=> " + obj);
        System.out.println("-".repeat(100));

        if (!var.getVar().getLoadFields().isEmpty())
            for (LoadField lField : var.getVar().getLoadFields())
                ptaInfo.recordAccess(
                        lField.getFieldAccess().getFieldRef().getName(),
                        var.getVar().getMethod().getName(),
                        AccessType.READ
                );

        if (!var.getVar().getStoreFields().isEmpty())
            for (StoreField sField : var.getVar().getStoreFields())
                if (!sField.getRValue().getMethod().getSignature().contains("<init>"))
                    ptaInfo.recordAccess(
                            sField.getFieldAccess().getFieldRef().getName(),
                            var.getVar().getMethod().getName(),
                            AccessType.WRITE
                    );
    }
ptaInfo.printAccessMap();

FieldAccessMap is what holds the map I initially mentioned.

jjppp commented 1 day ago

I believe this is what you want.

if (!var.getVar().getLoadFields().isEmpty()) {
    for (LoadField lField : var.getVar().getLoadFields()) {
        for (Obj obj : result.getPointsToSet(var.getVar())) {
            ptaInfo.recordAccess(
                    obj + lField.getFieldAccess().getFieldRef().getName(),
                    var.getVar().getMethod().getName(),
                    AccessType.READ
            );
        }
    }
}

if (!var.getVar().getStoreFields().isEmpty()) {
    for (StoreField sField : var.getVar().getStoreFields()) {
        if (!sField.getRValue().getMethod().getSignature().contains("<init>")) {
            for (Obj obj : result.getPointsToSet(var.getVar())) {
                ptaInfo.recordAccess(
                        obj + sField.getFieldAccess().getFieldRef().getName(),
                        var.getVar().getMethod().getName(),
                        AccessType.WRITE
                );
            }
        }
    }
}

The output will look something like this:

NewObj{<Counter: void main(java.lang.String[])>[2@L4] new Counter}counter: [<increment, READ>, <increment, WRITE>, <increment, READ>, <increment, WRITE>]
NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}counter: [<increment, READ>, <increment, WRITE>, <increment, READ>, <increment, WRITE>]

aascorreia commented 1 day ago

I see. Given your answer though, I'm assuming there really is no way to know the name of a variable since, from what I understood, var.getVar().getName() retrieves a reference to memory rather than an actual name, and NewObj is referencing the object itself.

I was hoping that, even if Tai-e cannot provide that bit of information (c1 and c2 as names), it would be possible to implement a new plugin, or use an existing one, for that effect.

ayanamists commented 12 hours ago

I haven't fully understood what you are discussing.

var.getVar().getName() retrieves a reference to memory rather than an actual name,

This is incorrect. If var is a CSVar object, then var.getVar() returns a Var object. A CSVar simply means 'a Var with a Context'. Naturally, var.getVar().getName() retrieves the name of the variable.

In fact, the code snippet below is already capable of outputting results with variable names:

PointerAnalysisResultImpl result = World.get().getResult("pta");
Collection<CSVar> csVars = result.getCSVars();
if (!csVars.isEmpty())
    for (CSVar var : csVars) {
        for (Obj obj : result.getPointsToSet(var.getVar()))
            System.out.println(var.getVar().getName() + " => " + obj);
        System.out.println("-".repeat(100));
    }

The execution result on my local environment is:

temp$1 => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
----------------------------------------------------------------------------------------------------
%this => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
%this => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
----------------------------------------------------------------------------------------------------
%this => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
%this => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
----------------------------------------------------------------------------------------------------
c1 => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
----------------------------------------------------------------------------------------------------
args => EntryPointObj{alloc=MethodParam{<Counter: void main(java.lang.String[])>/0},type=java.lang.String[] in <Counter: void main(java.lang.String[])>}
----------------------------------------------------------------------------------------------------
c => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
c => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
----------------------------------------------------------------------------------------------------
c => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
c => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
----------------------------------------------------------------------------------------------------
temp$0 => NewObj{<Counter: void main(java.lang.String[])>[0@L3] new Counter}
----------------------------------------------------------------------------------------------------
c2 => NewObj{<Counter: void main(java.lang.String[])>[3@L4] new Counter}
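
The repeated blocks for c and %this in the output above are most likely a consequence of the loop iterating over CSVar objects while querying the points-to set of the underlying Var. With the cs:1-call option, increment is analyzed once per call site, so c exists under two contexts, and each iteration prints the same merged set. A minimal sketch of that relationship follows, using hypothetical stand-in types and labels (CsVar, the context strings, o@L3, o@L4) rather than Tai-e's classes:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class ContextMergeSketch {

    // Hypothetical stand-in for a context-sensitive variable: a variable name plus a context.
    record CsVar(String context, String var) {}

    // Union the per-context points-to sets so that each variable appears exactly once.
    static Map<String, Set<String>> mergeByVariable(Map<CsVar, Set<String>> csPts) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        csPts.forEach((csVar, objs) ->
                merged.computeIfAbsent(csVar.var(), k -> new LinkedHashSet<>()).addAll(objs));
        return merged;
    }

    static Map<String, Set<String>> demo() {
        // Hand-written per-context results; the context labels are illustrative only.
        Map<CsVar, Set<String>> csPts = new LinkedHashMap<>();
        csPts.put(new CsVar("[]", "c1"), Set.of("o@L3"));
        csPts.put(new CsVar("[]", "c2"), Set.of("o@L4"));
        csPts.put(new CsVar("[call increment(c1)]", "c"), Set.of("o@L3"));
        csPts.put(new CsVar("[call increment(c2)]", "c"), Set.of("o@L4"));
        return mergeByVariable(csPts);
    }

    public static void main(String[] args) {
        // Prints {c1=[o@L3], c2=[o@L4], c=[o@L3, o@L4]}
        System.out.println(demo());
    }
}
```

Visiting each variable once avoids recording the same field access twice. Conversely, keeping the per-context sets separate is what would tell the two calls to increment apart.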

Additional note: It seems you are using .class files as input for Tai-e. If you want to see variable names from the source code, don't forget to add the -g parameter in your compilation command.

aascorreia commented 9 hours ago

I can assure you that I had added the -g parameter to the compilation command, as this is the script I am using for compilation, before performing any analysis:

cd src
javac -g -d ../bin -cp ../bin/tai-e-all-0.5.1-SNAPSHOT.jar classes/*.java *.java
cd ../

Even the IDE is set up correctly to generate debugging information during compilation.

The results I was getting when providing .class files as input, even after verifying that -g was present, were the following:

r2 => EntryPointObj{alloc=MethodParam{<classes.Counter: void main(java.lang.String[])>/0},type=java.lang.String[] in <classes.Counter: void main(java.lang.String[])>}
----------------------------------------------------------------------------------------------------
%this => NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
%this => NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}
----------------------------------------------------------------------------------------------------
%this => NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
%this => NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}
----------------------------------------------------------------------------------------------------
r0 => NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
r0 => NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}
----------------------------------------------------------------------------------------------------
r0 => NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
r0 => NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}
----------------------------------------------------------------------------------------------------
$r0 => NewObj{<classes.Counter: void main(java.lang.String[])>[0@L5] new classes.Counter}
----------------------------------------------------------------------------------------------------
$r1 => NewObj{<classes.Counter: void main(java.lang.String[])>[2@L6] new classes.Counter}

Now, given your comment:

It seems you are using .class files as input for Tai-e.

I attempted to use src as the classpath for analysis rather than bin, and managed to get output similar to yours. Thank you!