rohanpadhye / vasco

An inter-procedural data-flow analysis framework using value-based context sensitivity
GNU Lesser General Public License v2.1
88 stars 35 forks

Context unique for statements #17

Closed: rareham closed this issue 2 years ago

rareham commented 3 years ago

I am using Vasco for inter-procedural analysis. I see that the context is stored at each unit during the analysis. How do you make sure that the Unit object is unique for each statement? I don't see Soot guaranteeing that in its implementation.

rohanpadhye commented 3 years ago

Hi! That's a good question. This is not as much a property of Units as it is a property of control-flow graphs. Vasco's program representation uses the Soot DirectedGraph to represent CFGs. This API does indeed map node objects to predecessors and successors.

Let me know if you encounter a case where this is not true.

rareham commented 3 years ago

Suppose the analysis stores its data-flow values in a map keyed by the nodes of one unit graph instance. If I create another instance of the unit graph, will I be able to look up the data-flow values using the nodes of the new unit graph as keys? I do not understand how Soot implements hashCode for graph nodes, or how the map computes the hash code for a node when it is used as a key.

rohanpadhye commented 3 years ago

It does not look like Soot's Unit class defines its own hashCode/equals methods, so they appear to be inherited from java.lang.Object. Soot/Vasco therefore seems to rely on object identity for distinguishing nodes in a control-flow graph.
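
To illustrate what relying on object identity means for map keys, here is a minimal sketch in plain Java. The `Node` class is a hypothetical stand-in for a CFG node, not a Soot type; the only assumption carried over from the discussion is that it does not override `equals`/`hashCode`, so the defaults from `java.lang.Object` apply.

```java
import java.util.HashMap;
import java.util.Map;

class IdentityKeyDemo {
    // Hypothetical stand-in for a CFG node. It deliberately does NOT override
    // equals()/hashCode(), so it inherits identity semantics from Object.
    static final class Node {
        final String text;
        Node(String text) { this.text = text; }
    }

    // Stores a value under `key`, then reports whether `probe` finds it.
    static boolean foundBy(Node key, Node probe) {
        Map<Node, String> values = new HashMap<>();
        values.put(key, "data-flow value");
        return values.containsKey(probe);
    }

    public static void main(String[] args) {
        Node a = new Node("x = 1");
        Node sameReference = a;              // same object on the heap
        Node equalText = new Node("x = 1");  // different object, same contents

        System.out.println(foundBy(a, sameReference)); // true
        System.out.println(foundBy(a, equalText));     // false
    }
}
```

With identity semantics, a lookup succeeds only when the key is the very same object that was inserted. Two objects with identical contents are still different keys.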

rareham commented 3 years ago

By object identity, do you mean the object reference (heap location) or the type?

rareham commented 3 years ago

I think the Unit objects for a method are created only once, because the Jimple phase of Soot runs once to get all the method bodies and create the units. Hence every time I create a unit graph on the method body, I get the same instance of each statement object and would be hashing into the same object. This assumes all the Unit objects are unique to start with.

rohanpadhye commented 3 years ago

Yes, that's right. In Vasco, this shouldn't matter, as the data-flow value maps are associated with contexts, and each context only maps the nodes of its own CFG. So even if you have, say, two contexts X1 and X2 for the same method M, both will have CFGs that share the same nodes (the same Unit objects in memory), but there are no duplicate nodes within X1 or within X2. Since the hash maps are per-context, this is not a problem.

If you have a global hash map over units and have multiple CFGs for the same method, then yes, you will have problems with collisions. In that case, you might want to use not just Unit objects but a pair of <Context, Unit> to uniquely identify nodes.
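
A rough sketch of the <Context, Unit> pair idea, in plain Java so it runs without Soot or Vasco on the classpath. `Node`, `Ctx`, and `Key` are hypothetical stand-ins introduced only for illustration; they are not Vasco or Soot classes, and the identity-based `equals` on `Key` is an assumption about how one might want such a pair to compare.

```java
import java.util.HashMap;
import java.util.Map;

class ContextUnitKeyDemo {
    // Hypothetical stand-ins for a Soot Unit and a Vasco context.
    static final class Node { final String text; Node(String text) { this.text = text; } }
    static final class Ctx { final String name; Ctx(String name) { this.name = name; } }

    // Composite key: two keys are equal only if they hold the same context
    // object AND the same node object (identity comparison on both parts).
    static final class Key {
        final Ctx context;
        final Node node;
        Key(Ctx context, Node node) { this.context = context; this.node = node; }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return context == k.context && node == k.node;
        }

        @Override public int hashCode() {
            return 31 * System.identityHashCode(context) + System.identityHashCode(node);
        }
    }

    // Global map keyed by node alone: the entry for x2 overwrites the one for x1.
    static int entriesKeyedByNode(Node n, Ctx x1, Ctx x2) {
        Map<Node, String> global = new HashMap<>();
        global.put(n, "value in " + x1.name);
        global.put(n, "value in " + x2.name);
        return global.size();
    }

    // Global map keyed by (context, node): both entries are kept.
    static int entriesKeyedByPair(Node n, Ctx x1, Ctx x2) {
        Map<Key, String> global = new HashMap<>();
        global.put(new Key(x1, n), "value in " + x1.name);
        global.put(new Key(x2, n), "value in " + x2.name);
        return global.size();
    }

    public static void main(String[] args) {
        Node shared = new Node("x = 1");  // one node object shared by both contexts
        Ctx x1 = new Ctx("X1");
        Ctx x2 = new Ctx("X2");
        System.out.println(entriesKeyedByNode(shared, x1, x2)); // 1
        System.out.println(entriesKeyedByPair(shared, x1, x2)); // 2
    }
}
```

Keying by node alone loses one of the two values because both contexts share the same node object. The composite key keeps them apart without copying any nodes.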

Hope this helps.

rareham commented 3 years ago

Yes, this helps. Thank you for your time and replies.