wala / ML

Eclipse Public License 2.0
23 stars 17 forks source link

Inheritance not supported #107

Open khatchad opened 7 months ago

khatchad commented 7 months ago

Consider the following example:

class D:
    pass

class C(D):

    def func(self, x):
        return x * x

c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]
Node: synthetic < PythonLoader, L$script myscript.py/C/func, trampoline2()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@103 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C, trampoline2]

All good. Now, considering the following example:

class D:

    def func(self, x):
        return x * x

class C(D):
    pass

c = C()
a = c.func(5)
assert a == 25

In the call graph, the callees of the outer script are the following:

Node: synthetic < PythonLoader, Lscript myscript.py/C, do()LRoot; > Context: CallStringContext: [ script myscript.py.do()LRoot;@100 ]

When we dump the call graph, we see:

callees of node Lscript myscript.py : [C]
khatchad commented 7 months ago

Hey @msridhar. How is this supposed to work in Java/Javascript in WALA? Suppose we had the following in contrast to the above examples, where a CHA analysis would work well:

class D:

    def func(self, x):
        return x * x

class C(D):

    def func(self, x):
        return x * x

c = C()
a = c.func(5) // LINE X
assert a == 25

I believe that CHA would give us edges at line X to both D.func() and C.func(), but something more advanced (like RTA) would perhaps give us only C.func(). Would we need different kinds of call graphs for something like this? I believe there is only one Python call graph now. Thanks!

msridhar commented 7 months ago

Modeling of inheritance in Java vs. JavaScript are very different. Java has class-based inheritance. The inheritance is represented in the ClassHierarchy, and within call graph construction, there is logic to determine which implementation a method call will dispatch to once the concrete type of the receiver is known. That logic is roughly here (buried rather deeply I'm afraid):

https://github.com/wala/WALA/blob/a8e0f966d58ede90997ef0fae5b69638d61abb8e/core/src/main/java/com/ibm/wala/ipa/callgraph/propagation/SSAPropagationCallGraphBuilder.java#L1794-L1805

For JavaScript we model prototype inheritance but that is quite different. We have IR instructions that model looking up a property in the prototype chain anytime it is read, which handles cases analogous to the one above.

For the Python implementation I'm not 100% sure what is the role of the ClassHierarchy. That would be the natural place to model this kind of thing. For your broken example, the idea would be that at c.func(5), once the concrete type C of the receiver object is known, there would be some logic to walk up the inheritance chain from C until an implementation of func is found, in this case in D, and that would be the resolved target of the call.