wala / ML

Eclipse Public License 2.0

`__call__` not supported #24

Closed khatchad closed 9 months ago

khatchad commented 1 year ago

`__call__` doesn't show up in the call graph.

khatchad commented 10 months ago

Suppose we have the following input code stored in `A.py`:

```python
import tensorflow as tf

# Create an override model to classify pictures
class SequentialModel(tf.keras.Model):

  def __init__(self, **kwargs):
    super(SequentialModel, self).__init__(**kwargs)

    self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))

    # Add a lot of small layers
    num_layers = 100
    self.my_layers = [tf.keras.layers.Dense(64, activation="relu")
                      for n in range(num_layers)]

    self.dropout = tf.keras.layers.Dropout(0.2)
    self.dense_2 = tf.keras.layers.Dense(10)

  def __call__(self, x):
    x = self.flatten(x)

    for layer in self.my_layers:
      x = layer(x)

    x = self.dropout(x)
    x = self.dense_2(x)

    return x

if __name__ == '__main__':
    input_data = tf.random.uniform([20, 28, 28])
    print("Input:")
    print(type(input_data))
    print(input_data)

    model = SequentialModel()
    result = model(input_data)

    print("Output:")
    print(type(result))
    print(result)
```

The problematic expression above is `result = model(input_data)`, because it implicitly invokes `SequentialModel.__call__`. When building a call graph, I don't even see a node for `__call__`, which is strange. Perhaps a node is created only if there is an explicit call to the function? I would expect to see the method reference `< PythonLoader, Lscript A.py/SequentialModel/__call__, do()LRoot; >` somewhere among the following call graph nodes:

```
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
Node: <Code body of function Lscript A.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script A.py.do()LRoot;@88 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@114 ]
Node: synthetic < PythonLoader, Lscript A.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@117 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@122 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__> Context: CallStringContext: [ script A.py.SequentialModel.do()LRoot;@12 ]
Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@5 ]]
Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@25 ]
Node: synthetic < PythonLoader, LCodeBody, __Lscript A.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@26 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript A.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@111 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]
```

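For reference, the Python semantics the analysis has to model can be sketched without TensorFlow. Calling an instance looks up `__call__` on the instance's *type*, so `model(input_data)` behaves like `type(model).__call__(model, input_data)`. The class `Model` below is a hypothetical stand-in for `SequentialModel`; this is a sketch of CPython behavior, not of anything WALA does:

```python
class Model:
    """Hypothetical stand-in for SequentialModel; no TensorFlow needed."""

    def __call__(self, x):
        return ("Model.__call__", x)


model = Model()

# Implicit call: CPython dispatches through type(model).__call__.
implicit = model(1)

# Explicit forms that the implicit call is equivalent to.
explicit_bound = model.__call__(1)
explicit_unbound = type(model).__call__(model, 1)

# An instance attribute named __call__ does NOT change the implicit call,
# because special methods are looked up on the type, not the instance.
model.__call__ = lambda x: ("instance attribute", x)
still_class_method = model(2)
via_attribute = model.__call__(2)

print(implicit, explicit_bound, explicit_unbound)
print(still_class_method, via_attribute)
```

This suggests that resolving an implicit call like `model(input_data)` amounts to finding `__call__` on the receiver's class and passing the receiver as the first argument.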
khatchad commented 10 months ago

If I change the input file to use `result = model.__call__(input_data)` instead, I see these nodes:

```
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
Node: <Code body of function Lscript A.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script A.py.do()LRoot;@88 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@114 ]
Node: synthetic < PythonLoader, Lscript A.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@117 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@123 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__> Context: CallStringContext: [ script A.py.SequentialModel.do()LRoot;@12 ]
Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@5 ]]
Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@25 ]
Node: synthetic < PythonLoader, LCodeBody, __Lscript A.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@26 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript A.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@111 ]
Node: synthetic < PythonLoader, L$script A.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@120 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]
Node: <Code body of function Lscript A.py/SequentialModel/__call__> Context: CallStringContext: [ $script A.py.SequentialModel.__call__.trampoline2()LRoot;@2 ]
```

khatchad commented 10 months ago

And here is the diff between the two:

```
7c7
< Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@122 ]
---
> Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@123 ]
13a14
> Node: synthetic < PythonLoader, L$script A.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@120 ]
14a16
> Node: <Code body of function Lscript A.py/SequentialModel/__call__> Context: CallStringContext: [ $script A.py.SequentialModel.__call__.trampoline2()LRoot;@2 ]
```

The first difference just looks like a shift in the IR (the call site moves from `@122` to `@123`), probably because there is one more value corresponding to the "new" function invocation.
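An extra value is plausible because the explicit form reads the `__call__` attribute before calling it, whereas the implicit form calls the object directly. CPython's own bytecode shows the same asymmetry. This is only an analogy (CPython bytecode, not WALA's IR), and the opcode name varies by Python version (`LOAD_METHOD` before 3.12, `LOAD_ATTR` from 3.12), so the sketch checks the attribute name rather than the opcode:

```python
import dis


def implicit_call(model, x):
    return model(x)


def explicit_call(model, x):
    return model.__call__(x)


def reads_call_attr(func):
    """Return True if func's bytecode loads an attribute named __call__."""
    return any(
        ins.argval == "__call__"
        for ins in dis.get_instructions(func)
    )


print("implicit reads __call__:", reads_call_attr(implicit_call))
print("explicit reads __call__:", reads_call_attr(explicit_call))
```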

khatchad commented 10 months ago

I would think that `__call__` needs to be added alongside this code. Is it a built-in function?

https://github.com/wala/ML/blob/1b1ffac127c0c8f48a11d2c661b71450c9d60ce9/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/summaries/BuiltinFunctions.java#L288

khatchad commented 9 months ago

> I would think that `__call__` needs to be added alongside this code. Is it a built-in function?
>
> https://github.com/wala/ML/blob/1b1ffac127c0c8f48a11d2c661b71450c9d60ce9/com.ibm.wala.cast.python/source/com/ibm/wala/cast/python/ipa/summaries/BuiltinFunctions.java#L288

I don't think so. In particular, `__init__` isn't listed as one.
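On the Python side, both `__init__` and `__call__` in the example are ordinary functions defined in the class body, not builtins. A quick check with a hypothetical stand-in class (this says nothing about how WALA's `BuiltinFunctions` summaries are organized; it only shows the two methods are alike at the language level):

```python
import types


class Model:
    """Hypothetical stand-in for SequentialModel."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x


for name in ("__init__", "__call__"):
    member = vars(Model)[name]
    # Both are plain Python functions stored in the class namespace,
    # not builtin functions.
    print(name, type(member).__name__,
          isinstance(member, types.FunctionType),
          isinstance(member, types.BuiltinFunctionType))
```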

khatchad commented 9 months ago

I was able to switch the target to the correct `IMethod`; however, the points-to analysis is still wrong. By switching the receiver in the target selector, I was able to get a node for the `__call__()` trampoline:

```
callees of node trampoline2 : []

IR of node 11, context CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@113 ]
synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; >
CFG:
BB0[0..0]
    -> BB1
    -> BB5
BB1[1..1]
    -> BB2
    -> BB5
BB2[2..2]
    -> BB3
    -> BB5
BB3[3..3]
    -> BB4
    -> BB5
BB4[4..4]
    -> BB5
BB5[-1..-2]
Instructions:
BB0
0   v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1   v4 = checkcast <PythonLoader,Lscript tf2_test_model_call.py/SequentialModel/__call__>v3
BB2
2   v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3   v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4   return v6
BB5
```
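Reading the IR, the trampoline behaves like a bound-method call: it loads the function from the `$function` field of its receiver `v1`, loads the bound object from `$self`, and invokes the function with that object followed by the remaining argument `v2`. Python's own bound methods carry the same two pieces as `__func__` and `__self__`. The sketch below mirrors the trampoline's steps with a hypothetical `BoundCall` class; it is an analogy, not WALA's implementation:

```python
class Model:
    """Hypothetical stand-in for SequentialModel."""

    def __call__(self, x):
        return x * 2


class BoundCall:
    """Hypothetical analogue of the object the trampoline receives as v1."""

    def __init__(self, function, self_obj):
        self.function = function  # role of the $function field
        self.self_obj = self_obj  # role of the $self field


def trampoline(v1, v2):
    """Mirrors the IR of trampoline2: v1 is the receiver, v2 the argument."""
    v3 = v1.function    # v3 = getfield $function v1
    v4 = v3             # v4 = checkcast ... v3 (a no-op in this sketch)
    v5 = v1.self_obj    # v5 = getfield $self v1
    return v4(v5, v2)   # v6 = invokeFunction v4, v5, v2; return v6


model = Model()

# CPython's real bound method carries the same two components.
bound = model.__call__
assert bound.__func__ is Model.__call__
assert bound.__self__ is model

print(trampoline(BoundCall(Model.__call__, model), 21))  # 42
```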

However, `v1` still points to the wrong object:

```
[Node: synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@113 ], v1] --> [SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]}]
```

The "receiver" is still `Lobject` (this is in "test 1", which calls the model object directly as a callable). However, in "test 4" (an explicit call to `__call__()`), it is the following:

```
[Node: synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@114 ], v1] --> [SMIK:SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]}@creator:Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]]
```

The "receiver" here is `L$script tf2_test_model_call.py/SequentialModel/__call__`. Thus, even though the correct `IMethod` is selected, the pointer analysis is somehow still wrong.
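The difference matters because the trampoline dereferences `$function` and `$self` on `v1`. A toy points-to model shows the consequence: if `v1` points only to the model instance itself (as in test 1), the `$function` read finds nothing and the call has no target; if `v1` points to a bound-method object (as in test 4), the target resolves. Abstract objects here are plain string labels and the field map is a dictionary; these names are hypothetical and this is not WALA's `InstanceKey` or solver machinery:

```python
def field_points_to(pts, var, field, heap):
    """Union of the abstract objects stored in `field` of everything `var` may point to."""
    result = set()
    for obj in pts.get(var, set()):
        result |= heap.get((obj, field), set())
    return result


# Abstract objects (labels only).
MODEL = "SequentialModel instance"  # roughly what v1 holds in test 1
BOUND = "bound __call__ object"     # roughly what v1 holds in test 4
CALL_FN = "SequentialModel.__call__"

# Field contents: only the bound-method object has $function / $self.
heap = {
    (BOUND, "$function"): {CALL_FN},
    (BOUND, "$self"): {MODEL},
}


def trampoline_targets(v1_objects):
    """Functions the trampoline could invoke, given v1's points-to set."""
    pts = {"v1": set(v1_objects)}
    return field_points_to(pts, "v1", "$function", heap)


print("test 1:", trampoline_targets({MODEL}))  # test 1: set()
print("test 4:", trampoline_targets({BOUND}))  # test 4: {'SequentialModel.__call__'}
```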

khatchad commented 9 months ago

In the working case (test 4), by the time we hit `com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.addConstraintsFromNewNodes(IProgressMonitor)` to process the newly found node, the pointer analysis, curiously, already has:

```
[Node: synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@114 ], v1] ->
     SMIK:SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call4.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@111 ]}@creator:Node: synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@111 ]
```

khatchad commented 9 months ago

Thus, at the point of adding the "new" node, we already know that `v1` refers to `SequentialModel.__call__()`, and `v3` is then assigned from `v1` (via the `getfield` of `$function`). But only constraints from `v3` are generated, which is too late. How does the analysis know about `v1` before processing the "new" node?

khatchad commented 9 months ago

The problem may have something to do with `v1` being implicit in the IR above, i.e., there is no explicit assignment to `v1`.

khatchad commented 9 months ago

Ah, because this isn't a "static" method (not sure what that means for Python), v1 must point to the implicit parameter (i.e., the receiver object).
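In Python terms, the receiver is the first declared parameter of the function stored on the class, but a bound method hides it because the receiver is supplied implicitly. This is loosely analogous to `v1` having no defining instruction in the trampoline's IR. `inspect.signature` shows the difference (the stand-in class is hypothetical):

```python
import inspect


class Model:
    """Hypothetical stand-in for SequentialModel."""

    def __call__(self, x):
        return x


model = Model()

# The function stored on the class declares the receiver explicitly...
print(list(inspect.signature(Model.__call__).parameters))  # ['self', 'x']
# ...while the bound method has already bound it to `model`.
print(list(inspect.signature(model.__call__).parameters))  # ['x']
```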

khatchad commented 9 months ago

When we get to com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getTargetForCall(CGNode, CallSiteReference, IClass, InstanceKey[]), the pointer analysis does not contain this key, so there must be something in between that adds it.