Closed khatchad closed 9 months ago
Suppose we have this input code stored in A.py:
import tensorflow as tf

# Create an override model to classify pictures
class SequentialModel(tf.keras.Model):
    def __init__(self, **kwargs):
        super(SequentialModel, self).__init__(**kwargs)
        self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
        # Add a lot of small layers
        num_layers = 100
        self.my_layers = [tf.keras.layers.Dense(64, activation="relu")
                          for n in range(num_layers)]
        self.dropout = tf.keras.layers.Dropout(0.2)
        self.dense_2 = tf.keras.layers.Dense(10)

    def __call__(self, x):
        x = self.flatten(x)
        for layer in self.my_layers:
            x = layer(x)
        x = self.dropout(x)
        x = self.dense_2(x)
        return x

if __name__ == '__main__':
    input_data = tf.random.uniform([20, 28, 28])
    print("Input:")
    print(type(input_data))
    print(input_data)
    model = SequentialModel()
    result = model(input_data)
    print("Output:")
    print(type(input_data))
    print(result)
The problematic expression above is result = model(input_data), because that implicitly invokes SequentialModel.__call__. When building a call graph, I am not even seeing a node for __call__, which is strange. Perhaps it only gets created if there is a call to the function? I would expect to see a method reference of < PythonLoader, Lscript A.py/SequentialModel/__call__, do()LRoot; > somewhere in the following CG nodes:
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
Node: <Code body of function Lscript A.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script A.py.do()LRoot;@88 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@114 ]
Node: synthetic < PythonLoader, Lscript A.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@117 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@122 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__> Context: CallStringContext: [ script A.py.SequentialModel.do()LRoot;@12 ]
Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@5 ]]
Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@25 ]
Node: synthetic < PythonLoader, LCodeBody, __Lscript A.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@26 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript A.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@111 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]
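For reference, the dispatch that is missing from the node list above can be sketched in plain Python. This is a minimal illustration, not code from the analysis: Scaler is a hypothetical stand-in for SequentialModel with no TensorFlow dependency. Calling an instance looks up __call__ on its type, so obj(x) behaves like type(obj).__call__(obj, x).

```python
# Hypothetical stand-in for SequentialModel; no TensorFlow needed.
class Scaler:
    def __init__(self, scale):
        self.scale = scale

    def __call__(self, x):
        # Reached both by obj(x) and by obj.__call__(x).
        return self.scale * x


obj = Scaler(3)

implicit = obj(2)                       # implicit form, as in model(input_data)
explicit = obj.__call__(2)              # explicit form, as in model.__call__(input_data)
via_type = type(obj).__call__(obj, 2)   # what the implicit form resolves to

print(implicit, explicit, via_type)
```

All three forms reach the same function body, which is why a call graph for the implicit form would be expected to contain a node for __call__ as well.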
If I change the input file to use result = model.__call__(input_data) instead, I am seeing these nodes:
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeRootMethod()V > Context: Everywhere
Node: synthetic < PythonLoader, Lcom/ibm/wala/FakeRootClass, fakeWorldClinit()V > Context: Everywhere
Node: <Code body of function Lscript A.py> Context: CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
Node: synthetic < PythonLoader, Ltensorflow, import()Ltensorflow; > Context: CallStringContext: [ script A.py.do()LRoot;@88 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@114 ]
Node: synthetic < PythonLoader, Lscript A.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@117 ]
Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@123 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__> Context: CallStringContext: [ script A.py.SequentialModel.do()LRoot;@12 ]
Node: synthetic < PythonLoader, Lsuperfun, do()LRoot; > Context: DelegatingContext [A=super call, B=CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@5 ]]
Node: synthetic < PythonLoader, Lwala/builtin/range, do()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@25 ]
Node: synthetic < PythonLoader, LCodeBody, __Lscript A.py/SequentialModel/__init__/comprehension1()LRoot; > Context: CallStringContext: [ script A.py.SequentialModel.__init__.do()LRoot;@26 ]
Node: <Code body of function Lscript A.py/SequentialModel/__init__/comprehension1> Context: CallStringContext: [ CodeBody.__Lscript A.py/SequentialModel/__init__/comprehension1()LRoot;@2 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@111 ]
Node: synthetic < PythonLoader, L$script A.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@120 ]
Node: synthetic < PythonLoader, Ltensorflow/functions/uniform, read_data()LRoot; > Context: CallStringContext: [ tensorflow.functions.uniform.do()LRoot;@0 ]
Node: <Code body of function Lscript A.py/SequentialModel/__call__> Context: CallStringContext: [ $script A.py.SequentialModel.__call__.trampoline2()LRoot;@2 ]
And, the diff between the two:
7c7
< Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@122 ]
---
> Node: synthetic < PythonLoader, Lwala/builtin/type, do()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@123 ]
13a14
> Node: synthetic < PythonLoader, L$script A.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script A.py.do()LRoot;@120 ]
14a16
> Node: <Code body of function Lscript A.py/SequentialModel/__call__> Context: CallStringContext: [ $script A.py.SequentialModel.__call__.trampoline2()LRoot;@2 ]
The first difference just looks like a difference in the IR values, probably because there's one more value corresponding to the "new" function invocation.
I would think that __call__ needs to be added alongside this code. Is it a built-in function?
I don't think so. Specifically, __init__ isn't listed as one.
I was able to switch the target to the correct IMethod; however, the points-to analysis is wrong. By switching the receiver in the target selector, I was able to get a node for the __call__() trampoline:
callees of node trampoline2 : []
IR of node 11, context CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@113 ]
synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; >
CFG:
BB0[0..0]
-> BB1
-> BB5
BB1[1..1]
-> BB2
-> BB5
BB2[2..2]
-> BB3
-> BB5
BB3[3..3]
-> BB4
-> BB5
BB4[4..4]
-> BB5
BB5[-1..-2]
Instructions:
BB0
0 v3 = getfield < PythonLoader, LRoot, $function, <PythonLoader,LRoot> > v1
BB1
1 v4 = checkcast <PythonLoader,Lscript tf2_test_model_call.py/SequentialModel/__call__>v3
BB2
2 v5 = getfield < PythonLoader, LRoot, $self, <PythonLoader,LRoot> > v1
BB3
3 v6 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v4,v5,v2 @2 exception:v7
BB4
4 return v6
BB5
However, v1 still points to the wrong thing:
[Node: synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@113 ], v1] --> [SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:Lobject in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]}]
The "receiver" is still Lobject (this is in "test 1", which uses the callable). However, it is the following in "test 4" (explicit call to __call__()):
[Node: synthetic < PythonLoader, L$script tf2_test_model_call.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@114 ], v1] --> [SMIK:SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]}@creator:Node: synthetic < PythonLoader, Lscript tf2_test_model_call.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call.py.do()LRoot;@111 ]]
The "receiver" here is: L$script tf2_test_model_call.py/SequentialModel/__call__. Thus, even though we have the correct IMethod being selected, somehow the pointer analysis is still wrong.
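One way to read the difference between the two points-to results, sketched in plain Python rather than in WALA terms: model.__call__ evaluates to a bound-method object that carries both the function and the receiver, while model alone is just the instance. The trampoline IR reads $function and $self from v1, which loosely resembles __func__ and __self__ on a Python bound method. That correspondence is an assumption on my part, and Model below is a hypothetical stand-in.

```python
class Model:
    def __call__(self, x):
        return x + 1


model = Model()
bound = model.__call__  # bound-method object, as in the explicit "test 4" call

# A bound method exposes the underlying function and the receiver,
# loosely analogous to the $function and $self fields read by the trampoline.
function = bound.__func__
receiver = bound.__self__
result = function(receiver, 41)

# The bare instance has no __func__ attribute, so a trampoline handed the
# instance itself (the Lobject case) would have nothing to read $function from.
instance_has_func = hasattr(model, "__func__")

print(function is Model.__call__, receiver is model, result, instance_has_func)
```

If this reading holds, selecting the right IMethod is not enough on its own: the trampoline also needs v1 to be the method object rather than the plain instance.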
In the working case (test 4), by the time we hit com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.addConstraintsFromNewNodes(IProgressMonitor) to process the newly found node, curiously the pointer analysis already has:
[Node: synthetic < PythonLoader, L$script tf2_test_model_call4.py/SequentialModel/__call__, trampoline2()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@114 ], v1] ->
SMIK:SITE_IN_NODE{synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; >:L$script tf2_test_model_call4.py/SequentialModel/__call__ in CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@111 ]}@creator:Node: synthetic < PythonLoader, Lscript tf2_test_model_call4.py/SequentialModel, do()LRoot; > Context: CallStringContext: [ script tf2_test_model_call4.py.do()LRoot;@111 ]
Thus, at the point of adding the "new" node, we already know that v1 refers to SequentialModel.__call__(), and v3 is then assigned from v1. But only constraints from v3 are generated, which is too late. How does it know about v1 before processing the "new" node?
The problem may have something to do with v1 being implicit in the IR above, i.e., there exists no explicit assignment of v1.
Ah, because this isn't a "static" method (not sure what that means for Python), v1 must point to the implicit parameter (i.e., the receiver object).
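As a side note on what "static" means in Python, here is a small sketch with hypothetical names. An ordinary method receives the receiver as its first argument, whereas a staticmethod receives only the explicit arguments, so it has no implicit parameter.

```python
class Example:
    def instance_method(self, x):
        # self is the implicit receiver, supplied when the method is bound.
        return (self, x)

    @staticmethod
    def static_method(x):
        # No receiver is passed; only the explicit argument arrives.
        return (x,)


e = Example()
inst_result = e.instance_method(1)
static_result = e.static_method(1)

print(inst_result[0] is e, len(inst_result), len(static_result))
```

__call__ in the input file is the first kind, which matches the observation that the first value has to be the receiver.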
When we get to com.ibm.wala.ipa.callgraph.propagation.PropagationCallGraphBuilder.getTargetForCall(CGNode, CallSiteReference, IClass, InstanceKey[]), the pointer analysis does not contain this key, so there must be something in between that adds it.
__call__ doesn't show up in the call graph.