wala / ML

Missing `putfield` in IR when traversing non-scalar iterable datasets #130

Open khatchad opened 6 months ago

Consider the following Python code:

# tf2_test_dataset8.py
import tensorflow as tf

def add(a, b):
    return a + b

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(32)

for images, labels in dataset:
    c = add(images, images)
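
For reference, the loop above can be read as the following plain-Python desugaring. This is only a sketch of the runtime semantics, not of how WALA lowers the loop: the `dataset` list is a hypothetical stand-in for the batched `tf.data.Dataset`, and scalars stand in for tensors so that it runs without TensorFlow.

```python
def add(a, b):
    return a + b

# Hypothetical stand-in for the batched dataset: an iterable of
# (images, labels) pairs. Scalars replace tensors here.
dataset = [(1, 10), (2, 20)]

results = []
iterator = iter(dataset)            # `for ... in dataset` first obtains an iterator
while True:
    try:
        element = next(iterator)    # each element is a 2-tuple
    except StopIteration:
        break                       # loop exit
    images, labels = element        # unpacking reads element[0] and element[1]
    c = add(images, images)
    results.append(c)
```

The point is that `images` and `labels` are read out of each element produced by the iterator, so an analysis needs some path from the dataset to those elements.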

And the corresponding IR:

callees of node Lscript tf2_test_dataset8.py : [import, add, from_tensor_slices, shuffle, batch]

IR of node 2, context CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
<Code body of function Lscript tf2_test_dataset8.py>
CFG:
BB0[-1..-2]
    -> BB1
BB1[0..98]
    -> BB2
    -> BB9
BB2[99..115]
    -> BB3
    -> BB9
BB3[116..117]
    -> BB4
    -> BB9
BB4[118..119]
    -> BB5
    -> BB9
BB5[120..121]
    -> BB6
BB6[122..133]
    -> BB7
    -> BB9
BB7[134..142]
    -> BB8
    -> BB9
BB8[143..144]
    -> BB6
BB9[-1..-2]
Instructions:
BB0
BB1
0   global:global script tf2_test_dataset8.py = v1<no information>
1   v3 = new <PythonLoader,Lwala/builtin/enumerate>@1<no information> [3=[enumerate]]
2   v4 = new <PythonLoader,Lwala/builtin/int>@2<no information> [4=[int]]
3   v5 = new <PythonLoader,Lwala/builtin/round>@3<no information> [5=[round]]
4   v6 = new <PythonLoader,Lwala/builtin/len>@4<no information> [6=[len]]
5   v7 = new <PythonLoader,Lwala/builtin/list>@5<no information> [7=[list]]
6   v8 = new <PythonLoader,Lwala/builtin/range>@6<no information> [8=[range]]
7   v9 = new <PythonLoader,Lwala/builtin/sorted>@7<no information> [9=[sorted]]
8   v10 = new <PythonLoader,Lwala/builtin/str>@8<no information> [10=[str]]
9   v11 = new <PythonLoader,Lwala/builtin/sum>@9<no information> [11=[sum]]
10   v12 = new <PythonLoader,Lwala/builtin/type>@10<no information> [12=[type]]
11   v13 = new <PythonLoader,Lwala/builtin/zip>@11<no information> [13=[zip]]
12   v14 = new <PythonLoader,Lwala/builtin/slice>@12<no information> [14=[slice]]
13   v15 = new <PythonLoader,Lwala/builtin/__delete__>@13<no information> [15=[__delete__]]
14   v16 = new <PythonLoader,Lwala/builtin/print>@14<no information> [16=[print]]
15   v19 = invokestatic < PythonLoader, LBaseException, import()LBaseException; > @15 exception:v20<no information> [19=[BaseException]]
16   v22 = invokestatic < PythonLoader, LDeprecationWarning, import()LDeprecationWarning; > @16 exception:v23<no information> [22=[DeprecationWarning]]
17   v25 = invokestatic < PythonLoader, LException, import()LException; > @17 exception:v26<no information> [25=[Exception]]
18   v28 = invokestatic < PythonLoader, LFutureWarning, import()LFutureWarning; > @18 exception:v29<no information> [28=[FutureWarning]]
19   v31 = invokestatic < PythonLoader, LNameError, import()LNameError; > @19 exception:v32<no information> [31=[NameError]]
20   v34 = invokestatic < PythonLoader, LNone, import()LNone; > @20 exception:v35<no information> [34=[None]]
21   v37 = invokestatic < PythonLoader, LRuntimeError, import()LRuntimeError; > @21 exception:v38<no information> [37=[RuntimeError]]
22   v40 = invokestatic < PythonLoader, LStopIteration, import()LStopIteration; > @22 exception:v41<no information> [40=[StopIteration]]
23   v43 = invokestatic < PythonLoader, LTypeError, import()LTypeError; > @23 exception:v44<no information> [43=[TypeError]]
24   v46 = invokestatic < PythonLoader, LUserWarning, import()LUserWarning; > @24 exception:v47<no information> [46=[UserWarning]]
25   v49 = invokestatic < PythonLoader, LValueError, import()LValueError; > @25 exception:v50<no information> [49=[ValueError]]
26   v52 = invokestatic < PythonLoader, L__doc__, import()L__doc__; > @26 exception:v53<no information> [52=[__doc__]]
27   v55 = invokestatic < PythonLoader, L__file__, import()L__file__; > @27 exception:v56<no information> [55=[__file__]]
28   v58 = invokestatic < PythonLoader, L__name__, import()L__name__; > @28 exception:v59<no information> [58=[__name__]]
29   v61 = invokestatic < PythonLoader, Labs, import()Labs; > @29 exception:v62<no information> [61=[abs]]
30   v64 = invokestatic < PythonLoader, Lall, import()Lall; > @30 exception:v65<no information> [64=[all]]
31   v67 = invokestatic < PythonLoader, Lany, import()Lany; > @31 exception:v68<no information> [67=[any]]
32   v70 = invokestatic < PythonLoader, Lbin, import()Lbin; > @32 exception:v71<no information> [70=[bin]]
33   v73 = invokestatic < PythonLoader, Lbool, import()Lbool; > @33 exception:v74<no information> [73=[bool]]
34   v76 = invokestatic < PythonLoader, Lbytes, import()Lbytes; > @34 exception:v77<no information> [76=[bytes]]
35   v79 = invokestatic < PythonLoader, Lcallable, import()Lcallable; > @35 exception:v80<no information> [79=[callable]]
36   v82 = invokestatic < PythonLoader, Lchr, import()Lchr; > @36 exception:v83<no information> [82=[chr]]
37   v85 = invokestatic < PythonLoader, Lcomplex, import()Lcomplex; > @37 exception:v86<no information> [85=[complex]]
38   v88 = invokestatic < PythonLoader, Ldel, import()Ldel; > @38 exception:v89<no information> [88=[del]]
39   v91 = invokestatic < PythonLoader, Ldict, import()Ldict; > @39 exception:v92<no information> [91=[dict]]
40   v94 = invokestatic < PythonLoader, Ldir, import()Ldir; > @40 exception:v95<no information> [94=[dir]]
41   v97 = invokestatic < PythonLoader, Ldivmod, import()Ldivmod; > @41 exception:v98<no information> [97=[divmod]]
42   v100 = invokestatic < PythonLoader, Leval, import()Leval; > @42 exception:v101<no information> [100=[eval]]
43   v103 = invokestatic < PythonLoader, Lexec, import()Lexec; > @43 exception:v104<no information> [103=[exec]]
44   v106 = invokestatic < PythonLoader, Lexit, import()Lexit; > @44 exception:v107<no information> [106=[exit]]
45   v109 = invokestatic < PythonLoader, Lfilter, import()Lfilter; > @45 exception:v110<no information> [109=[filter]]
46   v112 = invokestatic < PythonLoader, Lfloat, import()Lfloat; > @46 exception:v113<no information> [112=[float]]
47   v115 = invokestatic < PythonLoader, Lformat, import()Lformat; > @47 exception:v116<no information> [115=[format]]
48   v118 = invokestatic < PythonLoader, Lfrozenset, import()Lfrozenset; > @48 exception:v119<no information> [118=[frozenset]]
49   v121 = invokestatic < PythonLoader, Lget_ipython, import()Lget_ipython; > @49 exception:v122<no information> [121=[get_ipython]]
50   v124 = invokestatic < PythonLoader, Lgetattr, import()Lgetattr; > @50 exception:v125<no information> [124=[getattr]]
51   v127 = invokestatic < PythonLoader, Lglobals, import()Lglobals; > @51 exception:v128<no information> [127=[globals]]
52   v130 = invokestatic < PythonLoader, Lhasattr, import()Lhasattr; > @52 exception:v131<no information> [130=[hasattr]]
53   v133 = invokestatic < PythonLoader, Lhelp, import()Lhelp; > @53 exception:v134<no information> [133=[help]]
54   v136 = invokestatic < PythonLoader, Lhex, import()Lhex; > @54 exception:v137<no information> [136=[hex]]
55   v139 = invokestatic < PythonLoader, Lid, import()Lid; > @55 exception:v140<no information> [139=[id]]
56   v142 = invokestatic < PythonLoader, Linput, import()Linput; > @56 exception:v143<no information> [142=[input]]
57   v145 = invokestatic < PythonLoader, Lisinstance, import()Lisinstance; > @57 exception:v146<no information> [145=[isinstance]]
58   v148 = invokestatic < PythonLoader, Liter, import()Liter; > @58 exception:v149<no information> [148=[iter]]
59   v151 = invokestatic < PythonLoader, Llocals, import()Llocals; > @59 exception:v152<no information> [151=[locals]]
60   v154 = invokestatic < PythonLoader, Lmap, import()Lmap; > @60 exception:v155<no information> [154=[map]]
61   v157 = invokestatic < PythonLoader, Lmax, import()Lmax; > @61 exception:v158<no information> [157=[max]]
62   v160 = invokestatic < PythonLoader, Lmin, import()Lmin; > @62 exception:v161<no information> [160=[min]]
63   v163 = invokestatic < PythonLoader, Lnext, import()Lnext; > @63 exception:v164<no information> [163=[next]]
64   v166 = invokestatic < PythonLoader, Lobject, import()Lobject; > @64 exception:v167<no information> [166=[object]]
65   v169 = invokestatic < PythonLoader, Lopen, import()Lopen; > @65 exception:v170<no information> [169=[open]]
66   v172 = invokestatic < PythonLoader, Lord, import()Lord; > @66 exception:v173<no information> [172=[ord]]
67   v175 = invokestatic < PythonLoader, Lpow, import()Lpow; > @67 exception:v176<no information> [175=[pow]]
68   v178 = invokestatic < PythonLoader, Lprint, import()Lprint; > @68 exception:v179<no information> [178=[print]]
70   v181 = invokestatic < PythonLoader, Lproperty, import()Lproperty; > @70 exception:v182<no information> [181=[property]]
71   v184 = invokestatic < PythonLoader, Lrepr, import()Lrepr; > @71 exception:v185<no information> [184=[repr]]
72   v187 = invokestatic < PythonLoader, Lreversed, import()Lreversed; > @72 exception:v188<no information> [187=[reversed]]
73   v190 = invokestatic < PythonLoader, Lset, import()Lset; > @73 exception:v191<no information> [190=[set]]
74   v193 = invokestatic < PythonLoader, Lsuper, import()Lsuper; > @74 exception:v194<no information> [193=[super]]
75   v196 = invokestatic < PythonLoader, Ltuple, import()Ltuple; > @75 exception:v197<no information> [196=[tuple]]
76   v199 = invokestatic < PythonLoader, Lvars, import()Lvars; > @76 exception:v200<no information> [199=[vars]]
77   v202 = invokestatic < PythonLoader, LNotImplementedError, import()LNotImplementedError; > @77 exception:v203<no information> [202=[NotImplementedError]]
78   v205 = invokestatic < PythonLoader, LWarning, import()LWarning; > @78 exception:v206<no information> [205=[Warning]]
79   v208 = invokestatic < PythonLoader, Lcd, import()Lcd; > @79 exception:v209<no information> [208=[cd]]
80   v211 = invokestatic < PythonLoader, Lclear, import()Lclear; > @80 exception:v212<no information> [211=[clear]]
81   v214 = invokestatic < PythonLoader, Lpylab, import()Lpylab; > @81 exception:v215<no information> [214=[pylab]]
82   v217 = invokestatic < PythonLoader, LRuntimeWarning, import()LRuntimeWarning; > @82 exception:v218<no information> [217=[RuntimeWarning]]
83   v220 = invokestatic < PythonLoader, Lhist, import()Lhist; > @83 exception:v221<no information> [220=[hist]]
84   v223 = invokestatic < PythonLoader, Lmatplotlib, import()Lmatplotlib; > @84 exception:v224<no information> [223=[matplotlib]]
85   v226 = invokestatic < PythonLoader, Lrecall, import()Lrecall; > @85 exception:v227<no information> [226=[recall]]
86   v229 = invokestatic < PythonLoader, Lhistory, import()Lhistory; > @86 exception:v230<no information> [229=[history]]
87   v232 = invokestatic < PythonLoader, Ltime, import()Ltime; > @87 exception:v233<no information> [232=[time]]
88   v235 = invokestatic < PythonLoader, LKeyError, import()LKeyError; > @88 exception:v236<no information> [235=[KeyError]]
89   v238 = invokestatic < PythonLoader, Ldisplay, import()Ldisplay; > @89 exception:v239<no information> [238=[display]]
90   v241 = invokestatic < PythonLoader, Ltensorflow, import()Ltensorflow; > @90 exception:v242  tf2_test_dataset8.py [1:0] -> [1:23] [241=[tf]]
91   v246 = new <PythonLoader,Lscript tf2_test_dataset8.py/add>@91<no information> [246=[add]]
92   global:global script tf2_test_dataset8.py/add = v246<no information> [246=[add]]
93   putfield v1.< PythonLoader, LRoot, add, <PythonLoader,LRoot> > = v246<no information> [246=[add]]
94   v252 = fieldref v241.v253:#keras        tf2_test_dataset8.py [8:39] -> [8:41] [241=[tf]]
95   v251 = fieldref v252.v254:#datasets     tf2_test_dataset8.py [8:39] -> [8:41]
96   v250 = fieldref v251.v255:#mnist        tf2_test_dataset8.py [8:39] -> [8:41]
97   v249 = fieldref v250.v256:#load_data    tf2_test_dataset8.py [8:39] -> [8:41]
98   v248 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v249 @98 exception:v257  tf2_test_dataset8.py [8:39] -> [8:74]
BB2
99   v260 = fieldref v248.v258:#0            tf2_test_dataset8.py [8:0] -> [8:1]
100   v262 = fieldref v260.v258:#0           tf2_test_dataset8.py [8:1] -> [8:8] [262=[x_train]]
102   v264 = fieldref v260.v259:#1           tf2_test_dataset8.py [8:10] -> [8:17] [264=[y_train]]
104   v265 = fieldref v248.v259:#1           tf2_test_dataset8.py [8:0] -> [8:1]
105   v267 = fieldref v265.v258:#0           tf2_test_dataset8.py [8:21] -> [8:27] [267=[x_test]]
107   v269 = fieldref v265.v259:#1           tf2_test_dataset8.py [8:29] -> [8:35] [269=[y_test]]
109   v277 = fieldref v241.v278:#data        tf2_test_dataset8.py [9:10] -> [9:12] [241=[tf]]
110   v276 = fieldref v277.v279:#Dataset     tf2_test_dataset8.py [9:10] -> [9:12]
111   v275 = fieldref v276.v280:#from_tensor_slices  tf2_test_dataset8.py [9:10] -> [9:12]
112   v281 = new <PythonLoader,Ltuple>@112   tf2_test_dataset8.py [9:46] -> [9:64]
113   fieldref v281.v258:#0 = v262 = v262    tf2_test_dataset8.py [9:46] -> [9:64] [262=[x_train]]
114   fieldref v281.v259:#1 = v264 = v264    tf2_test_dataset8.py [9:46] -> [9:64] [264=[y_train]]
115   v274 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v275,v281 @115 exception:v282  tf2_test_dataset8.py [9:10] -> [9:12]
BB3
116   v273 = fieldref v274.v283:#shuffle     tf2_test_dataset8.py [9:10] -> [9:12]
117   v272 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v273,v284:#10000 @117 exception:v285  tf2_test_dataset8.py [9:10] -> [9:12]
BB4
118   v271 = fieldref v272.v286:#batch       tf2_test_dataset8.py [9:10] -> [9:12]
119   v270 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v271,v287:#32 @119 exception:v288  tf2_test_dataset8.py [9:10] -> [9:89] [270=[dataset, temp 3]]
BB5
BB6
122   v293 = new <PythonLoader,Ltuple>@122   tf2_test_dataset8.py [1:0] -> [12:27]
123   v295 = global:global images            tf2_test_dataset8.py [11:4] -> [11:10]
124   fieldref v293.v258:#0 = v295 = v295    tf2_test_dataset8.py [1:0] -> [12:27]
125   v297 = global:global labels            tf2_test_dataset8.py [11:12] -> [11:18]
126   fieldref v293.v259:#1 = v297 = v297    tf2_test_dataset8.py [1:0] -> [12:27]
127   v298 = a property name of v270         <no information> [270=[dataset, temp 3]]
128   v300 = fieldref v298.v258:#0           tf2_test_dataset8.py [11:4] -> [11:10] [300=[images]]
130   v302 = fieldref v298.v259:#1           tf2_test_dataset8.py [11:12] -> [11:18] [302=[labels]]
132   v291 = binaryop(ne) v292:#null , v298  tf2_test_dataset8.py [1:0] -> [12:27]
133   conditional branch(eq, to iindex=-1) v291,v258:#0  tf2_test_dataset8.py [1:0] -> [12:27]
BB7
134   v304 = new <PythonLoader,Ltuple>@134   tf2_test_dataset8.py [1:0] -> [12:27]
135   fieldref v304.v258:#0 = v300 = v300    tf2_test_dataset8.py [1:0] -> [12:27] [300=[images]]
136   fieldref v304.v259:#1 = v302 = v302    tf2_test_dataset8.py [1:0] -> [12:27] [302=[labels]]
137   v303 = fieldref v270.v304              tf2_test_dataset8.py [1:0] -> [12:27] [270=[dataset, temp 3]]
138   v305 = fieldref v303.v258:#0           tf2_test_dataset8.py [11:4] -> [11:10] [305=[images]]
140   v306 = fieldref v303.v259:#1           tf2_test_dataset8.py [11:12] -> [11:18] [306=[labels]]
142   v307 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v246,v305,v305 @142 exception:v308  tf2_test_dataset8.py [12:8] -> [12:27] [307=[c]246=[add]305=[images]]
BB8
144   goto (from iindex= 144 to iindex = 122)  tf2_test_dataset8.py [1:0] -> [12:27]
BB9

From the above, consider the following snippet:

134   v304 = new <PythonLoader,Ltuple>@134   tf2_test_dataset8.py [1:0] -> [12:27]
135   fieldref v304.v258:#0 = v300 = v300    tf2_test_dataset8.py [1:0] -> [12:27] [300=[images]]
136   fieldref v304.v259:#1 = v302 = v302    tf2_test_dataset8.py [1:0] -> [12:27] [302=[labels]]
137   v303 = fieldref v270.v304              tf2_test_dataset8.py [1:0] -> [12:27] [270=[dataset, temp 3]]

Instruction 134 creates a new tuple, and instructions 135 and 136 assign images and labels to its elements 0 and 1, respectively. However, I lose track of these values further down the IR. I believe the reason is that instruction 137 is incorrect: there is no field of v270 that holds the value in v304. In other words, I believe a putfield instruction that stores v304 in v270 is missing. That said, the CAst printing is still able to track the variable names further down the IR:

138   v305 = fieldref v303.v258:#0           tf2_test_dataset8.py [11:4] -> [11:10] [305=[images]]
140   v306 = fieldref v303.v259:#1           tf2_test_dataset8.py [11:12] -> [11:18] [306=[labels]]

However, I don't understand how, because the object stored in v270 doesn't have a field whose value is v304; the tuple is created in instruction 134 but never assigned to a field of v270. Either way, the dataflow analysis can't track it, so something is wrong here: either the IR or the dataflow analysis.
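
To make the concern concrete, here is a minimal toy model of a field-sensitive dataflow. It is a sketch under stated assumptions, not WALA's pointer analysis: `ToyHeap`, `run`, and the field name `"elem"` are hypothetical, and a single named field stands in for whatever key instruction 137 actually reads. It only shows that a read from a field that was never written yields nothing to propagate.

```python
from collections import defaultdict


class ToyHeap:
    """Hypothetical heap model: maps (object, field) to the values written there."""

    def __init__(self):
        self.fields = defaultdict(set)

    def put(self, obj, field, value):
        self.fields[(obj, field)].add(value)

    def get(self, obj, field):
        # Return a copy; a read never creates a field.
        return set(self.fields.get((obj, field), set()))


def run(with_store):
    heap = ToyHeap()
    # Instructions 135-136: write images/labels into the tuple v304.
    heap.put("v304", 0, "images")
    heap.put("v304", 1, "labels")
    if with_store:
        # Hypothetical missing store of v304 into v270 ("elem" is made up).
        heap.put("v270", "elem", "v304")
    # Instruction 137: v303 = read from v270.
    v303 = heap.get("v270", "elem")
    # Instruction 138: v305 = element 0 of whatever v303 may point to.
    v305 = set()
    for obj in v303:
        v305 |= heap.get(obj, 0)
    return v305


without_store = run(with_store=False)  # nothing reaches v305
with_store = run(with_store=True)      # "images" reaches v305
```

In this model, `without_store` is empty while `with_store` contains `"images"`, which mirrors the behavior described above: without a store into v270, the values written in instructions 135 and 136 never reach v305.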