Open khatchad opened 10 months ago
Here's the IR of f():
callees of node f : [g]
IR of node 3, context CallStringContext: [ script A.py.do()LRoot;@96 ]
<Code body of function Lscript A.py/f>
CFG:
BB0[-1..-2]
-> BB1
BB1[0..3]
-> BB2
-> BB3
BB2[4..7]
-> BB3
BB3[-1..-2]
Instructions:
BB0
BB1
0 v2 = new <PythonLoader,Lscript A.py/f/g>@0<no information> [2=[g]]
1 global:global script A.py/f/g = v2 <no information> [2=[g]]
2 putfield v1.< PythonLoader, LRoot, g, <PythonLoader,LRoot> > = v2<no information> [1=[the function]2=[g]]
3 v5 = invokeFunction < PythonLoader, LCodeBody, do()LRoot; > v2 @3 exception:v6A.py [6:8] -> [6:11] [5=[a]2=[g]]
BB2
6 v9 = binaryop(eq) v5 , v7:#5 A.py [7:11] -> [7:17] [5=[a]7=[cmp0]]
7 assert v9 (fromSpec: true) A.py [7:4] -> [7:17]
BB3
Step-by-step:
0 v2 = new <PythonLoader,Lscript A.py/f/g>@0<no information> [2=[g]]
1 global:global script A.py/f/g = v2 <no information> [2=[g]]
2 putfield v1.< PythonLoader, LRoot, g, <PythonLoader,LRoot> > = v2<no information> [1=[the function]2=[g]]
Function g() gets stored in v2. But that also happens for f() in the script:
callees of node Lscript A.py : [f]
IR of node 2, context CallStringContext: [ com.ibm.wala.FakeRootClass.fakeRootMethod()V@2 ]
<Code body of function Lscript A.py>
CFG:
BB0[-1..-2]
-> BB1
BB1[0..95]
-> BB2
-> BB4
BB2[96..96]
-> BB3
-> BB4
BB3[97..97]
-> BB4
BB4[-1..-2]
Instructions:
...
90 v241 = new <PythonLoader,Lscript A.py/f>@90<no information> [241=[f]]
91 global:global script A.py/f = v241 <no information> [241=[f]]
92 putfield v1.< PythonLoader, LRoot, f, <PythonLoader,LRoot> > = v241<no information> [241=[f]]
So, that's not unheard of. Thus, it seems that nothing special is really going on for embedded functions; it even happens at the (global) script level.
It would seem, then, that either scripts or functions that define functions have a field whose value is the defined function. But I'm unsure why. The field isn't used at all in this IR; it's only stored, and the function is called using the stored value (v2), not the field. Maybe it's used if an embedded function is called from a function other than the outer function (is that possible?). Can you call f.g() from the script level of A.py?
Looks like that's not possible. Not sure of the reason for this then.
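A quick way to check the f.g() question in plain Python is sketched below. The body of f is a guess reconstructed from the IR above (g is called and the result is compared with 5); it is not the actual contents of A.py, and the return values are added only so the behavior can be checked.

```python
# Hypothetical reconstruction of A.py's shape, inferred from the IR above.
def f():
    def g():
        return 5

    a = g()
    assert a == 5
    return a


# Calling the outer function works as usual.
result = f()

# At run time, g is only a local variable of f ...
local_names = f.__code__.co_varnames

# ... and is not an attribute of the function object f,
# so f.g() from the script level raises AttributeError.
try:
    f.g()
    reachable_from_script = True
except AttributeError:
    reachable_from_script = False

print(result, local_names, reachable_from_script)
```

Under these assumptions, the inner function cannot be reached as f.g from the script level, so the putfield in the IR appears to reflect how the front end models definitions rather than anything the Python program can observe. That is an inference from this sketch, not something confirmed against the WALA sources.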
The question now is whether this should be considered a "mod" in the ModRef analysis ...
In the added test, the points-to set is non-empty, while in mead-baseline it is empty.
In the test:
pointerKey StaticFieldKey (id=167)
[<field global script A.py/f/g>]
pointsToSet OrdinalSet<T> (id=224)
[SITE_IN_NODE{<Code body of function Lscript A.py/f>:Lscript A.py/f/g in CallStringContext: [ script A.py.do()LRoot;@96 ]}]
This is why we're not filtering out this location.
I wonder why we have an empty points-to set in mead-baseline, or why an empty points-to set even matters here.
In the test's pointer analysis:
[<field global script A.py/f/g>] --> [SITE_IN_NODE{<Code body of function Lscript A.py/f>:Lscript A.py/f/g in CallStringContext: [ script A.py.do()LRoot;@96 ]}]
In mead-baseline:
[<field global script pretrain_paired_tf.py/main/_distributed_train_step>] --> []
There are other functions that are also nested but have non-empty points-to sets, e.g.:
[<field global script pretrain_paired_tf.py/main/_replicated_train_step>] --> [SMIK:SITE_IN_NODE{<Code body of function Lscript pretrain_paired_tf.py/main>:Lscript pretrain_paired_tf.py/main/_replicated_train_step in CallStringContext: [ script pretrain_paired_tf.py.do()LRoot;@303 ]}@creator:Node: <Code body of function Lscript pretrain_paired_tf.py/main> Context: CallStringContext: [ script pretrain_paired_tf.py.do()LRoot;@303 ]]
I am now thinking that this problem is related to https://github.com/wala/ML/issues/91 because the missing functions are decorated. Moreover, they're decorated with a weird decorator that can't be found.
Indeed, this is the case. If you comment out the decorator, the problem doesn't happen.
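The difference between the decorated and undecorated nested functions can be sketched in plain Python. The decorator `passthrough` and the function names below are hypothetical stand-ins; they are not the decorator or the code from pretrain_paired_tf.py.

```python
import functools


def passthrough(fn):
    """Hypothetical stand-in for the decorator that the analysis cannot find."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)

    return wrapper


def main():
    # Undecorated nested function: the name is bound directly
    # to the function object created by the def statement.
    def plain_step():
        return "plain"

    # Decorated nested function: the name is bound to whatever
    # passthrough(...) returns, not to the original function object.
    @passthrough
    def decorated_step():
        return "decorated"

    return plain_step, decorated_step


plain_step, decorated_step = main()

# functools.wraps records the original function under __wrapped__.
original = decorated_step.__wrapped__
print(plain_step(), decorated_step(), original is decorated_step)
```

Because the decorated name holds the decorator's return value, an analysis that cannot resolve the decorator plausibly has no allocation site to propagate to the corresponding global, which would be consistent with the empty points-to set above. This is an assumption about the cause, not a confirmed explanation.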
Blocked by https://github.com/wala/ML/issues/91.
Consider the test added in 438007ef7c8a2fb38d88e6345e9155621439aabc. Currently, the ModRef analysis lists the inner function as a heap write of the outer function. Why?
Next Steps
Dump the call graph for the code in this test.