Closed saelo closed 1 year ago
I've been thinking of an alternative approach to solving this problem. Instead of Store/LoadNamed
ops why don't we introduce a CreateGlobalIdentifier
op. This op would essentailly create a global variable/identifier that can be referenced in any scope. Consider the following example:
function f1(){
a = 10
}
funtion f2(){
let b = a + 20
return b
}
a = 1
f1()
f2()
Now when we compile f1()
, we encounter a
which hasn't been declared before and doesn't exist in our variable map. We now generate a global identifier for a
and add it to a global variable map (which would be a key-value map storing something like : g0:'a'
). We also generate another regular variable (e.g. v0
) which references g0
and add this to our regular var map. We can then represent the compiled function as follows:
BeginPlainFunc v0
v1 <- LoadInteger 10
v2 <- CreateGlobalIdentifier 'a'
Reassign v2, v1
EndPlainFunc
When we compile the second function f2
, we encounter a
but instead of creating a global identifier we look up the global var map and find g0
which intrun references v2
. So we substitute a
here for v2. The rest of the compiled program would be as follows:
BeginPlainFunc v3
v4 <- LoadInt 20
v5 <- BinaryOp v2, v4, '+'
Return v5
EndPlainFunc
Reassign v2, 1
CallFunc v0
CallFunc v3
I think this would make for a cleaner apprach to handling variable/function-use-before-declaration scenarios. Another benefit is that we now have these identifiers in our var map and the AI can now track type changes more reliably without any additional effort. This approach would also play nice with static code checks since FuzzIL vars are always declared before use (there will be some effort to suspend scope checks when we evaluate global identifiers).
WDYT?
Hey @saelo, switching track from our splicing discussion, if you get some time could you review the proposal above and let me know what you think?
Cool idea! So I'm a bit worried about two things:
bar = 42;
function foo() {
bar();
function bar() {
print("Hello World!");
}
}
foo();
print(bar);
Because if you turn the function bar
into a global variable, it collides with the existing but separate bar
variable in the outer scope.
Another thing, isn't this pretty similar to hoisting a LoadUndefined
(or, maybe a new MakeNamedVariable 'a'
) into the outer scope (which is, I think, what's currently necessary to handle these cases)?
I think it'd also be good to clearly formulate what problem we are trying to solve here. At least I don't really have a good understanding of that currently. I think one problem has to do with hoisting of e.g. functions by the JavaScript engine, which might then require lots of other things to be hoisted that are referenced by the function's body. The other problem seems to be that named variables can be referenced in a way that a JS->FuzzIL compiler wouldn't easily be able to understand. E.g.
var a = 42;
eval('a = 54; ' + 'a' + ' = 44;')
print(a); // Should be 44 now
For that, it'd be easy if we could just also create a named variable 'a' in FuzzIL, but pretty hard/impossible otherwise. IIUC, in your proposal, this would work because the Lifter would turn v2
into a
in your example?
Are there any other issues that occur during compilation that we are trying to solve here?
Because if you turn the function bar into a global variable, it collides with the existing but separate bar variable in the outer scope.
That's right we would have a collision here with my approach. One way to deal with this would be to scan the current block for matching identifiers and if that does not return any results then we look for matches in the outer block. I'll give this some more thought and write up some more ideas on how to tackle this.
Another thing, isn't this pretty similar to hoisting a LoadUndefined (or, maybe a new MakeNamedVariable 'a') into the outer scope (which is, I think, what's currently necessary to handle these cases)?
How do you mean? I think we would have to track which scope creates the variable before we emit MakeNamedVar
. For example:
// scope 0
function f1(){
// scope -1
function f2(){
// scope -2
return a;
}
f2()
}
a = 10;
f1()
So when we encounter a
at scope -2, we should be emiting MakeNamedVar
in scope 0 and and not in scope -1.
I think it'd also be good to clearly formulate what problem we are trying to solve here. At least I don't really have a good understanding of that currently.
Assume we have the following program that we want to compile to FuzzIL:
function f1(){
let b = a + 1
return a + b;
}
a = 10;
f1()
Using the LoadNamed/StoreNamed
variables approach suggested orignally we end up with the following IL:
BeginFunc f1
v0 <- LoadInt 1
v1 <- LoadNamedVar 'a'
b <- BinaryOp v1, v2, +
v3 <- LoadNamedVar 'a'
v4 <- BinaryOp v3, b, +
Return v4
EndFunc
v2 <- LoadInt 10
StoreNamedVar 'a', v2
CallFunction f1
One of the challenges with this approach is that we won't be able to track type information across the execution cycle or reuse a
in mutations/splicing etc since it's not mapped to a single variable in the var map. This prevents us from taking advantage of the existing fuzzing infrastructure (i.e. AI, mutations, etc). The LoadNamed/StoreNamed
ops solve the lifting issue but there's not much more that we can do with a
.
The problem that I was hoping to solve was to find a way where a
is treated like a regular FuzzIL variable and that would allow us to apply the previously mentioned fuzzing infra. Happy to elaborate on this further if it's still not clear.
For that, it'd be easy if we could just also create a named variable 'a' in FuzzIL, but pretty hard/impossible otherwise. IIUC, in your proposal, this would work because the Lifter would turn v2 into a in your example?
Yes that's right so the lifter would still emit a
but internally we would track it as another variable in the var map.
There should be a more generic version of
LoadFromScope
andStoreToScope
to be able to represent code such as the one shown in https://github.com/googleprojectzero/fuzzilli/issues/221 or in general any code where the name of a variable is important, e.g. due to the use ofwith
or because variable names are embedded into JavaScript code strings. These new operations,LoadNamedVariable
andStoreNamedVariable
should be valid in any context and should simply assign a name to a variable:Becomes
Or, alternatively
However, for that there need to be clear rules when a
StoreNamedVariable
lifts tolet a = v3
and when it lifts toa = v3
. Alternatively, there could also be an additionalCreateNamedVariable
operations which always lifts tolet a = v3
while theStoreNamedVariable
then always lifts toa = v3
. In both cases, it might be necessary to track which named variables are currently visible so as to not define them twice (which would result in invalid JavaScript).Afterwards, the existing
LoadFromScope
andStoreToScope
operations can be removed.