Add LoadNamedVariable, StoreNamedVariable and possibly CreateNamedVariable operations

saelo commented 3 years ago

There should be a more generic version of LoadFromScope and StoreToScope to be able to represent code such as the one shown in https://github.com/googleprojectzero/fuzzilli/issues/221 or in general any code where the name of a variable is important, e.g. due to the use of with or because variable names are embedded into JavaScript code strings. These new operations, LoadNamedVariable and StoreNamedVariable should be valid in any context and should simply assign a name to a variable:

StoreNamedVariable 'a' v3
v4 <- LoadNamedVariable 'a'

Becomes

let a = v3;
const v4 = a;

Or, alternatively

a = v3;
const v4 = a;

However, for that there need to be clear rules when a StoreNamedVariable lifts to let a = v3 and when it lifts to a = v3. Alternatively, there could also be an additional CreateNamedVariable operations which always lifts to let a = v3 while the StoreNamedVariable then always lifts to a = v3. In both cases, it might be necessary to track which named variables are currently visible so as to not define them twice (which would result in invalid JavaScript).

Afterwards, the existing LoadFromScope and StoreToScope operations can be removed.

amarekano commented 3 years ago

I've been thinking of an alternative approach to solving this problem. Instead of Store/LoadNamed ops why don't we introduce a CreateGlobalIdentifier op. This op would essentailly create a global variable/identifier that can be referenced in any scope. Consider the following example:

function f1(){
    a = 10
}

funtion f2(){
    let b = a + 20
    return b
}

a = 1
f1()
f2()

Now when we compile f1(), we encounter a which hasn't been declared before and doesn't exist in our variable map. We now generate a global identifier for a and add it to a global variable map (which would be a key-value map storing something like : g0:'a'). We also generate another regular variable (e.g. v0) which references g0 and add this to our regular var map. We can then represent the compiled function as follows:

BeginPlainFunc v0
    v1 <- LoadInteger 10
    v2 <- CreateGlobalIdentifier 'a'
    Reassign v2, v1
EndPlainFunc

When we compile the second function f2, we encounter a but instead of creating a global identifier we look up the global var map and find g0 which intrun references v2. So we substitute a here for v2. The rest of the compiled program would be as follows:

BeginPlainFunc v3
    v4 <- LoadInt 20
    v5 <- BinaryOp v2, v4, '+'
    Return v5
EndPlainFunc

Reassign v2, 1
CallFunc v0
CallFunc v3

I think this would make for a cleaner apprach to handling variable/function-use-before-declaration scenarios. Another benefit is that we now have these identifiers in our var map and the AI can now track type changes more reliably without any additional effort. This approach would also play nice with static code checks since FuzzIL vars are always declared before use (there will be some effort to suspend scope checks when we evaluate global identifiers).

WDYT?

amarekano commented 3 years ago

Hey @saelo, switching track from our splicing discussion, if you get some time could you review the proposal above and let me know what you think?

saelo commented 3 years ago

Cool idea! So I'm a bit worried about two things:

"there will be some effort to suspend scope checks when we evaluate global identifiers". Currently, the logic for variable visibility is straight-forward: once the scope closes, the variable is no longer visible. This makes the ScopeAnalyzer, and so to some degree the code generation, very efficient. We'd loose that and add some complexity, which I'm not super happy about :/
The fact that this approach can only add global variables, as far as I understand, and so the following program wouldn't really work, I think?
```
bar = 42;
function foo() {
bar();
function bar() {
    print("Hello World!");
}
}
foo();
print(bar);
```
Because if you turn the function bar into a global variable, it collides with the existing but separate bar variable in the outer scope.

Another thing, isn't this pretty similar to hoisting a LoadUndefined (or, maybe a new MakeNamedVariable 'a') into the outer scope (which is, I think, what's currently necessary to handle these cases)?

I think it'd also be good to clearly formulate what problem we are trying to solve here. At least I don't really have a good understanding of that currently. I think one problem has to do with hoisting of e.g. functions by the JavaScript engine, which might then require lots of other things to be hoisted that are referenced by the function's body. The other problem seems to be that named variables can be referenced in a way that a JS->FuzzIL compiler wouldn't easily be able to understand. E.g.

var a = 42;
eval('a = 54; ' + 'a' + ' = 44;')
print(a);   // Should be 44 now

For that, it'd be easy if we could just also create a named variable 'a' in FuzzIL, but pretty hard/impossible otherwise. IIUC, in your proposal, this would work because the Lifter would turn v2 into a in your example? Are there any other issues that occur during compilation that we are trying to solve here?

amarekano commented 2 years ago

Because if you turn the function bar into a global variable, it collides with the existing but separate bar variable in the outer scope.

That's right we would have a collision here with my approach. One way to deal with this would be to scan the current block for matching identifiers and if that does not return any results then we look for matches in the outer block. I'll give this some more thought and write up some more ideas on how to tackle this.

Another thing, isn't this pretty similar to hoisting a LoadUndefined (or, maybe a new MakeNamedVariable 'a') into the outer scope (which is, I think, what's currently necessary to handle these cases)?

How do you mean? I think we would have to track which scope creates the variable before we emit MakeNamedVar. For example:

// scope 0
function f1(){
    // scope -1
    function f2(){
        // scope -2 
        return a; 
    }
    f2()
}

a = 10;
f1()

So when we encounter a at scope -2, we should be emiting MakeNamedVar in scope 0 and and not in scope -1.

I think it'd also be good to clearly formulate what problem we are trying to solve here. At least I don't really have a good understanding of that currently.

Assume we have the following program that we want to compile to FuzzIL:

function f1(){
    let b = a + 1
    return a + b;
}

a = 10;
f1()

Using the LoadNamed/StoreNamed variables approach suggested orignally we end up with the following IL:

BeginFunc f1
    v0 <- LoadInt 1
    v1 <- LoadNamedVar 'a'
    b <- BinaryOp v1, v2, +
    v3 <- LoadNamedVar 'a'
    v4 <- BinaryOp v3, b, +
    Return v4
EndFunc

v2 <- LoadInt 10
StoreNamedVar 'a', v2
CallFunction f1

One of the challenges with this approach is that we won't be able to track type information across the execution cycle or reuse a in mutations/splicing etc since it's not mapped to a single variable in the var map. This prevents us from taking advantage of the existing fuzzing infrastructure (i.e. AI, mutations, etc). The LoadNamed/StoreNamed ops solve the lifting issue but there's not much more that we can do with a.

The problem that I was hoping to solve was to find a way where a is treated like a regular FuzzIL variable and that would allow us to apply the previously mentioned fuzzing infra. Happy to elaborate on this further if it's still not clear.

For that, it'd be easy if we could just also create a named variable 'a' in FuzzIL, but pretty hard/impossible otherwise. IIUC, in your proposal, this would work because the Lifter would turn v2 into a in your example?

Yes that's right so the lifter would still emit a but internally we would track it as another variable in the var map.

saelo commented 1 year ago

Now implemented with https://github.com/googleprojectzero/fuzzilli/commit/f7dfd455969c24a31af182fafde889c0ff68ebac

googleprojectzero / fuzzilli

Add LoadNamedVariable, StoreNamedVariable and possibly CreateNamedVariable operations #224