Adding support for Template Literals

amarekano commented 3 years ago

Hi everyone, I'm going to take a stab at adding support for template literals to the fuzzer. I wanted to gather thoughts/feedback on how operations and lifting should be designed.

Typically template literals have the following structure:

let temp = `js-snippets ${expression/identifier} js-snippets ${expression/identifier} ...`

A series of js snippets interspersed with expressions/identifiers that are wrapped in ${}. If we think of the expressions/identifiers as parameters to a function (where the function here would be the template literal), then template literals can be represented quite easily by duplicating the BeginAnyFunctioDefinition and EndAnyFunctionDefinition. An example of what that might look like is listed below:

class BeginTemplateLiteral: Operation {
    // Signature stores the expressions/identifiers and their type information
    let signature: FunctionSignature

    // Currently no inputs, but can be extended to support tagged templates
    // Output of this operation is a Template literal (TBC if this should be a type in itself)
    // All identifiers/expressions are recorded as part of InnerOutputs and this will allows us to mutate them based on types recorded. 
    init(signature: FunctionSignature) {
        self.signature = signature
        super.init(numInputs: 0,
                   numOutputs: 1,
                   numInnerOutputs: signature.inputTypes.count,
                   attributes: [.isBlockBegin])
    }
}

class EndTemplateLiteral: Operation {
    init() {
        super.init(numInputs: 0, numOutputs: 0, attributes: [.isBlockEnd])
    }
}

If we agree to using these operation definitions then code generation becomes fairly straight forward: ProgramBuilder.swift:

public func generateTemplateLiteral(withSignature signature: FunctionSignature, _ body: ([Variable]) -> ()) -> Variable {
        let instr = perform(BeginTemplateLiteral(signature: signature))
        body(Array(instr.innerOutputs))
        perform(EndTemplateLiteral())
        return instr.output
}

CodeGenerators.swift:

CodeGenerator("TemplateLiteralGenerator") { b in
        b.defineTemplateLiteral(withSignature: FunctionSignature(withParameterCount: Int.random(in: 2...5))) { _ in
            b.generateRecursive()
        }
    }

Lifting Templates would be something as follows:

case let op as BeginTemplateLiteral:
    assert(instr.op === op)

    // generate expressions for the template identifiers
    expressions.append(new Literal(instr.innerOutputs.map({"${\($0.identifier)}"})))

    // This is essentially the same as as BeginCodeString
    let count = Int(pow(2, Double(codeStringNestingLevel)))-1
    let escapeSequence = String(repeating: "\\", count: count)
    w.emit("\(decl(instr.output)) = \(escapeSequence)`")
    w.increaseIndentionLevel()
    codeStringNestingLevel += 1
}

case is EndTemplateLiteral:
    codeStringNestingLevel -= 1
    w.decreaseIndentionLevel()
    let count = Int(pow(2, Double(codeStringNestingLevel)))-1
    let escapeSequence = String(repeating: "\\", count: count)
    w.emit("\(escapeSequence)`;")

Design Considerations:

Should template literals be a primitive type or can we just treat them as a string type?
Using the JS Expression Literal should work here but I'm not sure if we should create a new expression just for templates and if there are any benefits to doing that. One consideration may be that inlining expressions may trigger different engine behaviour to simply adding variable identifiers.
The indentation and backtick escaping code has been duplicated from BeingCodeString and EndCodeString and it might be worth considering extending these existing operations rather than creating new ones.
Since these operations (i.e. BeginTemplateLiteral and EndTemplateLiteral) are similar in design to BeginAnyFunctionDefinition, I think we would not need to any additional analysers, mutators, reducers, etc and reuse the existing code infra.
I haven't considered adding template tags but I think this can be accommodated as another input parameter to the operation and some tweaks to the lifter.

Let me know if there are any other considerations/design choices to consider and I'll cobble together a PR once we arrive at a consensus.

saelo commented 3 years ago

This is great, thanks for writing down your thoughts this way!

I think at this point my main question is whether we think that the TemplateLiterals need to be valid JavaScript code or not. Here is my understanding of what we currently have:

"CodeStrings" which are strings that are guaranteed to contain valid JavaScript code (in FuzzIL, they are basically the same as functions, but are then lifted to a multiline string literals). Their sole purpose is basically to enable fuzzing eval
String literals (LoadString) that will mostly just contain random string data (sometimes random characters, sometimes things like property or method names, etc.). Those are pretty much for every other use case of strings other than passing it to eval

I'm not sure that we need these new template literals (which do interpolation) to also contain valid JavaScript code, since we already have the CodeStrings for that, and it seems like ensuring that the result after interpolation is still valid JavaScript code is probably non-trivial? In that case, the implementation would likely be simpler, since it can be a single Operation CreateTemplateLiteral that e.g. contains a let parts: [String], has parts.count - 1 inputs, and then lifts to something like

`part0${input0}part1${input1}part2`

(where input0 is whatever expression the lifter has for input(0), etc.) What are your thoughts on this?

amarekano commented 3 years ago

Thanks for the feedback!

I think at this point my main question is whether we think that the TemplateLiterals need to be valid JavaScript code or not.

My main motivation to implementing TemplateLiterals in FuzzIL is to allow importing testcases from the various JS engine repos (for example about a quarter of the testcases for JSC are implemented using templates. Here's an example) and I think it would be easier to implement this natively in FuzzIL rather than having a parser lift templates to JS and then compile JS to FuzzIL. Having said that I think TemplateLiterals should be valid JS but we need not validate that in fuzzilli.

I'm not sure that we need these new template literals (which do interpolation) to also contain valid JavaScript code, since we already have the CodeStrings for that, and it seems like ensuring that the result after interpolation is still valid JavaScript code is probably non-trivial? In that case, the implementation would likely be simpler, since it can be a single Operation CreateTemplateLiteral that e.g. contains a let parts: [String], has parts.count - 1 inputs, and then lifts to...

I think your approach with using CreateTemplateLiteral as you've described previously would work for imported testcases. We ensure that the string parts of the template are immutable and that the identifiers/expressions of the template are mutable. The only change in the implementation that I would suggest is recording type information for the inputs so that we can perform relatively more intelligent mutations.

I don't believe there's much value in generating templates that aren't valid JS since at that point we may as well use LoadString to generate random strings.

WDYT?

saelo commented 3 years ago

Cool, thanks for linking those examples, that makes sense! Yeah I agree, for supporting those, it's probably easiest to go with a single CreateTemplateLiteral operation and have the FuzzIL compiler split the literals into string parts and expressions. Then, the example you linked would effectively consist of four CreateTemplateLiteral instructions and some BinaryOperations for the concatenations etc.

I'm not sure if adding type information will work for that use case though, since the mutations (in particular, the InputMutator) anyway completely ignore type information. The only case I can think of is splicing, when it rewires some of the inputs. But then you'd either need the compiler to be able to figure out type information of those inputs, which it isn't capable of, or somehow do some post processing when importing the samples into Fuzzilli to determine the current types. Not sure if it's worth the effort?

I don't believe there's much value in generating templates that aren't valid JS since at that point we may as well use LoadString to generate random strings.

I guess it depends on what you are trying to fuzz. But yeah, for the use case you listed, it should be fine to use them as code without trying to enforce that in FuzzIL somehow.

amarekano commented 3 years ago

But then you'd either need the compiler to be able to figure out type information of those inputs, which it isn't capable of, or somehow do some post processing when importing the samples into Fuzzilli to determine the current types. Not sure if it's worth the effort?

Yes, you're right. Both these options are non-trivial to implement and I feel for now it's probably not worth the effort.

I've started putting the PR together for this and I've run into an issue with inlining that I'm unable to work around. Here's a testcase that I've written:

let v0 = b.loadInt(1337)
let v1 = b.createTemplateLiteral(with: ["Hello", "World"], andIdentifiers: [v0])

let program = b.finalize()

let lifted_program = fuzzer.lifter.lift(program)
let expected_program = """
let v0 = 1337;
let v1 = `Hello${v0}World`

Due to the inlining policy, literals are inlined and that results in the lifted program being let v1 = `Hello${1337}World`. I'm not sure on how I can temporarily disable this policy based on context. What I would like to do is disable/override inlining when lifting in the context of certain instructions. Any thoughts on how I should go about implementing this?

saelo commented 3 years ago

So why do you want to disallow inlining in that case? I think it's fine for the lifter to behave the way it does for other instructions, i.e. inline literals but pretty much nothing else. Then you'll still get things like the example you linked working just fine while avoiding code such as

const v2 = 1337;
const v3 = `Hello${v2}World`;

which is a bit less intuitive to read for a person.

It's probably going to be hard to selectively disable inlining because the lifter has to decide whether to emit a variable declaration when it processes the LoadInt, not when it uses it in the CreateTemplateLiteral.

amarekano commented 3 years ago

For some reason I thought this would error out (i.e. inlining literals in the template) but testing shows that inlining works! So we should be good here.

amarekano commented 3 years ago

implemented in #208

googleprojectzero / fuzzilli

Adding support for Template Literals #202