rems-project / isla

Symbolic execution tool for Sail ISA specifications
Other
63 stars 10 forks source link

Crashes while generating the SSA-form of certain functions in the ARMv9.4a snapshot #84

Open AverardoDiMugello opened 3 days ago

AverardoDiMugello commented 3 days ago

I'm writing a compiler that translates Jib to LLVM, targeting the ARMv9.4a snapshot as my initial use case and consuming isla-lib as a dependency therein. The CFG::ssa function panics for several Jib functions from this snapshot in two separate places.

  1. Here, in place_phi_functions, while iterating over each declared var's defsites. The bug is related to the RETURN builtin being included in the all_vars arguments of place_phi_functions, even though some functions don't ever write to it. This means that RETURN doesn't appear in the result of the CFG::nodewrite call and therefore doesn't get added to the defsites collection. We then panic when trying to unwrap the look-up for RETURN in defsites. The affected functions include zThrowSError, zReservedEncoding, and zEndOfInstruction. Each of these write to the exception-related builtins and use the Arbitrary terminator. None of them write to RETURN. Several system register access functions are affected too, specifically some, but not all, of the ICH, ICV_, ICC*, and AMEVCNTVOFF0* families of functions. My current bug-fix is to replace the offending unwrap with an if-let that does nothing if "a" doesn't have a defsite. I haven't tested the effects of this other than that the crashes have gone away. I'll post and link a pull request soon. I also notice that the exception-related builtins aren't included in all_vars like RETURN is. I'm not sure what the effects of this are.

  2. Here, while finding the Dominance frontiers. This one I don't have much information on yet. The affected functions are zAArch64_MaxTxSZ, zAArch64_BlockDescSupported, zgic_read_ram, zImpDefBool, zImpDefBits, and zImpDefInt. I may not get to look at this one in-depth for a little bit, because I don't intend to do code-generation for most of those functions in my initial use-cases for the tool I'm building.

Here is some minimal code to reproduce the issue:

use std::fs::File;
use std::io::Read;

use isla_lib::bitvector::b64::B64;
use isla_lib::ir::ssa::CFG;
use isla_lib::ir::{label_instrs, prune_labels, Def, Instr, Name, Symtab};
use isla_lib::ir_lexer::new_ir_lexer;
use isla_lib::ir_parser::IrParser;

fn get_instrs_for(
    f: &str,
    defs: &[Def<Name, B64>],
    symtab: &Symtab,
) -> Option<Vec<Instr<Name, B64>>> {
    defs.iter()
        .find_map(|def| {
            if let Def::Fn(id, _, instrs) = def {
                if symtab.to_str_demangled(*id) == f {
                    return Some(instrs);
                }
            }
            None
        })
        .and_then(|v| Some(v.to_vec()))
}

fn to_ssa(instrs: Vec<Instr<Name, B64>>) -> CFG<B64> {
    let labeled = prune_labels(label_instrs(instrs));
    let mut cfg = CFG::new(&labeled);
    cfg.ssa();
    cfg.label_every_block();
    cfg
}

fn main() {
    let contents = {
        let mut contents = String::new();
        let mut input_ir = File::open("/path/to/isla-snapshots/armv9p4.ir").unwrap();
        if let Ok(num_bytes) = input_ir.read_to_string(&mut contents) {
            println!("Read {num_bytes} bytes");
        } else {
            panic!("Failed to read...");
        }
        contents
    };

    let (symtab, parsed_ir) = {
        let mut symtab = Symtab::new();
        let parsed_ir: Vec<Def<Name, B64>> = IrParser::new()
            .parse(&mut symtab, new_ir_lexer(&contents))
            .unwrap();
        (symtab, parsed_ir)
    };

    // Normal one that just works
    println!("CFG for zAlign_bits");
    let is_exception_taken_fn = get_instrs_for("zAlign_bits", &parsed_ir, &symtab).unwrap();
    to_ssa(is_exception_taken_fn)
        .dot(&mut std::io::stdout(), &symtab)
        .unwrap();

    // Crashes at ssa.rs:949 without patches
    println!("CFG for zThrowSError");
    let throw_serror_fn = get_instrs_for("zThrowSError", &parsed_ir, &symtab).unwrap();
    to_ssa(throw_serror_fn)
        .dot(&mut std::io::stdout(), &symtab)
        .unwrap();

    // Crashes at ssa.rs:785 without patches
    println!("CFG for zAArch64_MaxTxSZ");
    let gic_read_ram_fn = get_instrs_for("zAArch64_MaxTxSZ", &parsed_ir, &symtab).unwrap();
    to_ssa(gic_read_ram_fn)
        .dot(&mut std::io::stdout(), &symtab)
        .unwrap();
}
Alasdair commented 3 days ago

None of them write to RETURN

Probably always including return is just incorrect. I checked and the OCaml Jib->SSA implementation (which is considerably more tested and used) doesn't do this.

Here, while finding the Dominance frontiers. This one I don't have much information on yet. The affected functions are zAArch64_MaxTxSZ, zAArch64_BlockDescSupported, zgic_read_ram, zImpDefBool, zImpDefBits, and zImpDefInt. I may not get to look at this one in-depth for a little bit, because I don't intend to do code-generation for most of those functions in my initial use-cases for the tool I'm building.

I don't have any immediate ideas here. I do remember spending a week ages ago hunting for a subtle bug in the OCaml SSA implementation that turned out to be a mistake in 'Modern Compiler implementation in ML' book, which had an existing errata. I'm not sure I fixed this bug here, I should check that because this sounds similar.

The way Isla uses this SSA transform is as an optimization - and it always checks using symbolic execution that the re-written result is equivalent to the original code, and if it isn't the SSA form is just thrown away. That's essentially meant that bugs in this code haven't really mattered unless they affected a function I really wanted to apply this optimization too.

AverardoDiMugello commented 3 days ago

Thanks, this is helpful context. I can take a look at the OCaml Jib->SSA implementation and compare it with Isla.