ziglang / zig-spec

MIT License
177 stars 32 forks source link

roadmap to spec #40

Open matu3ba opened 2 years ago

matu3ba commented 2 years ago

The Rust project has 3 related attempts to cover the spec

It could help to have something simple and showable for interested contributors, for which 1 sounds most approachable. I suspect that 2 will be, with the exception of custom linker semantics (if ever introduced) and under assumption that we have exactly 1 compilation step without generated code to be used, relative simple to derive once we have some sort of model for comptime+runtime semantics. At least, for now, under the assumption that we dont take into account hardware specifics not covered by the current C11 memory model for parallel execution semantics. This is due to (if you read the linked paper below) optimizations for weak memory have thread-local and global effects (a thread-local optimization can enable a global one and vice versa), so I am very unsure if and how those should be represented in the type system or how to insert safety checks.

So at least from my point of view to prevent the churn of increasing a lot of paper without a lot meaning, it would be nice to start with 1 and provide a list of open semantic questions contributors can toy with and come up with something better.

Any ideas how to proceed with this? Is this the correct place to discuss or should I open an issue on the Zig compiler repo?

The related discussion Q: "Do you have already ideas on simplifying cache synchronisation protocol semantics? The C11 model has many severe flaws, which prevent a lot optimizations. You probably know this already, but for other interested readers https://plv.mpi-sws.org/c11comp/" A: "so far I am not planning on touching that. I hope someone else will solve it and I just need to plug in the solutions... and there is some nice progress .. https://dl.acm.org/doi/pdf/10.1145/3385412.3386010" ("Promising 2.0: Global Optimizations in Relaxed Memory Concurrency" by Lee et al.) Source https://www.reddit.com/r/rust/comments/wiwjch/comment/ijfo2k6/?utm_source=share&utm_medium=web2x&context=3

Link to Coq proofs: https://github.com/snu-sf/promising2-coq

Update1: added brief reddit discussio Update2: added coq proofs

exxjob commented 11 months ago

I'd like to highlight that formal verification is non-negotiable for high integrity software. Even though scoping precisely what's to be verified is a premature question, it would be good to figure out how to go about it. I would suggest this be at least in-progress for 1.0 given the inherent benefits and necessity. It will theoretically aid ongoing design research, and nurish specification development, error proofing, future proofing, and trust. If brought up too little too late, this can devolve into debt and effort fragmentation. Direct inspiration can be taken from CompCert, CakeML, potentially Vellvm and others. A handful of verifications could be repurposed. There is also a hypothetical for Zig to integrate proof assistant capabilities eventually, either like ATS or SPARK, but for now that remains a stretch. https://github.com/ziglang/zig/issues/1776, https://github.com/ziglang/zig/issues/5025

matu3ba commented 11 months ago

Status 20240101

Simplified compilation code paths

                     future optimizations
                      ▲─►    ▲─►
                      │ │    │ │
                      │ │    │ │
     1      2      3  │ ▼ 4  │ ▼               5
.zig ─► AST ─► ZIR ─► AIR ─► MIR (per arch)    ─► Machine Code
   ▲    │      │ ▲        6                    7
   │    │      │ │        ─► LLVM IR (builder) ─► Machine Code
   │    │      │ │        8
   │    │      │ │        ─► LLVM IR bc        ─► Machine Code
   │    │      ▼ │           (future)
   ◄────▼    cached files
     9          10

- 0 main => module => src\Compilation.zig update()
- 1 tokenizer.zig (~1.3k LOC) + Parse.zig (~4.1k LOC); result Ast.zig
- 2 AstGen.zig (~13k LOC); result Zir.zig
- 3 Sema.zig (~38.1k LOC); result Air.zig
- 4 x86_64 CodeGen.zig (~16.6k LOC) result Mir.zig
- 5 x86_64 Lower.zig + Emit.zig + bits etc (~4k LOC)
- 6 left out
- 7 left out
- 8 left out
- 9 zig fmt
- 10 per decl caching system

main controls logic via build.zig (Module.zig, Build.zig etc)
More detailed sketch

┌───────────────────┬───────────────────────────────────────────┐
│                   │  future optimizations                     │
│                   │   ▲─►    ▲─►                              │
│                   │   │ │    │ │                              │
│                   │   │ │    │ │              ┌───────────────┐
│     1      2      │3  │ ▼ 4  │ ▼              │5              │
│.zig ─► AST ─► ZIR │─► AIR ─► MIR (per arch)   │─► Machine Code│
│ │ ▲    │      │ ▲ │       6                   │7  (file)      │
│ │ │    │      │ │ │       ─► LLVM IR (builder)│─► Machine Code│
│ │ │    │      │ │ │       8                   │   (file)      │
│ │ │    │      │ │ │       ─► LLVM IR bc       │─► Machine Code│
│ │ │    │      ▼ │ │          (future)         │   (file)      │
│ │ ◄────▼    cached│ files                     └───────────────┘
│ │   9        │ 10 │                             │             │
└─┼────────────┼────┴─────────────────────────────┼─────────────┘
  │   ▼────────┘                                  │
  └►module◄───────────────────────────────────────┘
    job system with - 1+2
                    - 3 (per decl)
                    - 4 (per fn)
                    - linking (per section?, parallelization+semantics not clear yet)
                    - other build tasks (translating zig code to get header info etc)
    with traversal of build graph (DAG) depending on build.zig

Incremental compilation would also include the linker to store and patch (symbol, section offsets, binary positions) for decl changes for each file including rerunning all follow-up dependencies of the build DAG.

matu3ba commented 11 months ago

Even though scoping precisely what's to be verified is a premature question

  1. We still have no semantic specification of Zig file semantics with one of the biggest reasons being unclear result location semantics.
  2. Parser would be feasible, AstGen very challenging with 13k LOC, Sema huge work (likely more than 50k once Zig is finished even on ignoring parallel evluation on semantics and incremental semantics [sounds tricky to ignore during during verification]).

Direct inspiration can be taken from CompCert, CakeML, potentially Vellvm and others.

  1. Can you provide some better guidance on which methods are applicable and which not? As example, CakeML is developed within a proof framework, whereas Zig is not. As I understand it, this also applies to Pancake, which sounds to me like the closest corresponding thing we could aim for in/with Zig https://cakeml.org/pancake.

  2. The biggest obstacle to verification is that high level languages and optimizing backends like LLVM would prefer more optimizations, which require high level semantics. However, those are not (yet) tractable enough and lead to ambiguity between platform semantics and portability concerns. This applies to 1. pointer semantics and 2. synchronization semantics. Can you follow me here or do you need more specific examples from https://github.com/ziglang/zig/issues/6396?

  3. Can you be more specific what you expect to be shown? My expectation would be to have a formal model that shows Zig's semantics and show that it is sufficiently expressive (complete only works, if one could "patch the target semantics in" but formal models are rarely commutative), consistent and correct regarding hardware models. I do not expect a full verification of the ZIg codebase due to ZIg's scope, only the most relevant IRs and the overall code design.

  4. Something like https://github.com/vellvm/vellvm would be useful, but would mandate a stable enough AIR.

exxjob commented 11 months ago

I did say it's a premature matter. What I'm stressing is that long-term this should be emphasized to an extent, especially for major releases of Zig, rather than embrace fallacies like Go, where "implementation as a specification" is meant literally. Past Zig itself, bootstrap and utilities should also be considered.

Indeed not everything is verifiable nor does it have to be. However, lack of formal proof motivates using or creating alternatives, as alluded to before. Some of the examples are probably better used as exhibits in this sense. This is entirely open ended but I can imagine a combination of tools as in F* appropriate in the future, not excluding in context of crypto.

TateStaples commented 2 weeks ago

Hey student here interested in exploring formal verification. I've been thinking about how possible it would be to implement formal verification as a comptime assert syntax. This seems like it would integrate well with the minimal changes to current zig syntax and style while allowing for powerful verification.

When the assert only relies on other comptime verifables it can be done in the expected way, but when reliant on runtime variables it could be discharges to an SMT solver like languages such as Dafny, F*, C3, and others. This would probably drastically reduce compilation speed so verifying could be separated from the build process.

Interested in starting a project on this and curious for feedback on syntax and premise. Do people want something like this?

Example Dafny to Zig:

method InsertionSort(a: array<int>) 
  modifies a
  ensures sorted(a, 0, a.Length-1)
  ensures multiset(a[..]) == old(multiset(a[..]))
{
  for var i := 1; i < a.Length; i++
    invariant 0 < i <= a.Length
    invariant sorted(a, 0, i-1)
    invariant multiset(a[..]) == old(multiset(a[..]))
  {
    var key := a[i];
    var j := i - 1;

    while j >= 0 && a[j] > key
      invariant -1 <= j < i
      invariant sorted(a, 0, j)
      invariant multiset(a[..]) == old(multiset(a[..]))
    {
      a[j+1] := a[j];
      j := j - 1;
    }

    a[j+1] := key;
  }
}
pub fn insertionSort(arr: []i32) void {
    for (1..arr.len) |i| {
        comptime {
            assert(sorted(arr, 0, i-1));
            assert(multiset(arr[..]) == old(multiset(arr[..]))); // Multiset might need its own syntax
            assert(0 < i <= arr.len);
        }

        const key = arr[i];
        var j: usize = i - 1;

        // Inner loop to shift elements
        while (j < arr.len and arr[j] > key) : (j -= 1) {
            comptime {
                assert(sorted(arr, 0, j));
                assert(multiset(arr[..]) == old(multiset(arr[..])));
                assert(-1 <= j < i);
            }

            arr[j + 1] = arr[j];

            // Prevent underflow
            if (j == 0) break;
        }

        // Place key in correct position
        arr[j + 1] = key;
    }

   comptime assert(sorted(a, 0, arr.len-1));
   comptime assert(multiset(arr[..]) == old(multiset(arr[..])));
}

comptime sorted(arr: []i32, i1: usize, i2: usize) bool { ... }