hacspec / hax

A Rust verification tool
https://hacspec.org/blog
Apache License 2.0
179 stars 20 forks source link

Add a iterator rewrite phase/util #394

Open W95Psp opened 10 months ago

W95Psp commented 10 months ago

Currently, we translate Rust iterators as such in all backends.

Rust iterators are lazy and represented by concrete data types: each combinator (e.g. map, zip, enumerate...) builds a concrete structure describing the semantics of the iterator. Given i an iterator, i.enumerate() is basically a tuple containing i and a counter.

While this is great for expressivity and performance, that's not often ideal in proof settings.

Characterizing a (non-lazy) iterator DSL for which we offer specific support in the backend would be great. This DSL would help us getting rid of the concrete data type Rust approach, in favor of overloaded combinators (eg enumerate, map) that work directly on collections, without indirections.

W95Psp commented 2 months ago

We want to interpret Rust iterators as functional maps. Rust iterators are not easy to translate to our backends: e.g. in F*, we cannot represent iterators in their full generality (cough coinduction cough), lazy iterators are not productive in proof settings.

What do we want to type in Rust

Currently, our main target is for loops. For instance:

  1. for item in source {..}, where source is already concrete (e.g. a vector, a slice, an array, a slice.chunks(N));
  2. for index in range {..}, where range is a range (e.g. 0..10);
  3. for (i, item) in it.enumerate() {..}, where it is an iterator;
  4. for item in it.step_by(n) {..}, where it is an iterator.

Behind the scenes, Rust desugars that to a regular loop and wraps the collection we're iterating over in a into_iter call. for loops have thus nothing special.

Between a "source" (something which is not an iterator but has an IntoIter instance or methods to produce iterators) and a "sink" (e.g. into_iter, collect or basically any function consuming an iterator without producing a new iterator), there can be various combinators (e.g. enumerate, step_by, zip...).

We need to recognize those sources, combinators and sinks.

What do we want in F* (or in other similar backends)

The idea would be to translate iterators to functional maps: functions from indexes (e.g. nat) to values.

Functions are a very thin layer to whatever the source is. For instance, if the source is an F* list, creating a functional map is trivial, with FStar.List.Tot.get. Similarly, a step_by combinator or a skip combinator is trivial to implement, and very nice to work with.

Loops can be translated to a fold on maps.

Example of what those map could look like in F*:

module ItTest

open FStar.Mul

let monotonic #t (f: nat -> option t) =
  forall i j. (j > i /\ None? (f i)) ==> None? (f j)

type map (t: Type) = f: (nat -> option t) {monotonic f}

let terminates (m: map 't) = exists n. None? (m n)

let termination_index (m: map 't {terminates m}): GTot (i:nat {
    None? (m i) 
  /\ (forall j. j < i ==> Some? (m i))
}) = admit ()

let enumerate (f: map 'a)
  : map (tuple: (nat & 'a) {Some tuple._2 == f tuple._1})
  = fun i -> match f i with
        | Some x -> Some (i, x)
        | None   -> None

let step_by (f: map 'a) (step: nat): map 'a
  = fun i -> f (step * i)

let take (f: map 'a) (n: nat): m: map 'a {terminates m}
  = fun i -> if i < n then f i else None

let skip (f: map 'a) (n: nat): map 'a
  = fun i -> f (i + n)

let rec find_index'
  (m: map 'a {terminates m})
  (f: 'a -> bool) 
  (i: nat {i <= termination_index m})
  : Tot (option nat)
  = match m i with
  | Some v -> if f v
             then Some i 
             else find_index' m f (i + 1)
  | None   -> None

let find_index (m: map 'a {terminates m}) (f: 'a -> bool) = 
  find_index' m f 0

let rec find_nth (m: map 'a {terminates m}) (f: 'a -> bool) (n: nat): option nat = 
  if n = 0
  then find_index m f
  else 
    match find_nth m f (n - 1) with
    | Some i -> find_index (skip m (i + 1)) f
    | None   -> None

let filter (m: map 'a {terminates m}) (f: 'a -> bool): map 'a = 
  fun i -> match find_nth m f i with
      | Some i -> m i
      | None   -> None

let to_list (m: map 'a {terminates m}) = 
  let rec aux (i: nat {i <= termination_index m}) =
    match m i with
    | Some x -> x::aux (i+1)
    | None   -> []
  in aux 0

let of_list (l: list 'a): map 'a = 
  fun i -> 
    if i < FStar.List.Tot.length l 
    then Some (FStar.List.Tot.index l i)
    else None

let rec list_slice
  (l: list 'a) 
  (start: nat)
  (len: nat {len + start <= List.Tot.length l})
  : Tot (l': list 'a {List.Tot.length l' == len}) (decreases len)
  = if len = 0 
    then []
    else List.Tot.index l start::list_slice l (start + 1) (len - 1)

let chunks_exact (l: list 'a) (n: pos)
  : map (l: list 'a {List.Tot.length l == n}) 
  = fun (i: nat) -> if i * n + n <= List.Tot.length l
               then Some (list_slice l (i * n) n) 
               else None

let rec fold' #acc (m: map 'a {terminates m}) (f: acc -> 'a -> acc) (init: acc) (i: nat {i <= termination_index m})
  : acc
  = match m i with
  | Some v -> f (fold' m f init (i + 1)) v
  | None   -> init

let fold #acc (m: map 'a {terminates m}) (f: acc -> 'a -> acc) (init: acc): acc
  = fold' m f init 0

let it1: map int = take (fun i -> Some (i + 100)) 100
let it2: map int = filter it1 (fun n -> n % 3 = 0)

let property = assert_norm (forall i. Some? (it2 i) ==> Some?.v (it2 i) % 3 == 0)

let folded = fold (take (enumerate it2) 20) 
  (fun acc ((i, item): (nat & int)) -> 
    assert (Some?.v (it2 i) == item);
    acc + 1
  ) 0

Technical desgins

Design A

At first, I wanted to eliminate iterators altogether: never extract any Iterator instances, kill all impl Iterator argument to any function, remove iterator-related generics and bounds whenver possible.

The idea was to replace any type that is iterable by one unique concerete type: functional maps.

This is not possible because sometimes, we work with a type T that is not an opaque iterator type, but rather either a concrete type or a opaque type with more constraint on it. For instance, in fn f<T: Iterator + Foo>(x: T), on can apply any method of Foo on x.

Design B

Then, I started desining a system in which every function that accepts a completly opaque iterator type should be rewritted as a function that accepts a functional map instead.

Then, when a function f with an opaque implicit argument is called, i.e. f(..., it, ...) where f expect it to be of type impl Iterator, we rewrite it into f(..., it.to_seq(), ...), and continue this transformation on the expression it.

But then, complications arise:

Design C

The current design is a restricted approach to design B.

The filter isn't about opaque iterators any longer: it is just an allowlist of functions (or method) and which arguments should be mapped to sequences.

My hope is to implement design C and at some point swap the "filter" from this allowlist to opaque iterators.

One important point is that for loops implicitely add into_iter calls, which is a "opaque iterator", which would trigger the transformation into sequences.

github-actions[bot] commented 2 weeks ago

This issue has been marked as stale due to a lack of activity for 60 days. If you believe this issue is still relevant, please provide an update or comment to keep it open. Otherwise, it will be closed in 7 days.

W95Psp commented 2 weeks ago

Still somehow relevant, I made some basic support in: