mit-plv / bedrock2

A work-in-progress language and compiler for verified low-level programming
http://adam.chlipala.net/papers/LightbulbPLDI21/
MIT License

design: core functionality of RecordPredicates? #370

Open andres-erbsen opened 1 year ago

andres-erbsen commented 1 year ago

The RecordPredicates system seems pretty neat and I am interested in doing something similar in the not-too-distant future. Do you still think this approach, or some aspects of it, are a good enough idea to build upon?

I also noticed that the definition seems to be based on sepapp, which I find unconvincing, and that the same file also contains LiveVerif notations. The alternative I'd like to contemplate is (to_bytes field1 ++ to_bytes field2 ++ to_bytes field3)$@recordptr. But perhaps I'd become more sold if we talked about the considerations here :).

samuelgruetter commented 1 year ago

Unlike many other approaches, which I stopped using, I still believe that RecordPredicates.v is good. I'm surprised that I actually mixed LiveVerif notations into that file; they should be in a separate file.

I don't particularly like sepapp (and even less sized_predicate and sepapps), but I started to accept it after many other experiments didn't work well.

A central question is: what's the relationship between high-level Coq values (such as records, Z, lists) and their representation as bytes in memory? In particular, can one Coq value correspond to several byte lists? In general, yes, in any non-canonical representation, or encodings with underspecified parts, and I didn't want to pretend that this is not needed, so a to_bytes function doesn't work. This is a pity, because then I can't use ++ to concatenate fields. A to_bytes function is basically an encoder, so next I tried decoders from bytes to values, but that didn't work so well either, mainly (I think) because it was hard to decide how deeply embedded the record structure should be, and in order to write decoders, it had to be more deeply embedded than I liked. I now believe that the relationship between Coq values and bytes should be an arbitrary Prop.

But then, how to concatenate two adjacent ones? sepapp is what I came up with... It's also influenced by the desire to have definitions that proof automation can recognize when it has to isolate a record field, or sub-record, to feed to a function call, and to assemble the record back together after the function call.

It's a balance between shallow and deep embeddings, and between displaying the interpretation of a deep embedding vs displaying the shallow form and relying on conversion whenever a lemma needs some deep embedding. The first example of this in the bedrock2 separation logic library is seps [P1; ... Pn], which I consider the "interp deep" form, vs sep P1 (sep ... Pn ...), which I consider more shallow. The same choice then appears between sepapps and a series of sepapp, and converting between them each time I want to split off a field seemed not so appealing; I also didn't want to unfold and refold (almost) everything for every nested record access. I can absolutely imagine that at first and also second sight, sepapp and sepapps trigger a "why is all this cruft needed" reaction, but I've come to terms with it by now... :wink:
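For concreteness, here is a minimal self-contained sketch of the shape I mean, using a toy Z-addressed memory and an abstract sep; the actual definitions in this repo differ in their details, so treat every name below as illustrative only.

Require Import Coq.ZArith.ZArith Coq.Lists.List. Import ListNotations.
Local Open Scope Z_scope.

Definition mem := Z -> option Z.          (* toy memory: address -> optional byte *)
Definition mpred := mem -> Prop.          (* memory predicate *)
Parameter sep : mpred -> mpred -> mpred.  (* separating conjunction, kept abstract *)
Parameter emp : mpred.                    (* empty-footprint predicate *)

(* an address-indexed predicate together with the number of bytes it covers *)
Record sized_predicate := mk_sized_predicate {
  sp_pred : Z -> mpred;
  sp_size : Z
}.

(* sepapp: place P1 at address a and P2 immediately after it *)
Definition sepapp (P1 : sized_predicate) (P2 : Z -> mpred) : Z -> mpred :=
  fun a => sep (sp_pred P1 a) (P2 (a + sp_size P1)).

(* sepapps: the list ("interp deep") variant, folding adjacency over all fields *)
Fixpoint sepapps (ps : list sized_predicate) : Z -> mpred :=
  match ps with
  | [] => fun _ => emp
  | p :: rest => fun a => sep (sp_pred p a) (sepapps rest (a + sp_size p))
  end.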

andres-erbsen commented 1 year ago

Seems like we are almost on the same page here; let's talk through the last few details.

can one Coq value correspond to several byte lists? In general, yes, in any non-canonical representation, or encodings with underspecified parts, and I didn't want to pretend that this is not needed, so a to_bytes function doesn't work

The same question shows up for field elements in fiat cryptography, and there we settled on the following solution. We first define an "overprecise" Gallina type to represent all low-level configurations of the object, and right after that we define a relation between that type and Z (actually Zmod). Code that manipulates the low-level representation directly can use the first Gallina type (which has an encoding function), but callers of the library use a separation-logic predicate that takes a Z and uses an existential quantifier (treating it opaquely like an arbitrary Prop). The precise (first) predicate is also used when calling memcpy-like functions that manipulate bytes directly. Pointers to field elements use the imprecise predicate; we currently do not have structs of field elements, but we want to, and I think they should use the first predicate and have their own representation relation that refers to the field-element representation relation.

I now believe that the relationship between Coq values and bytes should be an arbitrary Prop

Ha, I actually agree with this sentence! But what sepapps does is slightly different: the representation-relation Prop is defined using separation-logic combinators rather than byte-list combinators. Do you think this is important?

I think the main (only?) difference in my suggestion is that we'd define the representation relation first, and then use separation logic to say "these bytes are in memory". In the case of a record with pointers in it, the representation relation would treat the pointers as integers, whereas the "handle" to this object would likely also assert the separation-logic predicates of the objects it owns.
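To make that concrete, here is a toy sketch (hypothetical names; bytes_at stands in for $@ and le_split is an abstract little-endian encoder): the representation relation records only the pointer value, and it is the handle that additionally asserts ownership of the pointed-to object.

Require Import Coq.ZArith.ZArith.

Definition byte := Z.
Definition mem := Z -> option byte.
Parameter sep : (mem -> Prop) -> (mem -> Prop) -> mem -> Prop.  (* separating conjunction *)
Parameter bytes_at : list byte -> Z -> mem -> Prop.             (* stands in for $@ *)
Parameter le_split : nat -> Z -> list byte.                     (* little-endian encoding *)

(* a record that owns a buffer; the representation treats the pointer as an integer *)
Record node := { buf_ptr : Z; buf_len : Z }.
Definition node_to_bytes (n : node) : list byte :=
  le_split 4%nat (buf_ptr n) ++ le_split 4%nat (buf_len n).

(* the "handle": the record's bytes at a, and, separately owned, the buffer it points to *)
Definition NodeAt (n : node) (contents : list byte) (a : Z) : mem -> Prop :=
  sep (bytes_at (node_to_bytes n) a) (bytes_at contents (buf_ptr n)).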

But then, how to concatenate two adjacent ones?

I am proposing to concatenate the representations, not the values themselves. It would be that concatenation that the proof automation would recognize, perhaps indeed keeping the parts being concatenated themselves in a list (so concat instead of ++). Being able to skipn fields of a record seems kinda neat after reading your rationale here.
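Roughly, the shape I have in mind (again a toy sketch with hypothetical names): the memory assertion sees only the concatenation, while isolating a field is plain list surgery with firstn/skipn/nth on the representation.

Require Import Coq.ZArith.ZArith Coq.Lists.List. Import ListNotations.

Definition byte := Z.
Definition mem := Z -> option byte.
Parameter bytes_at : list byte -> Z -> mem -> Prop.   (* stands in for $@ *)

(* the record representation: one byte list per field, concatenated only in the memory assertion *)
Definition record_at (fields : list (list byte)) (p : Z) : mem -> Prop :=
  bytes_at (concat fields) p.

(* isolating a field is list surgery on the representation, not separation-logic rewriting *)
Definition field_offset (fields : list (list byte)) (i : nat) : nat :=
  length (concat (firstn i fields)).
Definition field_bytes (fields : list (list byte)) (i : nat) : list byte :=
  nth i fields [].
Definition fields_from (fields : list (list byte)) (i : nat) : list (list byte) :=
  skipn i fields.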

samuelgruetter commented 1 year ago

Pointers to field elements use the imprecise predicate; we currently do not have structs of field elements, but we want to, and I think they should use the first predicate and have their own representation relation that refers to the field-element representation relation.

This sounds interesting, could you please elaborate? In particular: How would you want to define records with members using an imprecise predicate? How do you bring them into a sufficiently structured form so that sepcalls automation can pass a record member to a callee that expects an imprecise predicate, and assemble back the updated record after the call? And more concretely, where is the existential going to show up, relative to the concatenation?

(I hope we're not confusing record fields with elements of finite fields, but so far it seems fine...)

andres-erbsen commented 1 year ago

I agree these are the important questions. First, some definitions, slightly simplified, but hopefully sufficient to answer the two simple questions.

In particular: How would you want to define records with members using an imprecise predicate?

And more concretely, where is the existential going to show up, relative to the concatenation?

Definition CoordinateRepr := tuple word32 10.
Definition c_to_bytes (x : CoordinateRepr) := flat_map (le_split 4) x. (* simplified: pretend the tuple is a list of Z limbs *)
Definition to_Z (x : CoordinateRepr) : Z. Admitted.
Definition FElemAt (x : Z) a : mem -> Prop :=
  ex1 (fun x' => sep ((c_to_bytes x')$@a) (emp (to_Z x' = x))).

Record PointRepr := { X : CoordinateRepr; Y : CoordinateRepr; Z : CoordinateRepr; T : CoordinateRepr }.
Definition p_to_bytes (p : PointRepr) := flat_map c_to_bytes [X p; Y p; Z p; T p].
Definition to_point (x : PointRepr) : Edwards.point. Admitted. (* calls to_Z and then Edwards.XYZT.to_affine *)
Definition PointAt (x : Edwards.point) a : mem -> Prop :=
  ex1 (fun x' => sep ((p_to_bytes x')$@a) (emp (to_point x' = x))).

to a callee that expects an imprecise predicate

I am assuming sepcalls would run eexists for PointAt and then proceed to construct the precise predicate. This would need to be sequenced after the destructs needed to fill in the existential, but the proof itself is as simple as it gets: the $@ would be resolved first, and then to_Z should go by assumption. Fiat Crypto also includes operations that are no-ops on the imprecise representation, so callers of them wouldn't use the existential wrapper at all.
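The shape of that step, in a self-contained toy setting (abstract stand-ins for mem, sep, emp and the byte-level predicate; primed name to avoid clashing with the definitions above):

Require Import Coq.ZArith.ZArith.

Parameter mem point repr : Type.                       (* toy stand-ins *)
Parameter sep : (mem -> Prop) -> (mem -> Prop) -> mem -> Prop.
Parameter emp : Prop -> mem -> Prop.
Parameter repr_bytes_at : repr -> Z -> mem -> Prop.    (* stands in for (p_to_bytes x')$@a *)
Parameter to_point : repr -> point.

Definition PointAt' (x : point) (a : Z) (m : mem) : Prop :=
  exists x', sep (repr_bytes_at x' a) (emp (to_point x' = x)) m.

(* re-establishing the imprecise predicate: eexists, resolve the $@ part,
   then the representation-relation equation goes by assumption *)
Lemma reestablish_PointAt x x' a m :
  sep (repr_bytes_at x' a) (emp (to_point x' = x)) m ->
  PointAt' x a m.
Proof. intro H. unfold PointAt'. eexists. exact H. Qed.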

assemble back the updated record after the call

Sepcalls would proceed in two passes, roughly corresponding to memory safety (separation logic) and functional correctness (representation-relation update). It is by design that the second one may fail after the first one succeeds, but we usually want to assert that both succeed. After updating one coordinate of the point and running the first sepcalls pass, the separation-logic state would contain ( _ ++ _ ++ ... )$@_ and c_to_bytes, but no p_to_bytes and no * inside the record. Then the second pass would operate on the list alone, without touching separation logic, to restore p_to_bytes (optionally, but easily). I'd be happy to say more, but you probably know better than me which parts are worth going into detail about.
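For example, the list-level fact the second pass would rely on could look like the following (a toy sketch; primed names so it doesn't clash with the definitions above):

Require Import Coq.Lists.List. Import ListNotations.

Definition byte := nat.  (* toy byte type, just to show the shape of the statement *)
Record PointRepr' := { Xr : list byte; Yr : list byte; Zr : list byte; Tr : list byte }.
Definition p_to_bytes' (p : PointRepr') : list byte := concat [Xr p; Yr p; Zr p; Tr p].

(* after the callee updated Y in place, the first pass leaves the memory described by
   Xr p ++ y' ++ Zr p ++ Tr p; the second pass refolds this into p_to_bytes' of the
   updated record by list reasoning alone, without touching separation logic *)
Lemma refold_Y (p : PointRepr') (y' : list byte) :
  Xr p ++ y' ++ Zr p ++ Tr p
  = p_to_bytes' {| Xr := Xr p; Yr := y'; Zr := Zr p; Tr := Tr p |}.
Proof. unfold p_to_bytes'; simpl; rewrite app_nil_r; reflexivity. Qed.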