rems-project / cerberus

Cerberus C semantics
https://www.cl.cam.ac.uk/~pes20/cerberus/
Other
51 stars 28 forks source link

[CN] Replace `take x = ..` with something else #312

Open dc-mak opened 4 months ago

dc-mak commented 4 months ago

Problem

We have a few problems with the current syntax

Solution

Benefits

Uncertainties:

bcpierce00 commented 4 months ago

This sounds excellent! Just as concrete syntax, ‘take’ is confusing — let* much better! And the further generalizations sound nice.

On Fri, Jun 7, 2024 at 08:02 Dhruv Makwana @.***> wrote:

Problem

We have a few problems with the current syntax

void foo(int p, int q) /@ requires take X = Owned(p); take Y = Owned(q); ... ensures take X2 = Owned(p); X == X2; @/

Solution

  • Replace take X = ...; C[X] with let X = ..; C[X] when C is meta-notation for a context (expression with holes in it).
  • Treat let *x = expr; body as monadic bind for type-checking purposes.
  • Only allow on variables with type heap<_> (resource). R refers the output argument of a resource.

Benefits

  • p is no-longer special-cased. let p = Owned(p); p + 4i32 == 0i32; works but so wouldlet list = IntList(list_ptr); list == [];`.
  • Predicates are easier to explain: a predicates map pointers to ghost state values. let R = Pred(args); expr allows one to use both R and R in expr.
  • We have a way to name resources. This can remove the need to have user-generated names like unpack_IntList0 and replace it with list->head.
    • This would also allows us (and maybe this is too clever by a half), for things let *arr = each ..; extract &arr[index]; or extract (arr + index) and continue to refer to the array value as arr[0].
  • We can generalise the {_} unchanged syntax to work with any resource, meaning the post-condition can just be {X} unchanged;.

Uncertainties:

  • Do we need any restrictions on variables R : heap<_> yet? Such values are not first class (yet), and their use would be limited to the monadic bind. Though it's not clear to me how something like let p = Owned(p); let p2 = p; body should be handled (maybe just banned?). Perhaps the safest option would be to have resource variables be treated linearly.

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxtR9v5Oa$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC6WWDJOGYRT2JMQS7TZGGOODAVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2DAMRZGE3DMOI__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxofXwQ2V$ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

dc-mak commented 4 months ago

@neel-krishnaswami open for comment :)

neel-krishnaswami commented 4 months ago
  1. I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource?

  2. I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); 
assume [...]
ensures [...]

desugars to

assume [...];
               take x = P(args);
ensures [...]
               take x' = P(args);
               x == x' 
bcpierce00 commented 4 months ago

2 sounds great. For #1 I think I’d need to see examples both ways…

On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:

1.

I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource? 2.

I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); assume [...] ensures [...]

desugars to

assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>

cp526 commented 4 months ago

I’ve been thinking about something like #2 as well. One side benefit would be that #2 could also replace our use of accesses (the shorthand for specifying ownership of globals), so we’d have a smaller specification language.

One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.

On 7 Jun 2024, at 17:37, Benjamin Pierce @.***> wrote:

2 sounds great. For #1 I think I’d need to see examples both ways…

On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:

I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource?

I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); assume [...] ensures [...]

desugars to

assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

thatplguy commented 4 months ago

@dc-mak looks awesome!

One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.

IMO it's fine to restrict pointer operations, as long as we can detect them and report an appropriate error. But using &x or *x needs to mean the same thing as in C; otherwise, we should pick a wholly different syntax.

For example, I think it's a bit confusing to use *x if x is not a pointer-typed variable.

// This looks a bit weird because it appears to redefine p, when in fact it defines an 
// entirely new symbol "*p" that happens to look like a pointer dereference (?).
let *p = Owned<int>(p);
*p + 4i32  == 0i32;

// Perhaps a better syntax would simply be
Owned<int>(p);
*p + 4i32 == 0i32;

// I find this next one very confusing.  If "*list" is a value, then I would expect "list" to be a 
// pointer-typed variable currently in scope.
let *list = IntList(list_ptr); 
*list == [];
cp526 commented 1 month ago

We could perhaps have a completely nameless scheme:

thatplguy commented 1 month ago

I agree with @dc-mak. It would be nice for * to remain a unary operation, and it would be confusing for *xyz to be an identifier rather than an application of the * operation.

Since C doesn't have objects, I'm slightly more in favor of P(arg1, ..., argn) returning a value rather than a struct with a .val field. I like what @cp526 suggested for *p as shorthand for Owned(p) though.

I'm curious, what does the heap type and monadic structure help with from a user perspective? What would go wrong if we simply removed take and replaced it with let, e.g. let v_in = Owned<int>(p)?

If we did that, the overall change might be something like

  1. Replace take with let, e.g. let v_in = Owned(p)
  2. Make *p shorthand for let p_tmp = Owned(p); p_tmp
  3. Allow _ for "don't care", e.g. let _ = Owned(p), and/or allow ignoring resource predicate return values entirely when all you need is the constraints, e.g. requires Owned(p)
cp526 commented 1 month ago

I agree with @dc-mak. It would be nice for * to remain a unary operation, and it would be confusing for *xyz to be an identifier rather than an application of the * operation.

Since C doesn't have objects, I'm slightly more in favor of P(arg1, ..., argn) returning a value rather than a struct with a .val field. I like what @cp526 suggested for *p as shorthand for Owned(p) though.

My suggestion about the *p shorthand was just about continuing to have *p be a way of referring to the pointee of p, not to implicitly assert ownership with *p :-) I think it's good for the ownership assertion to be more explicit than just a reference to *p somewhere in the specification.

I'm curious, what does the heap type and monadic structure help with from a user perspective? What would go wrong if we simply removed take and replaced it with let, e.g. let v_in = Owned<int>(p)?

RE heap: With the current syntax, take v = Pred(args) binds v to the output of the resource, not the resource itself. If, as suggested above, we wanted a way of naming resources themselves, then we'd have to give types to those, and heap<output-type> or resource<output-type> would be possible choices.

So far everything is set up so resources themselves are never named, because in type checking the C program, resources are inferred. There's some situations where it could be useful to have named resources (for instance, as above, for the extract proof hint), but it's not clear to me whether a CN that works with named resources would overall be easier to use.

RE monads: You can think of a resource take/let-binding as being a monadic let, as originally suggested by Neel IIRC, with the intuition that taking ownership of a resource is an effect: e.g. the pure let-binding let v = 0; let v = 0; e (repetition intentional) is equivalent to just let v = 0; e, whereas take v = Owned(p); take v = Owned(p); e is very different from take v = Owned(p); e (the former asserts ownership of p twice – a contradiction).

This is why, I think, it's useful if the syntax for resource assertions is not simply let v = Owned(p), because it's semantically different from a "pure" let. That doesn't mean "take" is set in stone.

Whichever concrete syntax we use, we shouldn't explain CN specifications in terms of monads to the user, but I think no-one is suggesting that. It might just be useful for our own understanding.