rems-project / cerberus

Cerberus C semantics

https://www.cl.cam.ac.uk/~pes20/cerberus/

Other

53 stars 28 forks source link

[CN] Replace `take x = ..` with something else #312

Open dc-mak opened 5 months ago

dc-mak commented 5 months ago

Problem

We have a few problems with the current syntax

Code-complexity: *p is supported in pre- and post-conditions but the code for this is complicated and special-cased to Owned<_>
take is hard to explain and doesn't map cleanly to existing intuition for C programmers.
We don't have a way to name resources. If we want to do both #303 and #311 we might benefit from a way of naming predicates.
We have a repeated pattern for a lot examples such as the one below which is clunky. {p} unchanged only works on pointers, and doesn't generalise.
```
void foo(int *p, int *q)
/*@ requires
take X = Owned(p); 
take Y = Owned(q);
...
ensures 
take X2 = Owned(p);
X == X2;
@*/
```

Solution

Replace take X = ...; C[X] with let *X = ..; C[*X] when C is meta-notation for a context (expression with holes in it).
Treat let *x = pred(args); body as monadic bind for type-checking purposes.
Only allow * on variables with type heap<_> (resource). *R refers the output argument of a resource.

Benefits

*p is no-longer special-cased. let *p = Owned(p); *p + 4i32 == 0i32; works but so would let *list = IntList(list_ptr); *list == [];.
Predicates are easier to explain: a predicates map pointers to ghost state values. let *R = Pred(args); expr allows one to use both R and *R in expr.
We have a way to name resources. This can remove the need to have user-generated names like unpack_IntList0 and replace it with list->head.
- This would also allows us (and maybe this is too clever by a half), for things let *arr = each ..; extract &arr[index]; or extract (arr + index) and continue to refer to the array value as arr[0].
- We can generalise the {_} unchanged syntax to work with any resource, meaning the post-condition can just be {X} unchanged;.

Uncertainties:

Do we need any restrictions on variables R : heap<_> yet? Such values are not first class (yet), and their use would be limited to the monadic bind. Though it's not clear to me how something like let *p = Owned(p); let *p2 = p; body should be handled (maybe just banned?). Perhaps the safest option would be to have resource variables be treated linearly.

bcpierce00 commented 5 months ago

This sounds excellent! Just as concrete syntax, ‘take’ is confusing — let* much better! And the further generalizations sound nice.

On Fri, Jun 7, 2024 at 08:02 Dhruv Makwana @.***> wrote:

Problem

We have a few problems with the current syntax

Code-complexity: *p is supported in pre- and post-conditions but the code for this is complicated and special-cased to Owned<_>

take is hard to explain and doesn't map cleanly to existing intuition for C programmers.

We don't have a way to name resources. If we want to do both #303 https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/303__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxkKsx9WB$ and #311 https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/311__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxlA6nV3C$ we might benefit from a way of naming predicates.

We have a repeated pattern for a lot examples such as the one below which is clunky. {p} unchanged only works on pointers, and doesn't generalise.

void foo(int p, int q) /@ requires take X = Owned(p); take Y = Owned(q); ... ensures take X2 = Owned(p); X == X2; @/

Solution

Replace take X = ...; C[X] with let X = ..; C[X] when C is meta-notation for a context (expression with holes in it).

Treat let *x = expr; body as monadic bind for type-checking purposes.

Only allow on variables with type heap<_> (resource). R refers the output argument of a resource.

Benefits

p is no-longer special-cased. let p = Owned(p); p + 4i32 == 0i32; works but so wouldlet list = IntList(list_ptr); list == [];`.

Predicates are easier to explain: a predicates map pointers to ghost state values. let R = Pred(args); expr allows one to use both R and R in expr.

We have a way to name resources. This can remove the need to have user-generated names like unpack_IntList0 and replace it with list->head.

This would also allows us (and maybe this is too clever by a half), for things let *arr = each ..; extract &arr[index]; or extract (arr + index) and continue to refer to the array value as arr[0].

We can generalise the {_} unchanged syntax to work with any resource, meaning the post-condition can just be {X} unchanged;.

Uncertainties:

Do we need any restrictions on variables R : heap<_> yet? Such values are not first class (yet), and their use would be limited to the monadic bind. Though it's not clear to me how something like let p = Owned(p); let p2 = p; body should be handled (maybe just banned?). Perhaps the safest option would be to have resource variables be treated linearly.

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxtR9v5Oa$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC6WWDJOGYRT2JMQS7TZGGOODAVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2DAMRZGE3DMOI__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxofXwQ2V$ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

dc-mak commented 5 months ago

@neel-krishnaswami open for comment :)

neel-krishnaswami commented 5 months ago

I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource?
I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); 
assume [...]
ensures [...]

desugars to

assume [...];
               take x = P(args);
ensures [...]
               take x' = P(args);
               x == x'

bcpierce00 commented 5 months ago

2 sounds great. For #1 I think I’d need to see examples both ways…

On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:

1.

I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource? 2.

I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); assume [...] ensures [...]

desugars to

assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>

cp526 commented 5 months ago

I’ve been thinking about something like #2 as well. One side benefit would be that #2 could also replace our use of accesses (the shorthand for specifying ownership of globals), so we’d have a smaller specification language.

One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.

On 7 Jun 2024, at 17:37, Benjamin Pierce @.***> wrote:

2 sounds great. For #1 I think I’d need to see examples both ways…

On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:

I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource?

I think that adding an inv clause to function specs might be better than {p} unchanged . So

inv take x = P(args); assume [...] ensures [...]

desugars to

assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'

— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>

thatplguy commented 5 months ago

@dc-mak looks awesome!

One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.

IMO it's fine to restrict pointer operations, as long as we can detect them and report an appropriate error. But using &x or *x needs to mean the same thing as in C; otherwise, we should pick a wholly different syntax.

For example, I think it's a bit confusing to use *x if x is not a pointer-typed variable.

// This looks a bit weird because it appears to redefine p, when in fact it defines an 
// entirely new symbol "*p" that happens to look like a pointer dereference (?).
let *p = Owned<int>(p);
*p + 4i32  == 0i32;

// Perhaps a better syntax would simply be
Owned<int>(p);
*p + 4i32 == 0i32;

// I find this next one very confusing.  If "*list" is a value, then I would expect "list" to be a 
// pointer-typed variable currently in scope.
let *list = IntList(list_ptr); 
*list == [];

cp526 commented 2 months ago

We could perhaps have a completely nameless scheme:

for any resource predicate P (including Owned), P(arg1, ..., argn) is a resource assertion,
subsequent assertions can refer to P(arg1, ..., argn).out/P(arg1, ..., argn).val to refer to its outputs,
following the resource assertion Owned(p), the expression *p is a shorthand for Owned(p).val,
where names for resource outputs are required for brevity, they can be introduced with standard let-bindings.

thatplguy commented 2 months ago

I agree with @dc-mak. It would be nice for * to remain a unary operation, and it would be confusing for *xyz to be an identifier rather than an application of the * operation.

Since C doesn't have objects, I'm slightly more in favor of P(arg1, ..., argn) returning a value rather than a struct with a .val field. I like what @cp526 suggested for *p as shorthand for Owned(p) though.

I'm curious, what does the heap type and monadic structure help with from a user perspective? What would go wrong if we simply removed take and replaced it with let, e.g. let v_in = Owned<int>(p)?

If we did that, the overall change might be something like

Replace take with let, e.g. let v_in = Owned(p)
Make *p shorthand for let p_tmp = Owned(p); p_tmp
Allow _ for "don't care", e.g. let _ = Owned(p), and/or allow ignoring resource predicate return values entirely when all you need is the constraints, e.g. requires Owned(p)

cp526 commented 2 months ago

I agree with @dc-mak. It would be nice for * to remain a unary operation, and it would be confusing for *xyz to be an identifier rather than an application of the * operation.

Since C doesn't have objects, I'm slightly more in favor of P(arg1, ..., argn) returning a value rather than a struct with a .val field. I like what @cp526 suggested for *p as shorthand for Owned(p) though.

My suggestion about the *p shorthand was just about continuing to have *p be a way of referring to the pointee of p, not to implicitly assert ownership with *p :-) I think it's good for the ownership assertion to be more explicit than just a reference to *p somewhere in the specification.

I'm curious, what does the heap type and monadic structure help with from a user perspective? What would go wrong if we simply removed take and replaced it with let, e.g. let v_in = Owned<int>(p)?

RE heap: With the current syntax, take v = Pred(args) binds v to the output of the resource, not the resource itself. If, as suggested above, we wanted a way of naming resources themselves, then we'd have to give types to those, and heap<output-type> or resource<output-type> would be possible choices.

So far everything is set up so resources themselves are never named, because in type checking the C program, resources are inferred. There's some situations where it could be useful to have named resources (for instance, as above, for the extract proof hint), but it's not clear to me whether a CN that works with named resources would overall be easier to use.

RE monads: You can think of a resource take/let-binding as being a monadic let, as originally suggested by Neel IIRC, with the intuition that taking ownership of a resource is an effect: e.g. the pure let-binding let v = 0; let v = 0; e (repetition intentional) is equivalent to just let v = 0; e, whereas take v = Owned(p); take v = Owned(p); e is very different from take v = Owned(p); e (the former asserts ownership of p twice – a contradiction).

This is why, I think, it's useful if the syntax for resource assertions is not simply let v = Owned(p), because it's semantically different from a "pure" let. That doesn't mean "take" is set in stone.

Whichever concrete syntax we use, we shouldn't explain CN specifications in terms of monads to the user, but I think no-one is suggesting that. It might just be useful for our own understanding.