Open dc-mak opened 5 months ago
This sounds excellent! Just as concrete syntax, ‘take’ is confusing — let* much better! And the further generalizations sound nice.
On Fri, Jun 7, 2024 at 08:02 Dhruv Makwana @.***> wrote:
Problem
We have a few problems with the current syntax
- Code-complexity: *p is supported in pre- and post-conditions but the code for this is complicated and special-cased to Owned<_>
- take is hard to explain and doesn't map cleanly to existing intuition for C programmers.
- We don't have a way to name resources. If we want to do both #303 https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/303__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxkKsx9WB$ and #311 https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/311__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxlA6nV3C$ we might benefit from a way of naming predicates.
- We have a repeated pattern for a lot examples such as the one below which is clunky. {p} unchanged only works on pointers, and doesn't generalise.
void foo(int p, int q) /@ requires take X = Owned(p); take Y = Owned(q); ... ensures take X2 = Owned(p); X == X2; @/
Solution
- Replace take X = ...; C[X] with let X = ..; C[X] when C is meta-notation for a context (expression with holes in it).
- Treat let *x = expr; body as monadic bind for type-checking purposes.
- Only allow on variables with type heap<_> (resource). R refers the output argument of a resource.
Benefits
- p is no-longer special-cased. let p = Owned(p); p + 4i32 == 0i32; works but so wouldlet list = IntList(list_ptr); list == [];`.
- Predicates are easier to explain: a predicates map pointers to ghost state values. let R = Pred(args); expr allows one to use both R and R in expr.
- We have a way to name resources. This can remove the need to have user-generated names like unpack_IntList0 and replace it with list->head.
- This would also allows us (and maybe this is too clever by a half), for things let *arr = each ..; extract &arr[index]; or extract (arr + index) and continue to refer to the array value as arr[0].
- We can generalise the {_} unchanged syntax to work with any resource, meaning the post-condition can just be {X} unchanged;.
Uncertainties:
- Do we need any restrictions on variables R : heap<_> yet? Such values are not first class (yet), and their use would be limited to the monadic bind. Though it's not clear to me how something like let p = Owned(p); let p2 = p; body should be handled (maybe just banned?). Perhaps the safest option would be to have resource variables be treated linearly.
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxtR9v5Oa$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC6WWDJOGYRT2JMQS7TZGGOODAVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43ASLTON2WKOZSGM2DAMRZGE3DMOI__;!!IBzWLUs!V_S7Tm6hZoDzlaJpr72vvVRRBM6o-yhTdoB1RZwW_aofWCaUN7vhcA0s54qjlsmlO1oZXdjNcrmP32rZPiIXxofXwQ2V$ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
@neel-krishnaswami open for comment :)
I'm worried that referring to things as *x
will add symbolic noise specs. Maybe we could use x
for the value, and &x
for the resource?
I think that adding an inv
clause to function specs might be better than {p} unchanged
. So
inv take x = P(args);
assume [...]
ensures [...]
desugars to
assume [...];
take x = P(args);
ensures [...]
take x' = P(args);
x == x'
On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:
1.
I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource? 2.
I think that adding an inv clause to function specs might be better than {p} unchanged . So
inv take x = P(args); assume [...] ensures [...]
desugars to
assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>
I’ve been thinking about something like #2 as well. One side benefit would be that #2 could also replace our use of accesses
(the shorthand for specifying ownership of globals), so we’d have a smaller specification language.
One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.
On 7 Jun 2024, at 17:37, Benjamin Pierce @.***> wrote:
2 sounds great. For #1 I think I’d need to see examples both ways…
On Fri, Jun 7, 2024 at 12:17 PM Neel Krishnaswami @.***> wrote:
I'm worried that referring to things as *x will add symbolic noise specs. Maybe we could use x for the value, and &x for the resource?
I think that adding an inv clause to function specs might be better than {p} unchanged . So
inv take x = P(args); assume [...] ensures [...]
desugars to
assume [...]; take x = P(args); ensures [...] take x' = P(args); x == x'
— Reply to this email directly, view it on GitHub https://urldefense.com/v3/__https://github.com/rems-project/cerberus/issues/312*issuecomment-2155147700__;Iw!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKDSsG_DF$, or unsubscribe https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/ABVQQC76APAQ57G3FXWUDOTZGHMJ3AVCNFSM6AAAAABI6RAEROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVGE2DONZQGA__;!!IBzWLUs!WiSfm3PsGmxs3pRkwG6vOrUS6L44Qsr8zDRkvXxa2mZmT3rtgSuBV3bmAgFx7yQ_OwKSqNVGMcjD6Ldlv8DLKKzBiDxV$ . You are receiving this because you commented.Message ID: @.***>
— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you are subscribed to this thread.Message ID: @.***>
@dc-mak looks awesome!
One worry I’d have about both &x and *x (purely the syntax) is that because they look like usual pointers, users might try doing other pointer-like things with these that won’t work.
IMO it's fine to restrict pointer operations, as long as we can detect them and report an appropriate error. But using &x
or *x
needs to mean the same thing as in C; otherwise, we should pick a wholly different syntax.
For example, I think it's a bit confusing to use *x
if x
is not a pointer-typed variable.
// This looks a bit weird because it appears to redefine p, when in fact it defines an
// entirely new symbol "*p" that happens to look like a pointer dereference (?).
let *p = Owned<int>(p);
*p + 4i32 == 0i32;
// Perhaps a better syntax would simply be
Owned<int>(p);
*p + 4i32 == 0i32;
// I find this next one very confusing. If "*list" is a value, then I would expect "list" to be a
// pointer-typed variable currently in scope.
let *list = IntList(list_ptr);
*list == [];
We could perhaps have a completely nameless scheme:
P
(including Owned
), P(arg1, ..., argn)
is a resource assertion,P(arg1, ..., argn).out
/P(arg1, ..., argn).val
to refer to its outputs,Owned(p)
, the expression *p
is a shorthand for Owned(p).val
,I agree with @dc-mak. It would be nice for *
to remain a unary operation, and it would be confusing for *xyz
to be an identifier rather than an application of the *
operation.
Since C doesn't have objects, I'm slightly more in favor of P(arg1, ..., argn)
returning a value rather than a struct with a .val
field. I like what @cp526 suggested for *p
as shorthand for Owned(p)
though.
I'm curious, what does the heap
type and monadic structure help with from a user perspective? What would go wrong if we simply removed take
and replaced it with let
, e.g. let v_in = Owned<int>(p)
?
If we did that, the overall change might be something like
take
with let
, e.g. let v_in = Owned(p)
*p
shorthand for let p_tmp = Owned(p); p_tmp
_
for "don't care", e.g. let _ = Owned(p)
, and/or allow ignoring resource predicate return values entirely when all you need is the constraints, e.g. requires Owned(p)
I agree with @dc-mak. It would be nice for
*
to remain a unary operation, and it would be confusing for*xyz
to be an identifier rather than an application of the*
operation.Since C doesn't have objects, I'm slightly more in favor of
P(arg1, ..., argn)
returning a value rather than a struct with a.val
field. I like what @cp526 suggested for*p
as shorthand forOwned(p)
though.
My suggestion about the *p
shorthand was just about continuing to have *p
be a way of referring to the pointee of p
, not to implicitly assert ownership with *p
:-) I think it's good for the ownership assertion to be more explicit than just a reference to *p
somewhere in the specification.
I'm curious, what does the
heap
type and monadic structure help with from a user perspective? What would go wrong if we simply removedtake
and replaced it withlet
, e.g.let v_in = Owned<int>(p)
?
RE heap
: With the current syntax, take v = Pred(args)
binds v
to the output of the resource, not the resource itself. If, as suggested above, we wanted a way of naming resources themselves, then we'd have to give types to those, and heap<output-type>
or resource<output-type>
would be possible choices.
So far everything is set up so resources themselves are never named, because in type checking the C program, resources are inferred. There's some situations where it could be useful to have named resources (for instance, as above, for the extract
proof hint), but it's not clear to me whether a CN that works with named resources would overall be easier to use.
RE monads: You can think of a resource take/let-binding as being a monadic let
, as originally suggested by Neel IIRC, with the intuition that taking ownership of a resource is an effect: e.g. the pure let-binding let v = 0; let v = 0; e
(repetition intentional) is equivalent to just let v = 0; e
, whereas take v = Owned(p); take v = Owned(p); e
is very different from take v = Owned(p); e
(the former asserts ownership of p
twice – a contradiction).
This is why, I think, it's useful if the syntax for resource assertions is not simply let v = Owned(p)
, because it's semantically different from a "pure" let. That doesn't mean "take" is set in stone.
Whichever concrete syntax we use, we shouldn't explain CN specifications in terms of monads to the user, but I think no-one is suggesting that. It might just be useful for our own understanding.
Problem
We have a few problems with the current syntax
*p
is supported in pre- and post-conditions but the code for this is complicated and special-cased toOwned<_>
take
is hard to explain and doesn't map cleanly to existing intuition for C programmers.{p} unchanged
only works on pointers, and doesn't generalise.Solution
take X = ...; C[X]
withlet *X = ..; C[*X]
whenC
is meta-notation for a context (expression with holes in it).let *x = pred(args); body
as monadic bind for type-checking purposes.*
on variables with typeheap<_>
(resource).*R
refers the output argument of a resource.Benefits
*p
is no-longer special-cased.let *p = Owned(p); *p + 4i32 == 0i32;
works but so wouldlet *list = IntList(list_ptr); *list == [];
.let *R = Pred(args); expr
allows one to use both R and *R in expr.unpack_IntList0
and replace it withlist->head
.let *arr = each ..;
extract &arr[index];
orextract (arr + index)
and continue to refer to the array value asarr[0]
.{_} unchanged
syntax to work with any resource, meaning the post-condition can just be{X} unchanged;
.Uncertainties:
R : heap<_>
yet? Such values are not first class (yet), and their use would be limited to the monadic bind. Though it's not clear to me how something likelet *p = Owned(p); let *p2 = p; body
should be handled (maybe just banned?). Perhaps the safest option would be to have resource variables be treated linearly.