rems-project / cerberus

Cerberus C semantics
https://www.cl.cam.ac.uk/~pes20/cerberus/
Other
49 stars 25 forks source link

[CN] clarify implementation-defined behaviors #406

Open peterohanley opened 1 month ago

peterohanley commented 1 month ago

This is pretty broad, and the specific case I encountered might be worth separating out.

void f(int *r)
/*@ requires take ri = Owned<int>(r);
    ensures take ro = Owned<int>(r);
@*/
{
    *r |= 0x80000000;
}
% cn verify implementation_defined_conversion.c
[1/1]: f
implementation_defined_conversion.c:7:5: error: integer value not representable at type signed int
    *r |= 0x80000000;
    ~~~^~~~~~~~~~~~~ 
Value: (u32)bw_or_uf((u32)ri, (u32)2147483648u32 /* 0x80000000 */)
State file: ...

This error is correct, and allowed by the standard.

Compare to a function that has UB:

void f(int *r)
/*@ requires take ri = Owned<int>(r);
    ensures take ro = Owned<int>(r);
@*/
{
    *r += 0x08000000;
}
% cn verify implementation_defined_conversion.c
[1/1]: f
implementation_defined_conversion.c:7:5: error: Undefined behaviour
    *r += 0x08000000;
    ~~~^~~~~~~~~~~~~ 
an exceptional condition occurs during the evaluation of an expression (§6.5#5)
State file: ...

It would be nice (and a whole pile of work, of course) if the implementation-defined case mentioned that this error is due to implementation-defined behavior.

peterohanley commented 1 month ago

Most of the motivation here is to clarify that this stuff isn't a bug in CN and GCC's behavior is not the same as the C standard :)

cp526 commented 1 month ago

That would be good to add, and maybe not even so much work.

Another consideration, especially for low-level systems code, is whether we'll ever need CN to accept implementation-defined behaviour, with user-specified choice of behaviour for those cases.

peterohanley commented 1 month ago

I think we may indeed want some kind of implementation-defined behavior switch, although it might just be for this specific issue:

enum foo {
 bar = 0x80000000
};
% cn verify enum.c 
enum.c:3:8: error: constraint violation: integer constant not in the range of the representable values for its type
 bar = 0x80000000
       ^~~~~~~~~~ 
% gcc -c enum.c 
% gcc -c enum.c -Wall -Wextra -Wpedantic
enum.c:3:2: warning: ISO C restricts enumerator values to range of 'int' (2147483648 is too large) [-Wpedantic]
 bar = 0x80000000
 ^     ~~~~~~~~~~
1 warning generated.
peterohanley commented 1 month ago

Another one -- it's pervasive to pretend uint32_t or uint64_t is uintptr_t in embedded code (e.g. see in efi) and to deal with this it is relevant how wide pointers are currently. Checking a file under multiple widths is also meaningful but this doesn't need to be one CN invocation.

peterohanley commented 1 month ago

The enum thing is odd, the range is restricted to i32 but cn thinks enums are u32.

kmemarian commented 1 month ago

The enum thing is odd, the range is restricted to i32 but cn thinks enums are u32.

This is because the default implementation choice used by cerberus/cn (see DefaultImpl in ocaml_frontend/ocaml_implementation.ml) follows GCC/Clang for the implementation defined calculation of the "integer type compatible with each enumerated type".

"[...] the type is unsigned int if there are no negative values in the enumeration, otherwise int."

https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html

But the enumeration constants themselves still have type signed int. So you have for example:

enum E1 { // compatible with unsigned int
  A1
} x = A1;

enum E2 { // compatible with signed int
  A2 = -1
} y = A2;

void func(void)
{
  A1 + 1; // the addition is signed
  A2 + 1; // still signed
  x + 1;  // the adition is unsigned
  y + 1;  // the addition is signed
}
peterohanley commented 1 month ago

Another consideration, especially for low-level systems code, is whether we'll ever need CN to accept implementation-defined behaviour, with user-specified choice of behaviour for those cases.

The first case where this is actually required: a knob for the system pointer width. A lot of kernels need to know how wide pointers are to stuff them into a uint32_t (fixable in principle) or stuff them into a uintptr_t that they know is 32 bits wide (not fixable)

cp526 commented 1 month ago

Another consideration, especially for low-level systems code, is whether we'll ever need CN to accept implementation-defined behaviour, with user-specified choice of behaviour for those cases.

The first case where this is actually required: a knob for the system pointer width. A lot of kernels need to know how wide pointers are to stuff them into a uint32_t (fixable in principle) or stuff them into a uintptr_t that they know is 32 bits wide (not fixable)

In principle CN should already be parametric for the situation you describe. Cerberus is carefully designed so certain implementation-defined choices, such as widths of integer and pointer types, are factored out and switchable. CN internally uses this Cerberus functionality but doesn't so far expose the choice to the user. We could probably easily make some of this switchable in the CN user interface (although it's never been exercised, so there may be bugs).