wandernauta closed this issue 2 months ago.

> a possible way to resolve this would be to restrict ordinary character literals to a single ordinary character and to document that the execution charset is assumed to be UTF-8.

I agree. If anyone wants a more insane contract they can write a proposal :)
The `CPPToCol` transform uses the Unicode code point value of a C++ ordinary character literal to come up with a corresponding integer value. This works in many cases (Latin alphabet, default compiler flags) but is not guaranteed by the spec: the value of a character literal that does not fit in a single code unit is at best implementation-defined, and the execution (ordinary literal) encoding itself is implementation-defined as well.

I don't expect that many actual programs will run into this, but I've put some contrived examples below. A possible way to resolve this would be to restrict ordinary character literals to a single ordinary character and to document that the execution charset is assumed to be UTF-8.
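To make the proposed restriction concrete, here is a small sketch (assuming the rule is roughly "exactly one ordinary character or simple escape"; the exact wording is not decided here) of what would and would not be accepted:

```cpp
// Sketch of the proposed restriction on input programs (assumed rule wording):
// only ordinary character literals holding a single ordinary character survive.
char ok  = 'a';   // single ordinary character: accepted, treated as 97 under UTF-8/ASCII
char esc = '\n';  // simple escape: presumably still accepted, value 10
char bad = 'œ';   // needs two UTF-8 code units: would be rejected by the front end
int  mc  = 'ab';  // multicharacter literal: would be rejected as well
```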
The following is ill-formed, but verifies as C++:
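A minimal sketch of such a program (the exact contract and variable name are assumptions, chosen to match the behaviour described below):

```cpp
// Sketch only: ill-formed C++ ('œ' does not fit in a single code unit), but a
// VerCors-style contract over it verifies, because the literal is modelled as
// its Unicode code point.
//@ ensures \result == 1;
int main() {
  char c = 'œ';   // VerCors: 0x153 == 339; g++: multi-character constant
  if (c == 339) { // true for VerCors, false after g++'s narrowing to char
    return 1;
  }
  return 2;
}
```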
VerCors sees a single character `œ`; it has Unicode code point value 0x153, which is 339 decimal, so we pick the true branch and the postcondition holds. `g++` on my machine instead sees a multi-character constant of type `int` (32 bits on my machine), narrowed on assignment to `char`, which g++ has decided is 8 bit signed; g++ decides to wrap to -109. The comparison is therefore false, so `return 2;` is what gets picked and compiled as such. Another implementation might decide they prefer the value 42 instead.
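For reference, the byte-level arithmetic behind that -109, as a standalone sketch (assuming a UTF-8 source encoding and a signed 8-bit `char`, as above):

```cpp
#include <cstdio>

int main() {
  // 'œ' is U+0153; encoded in UTF-8 it is the two code units 0xC5 0x93.
  // As a multi-character constant, g++ combines them into a single int:
  int literal_value = (0xC5 << 8) | 0x93;           // 0xC593 == 50579
  // Narrowing that int to a signed 8-bit char keeps only the low byte,
  // and 0x93 (147 decimal) wraps around to 147 - 256 == -109:
  signed char narrowed = static_cast<signed char>(literal_value);
  std::printf("%d %d\n", literal_value, narrowed);  // prints: 50579 -109
  return 0;
}
```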
As another example, this program is valid C++ and verifies, but whether the postcondition holds depends on the selected execution charset:
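A minimal sketch of such a program (again, the exact contract is an assumption):

```cpp
// Sketch only: well-formed C++ with a single ASCII character, so VerCors can
// verify it using the Unicode code point of 'a' (97). Whether the contract
// holds for the compiled program depends on the execution charset: built with
// an EBCDIC literal encoding (e.g. g++ -fexec-charset=IBM1047, if the local
// iconv supports it), 'a' is encoded as 0x81 == 129 and the contract fails.
//@ ensures \result == 97;
int main() {
  return 'a';
}
```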