Open celinval opened 3 months ago
I've been investigating this issue and here's what I've found out.
First, the ValidValueReq
we process for char
specifies a valid_range
that doesn't take into account the surrogate gap in the middle: [ValidValueReq { offset: 0, size: MachineSize { num_bits: 32 }, valid_range: 0..=1114111 }]
Note: 1114111 is 0x10FFFF (i.e., char::MAX
)
I tried to special-case the ty_req
closure in ty_validity_per_offset
to return two valid ranges when a char
is found:
diff --git a/kani-compiler/src/kani_middle/transform/check_values.rs b/kani-compiler/src/kani_middle/transform/check_values.rs
index a7d0f14d270..253963a4ef6 100644
--- a/kani-compiler/src/kani_middle/transform/check_values.rs
+++ b/kani-compiler/src/kani_middle/transform/check_values.rs
@@ -839,8 +839,21 @@ pub fn ty_validity_per_offset(
ty: Ty,
current_offset: usize,
) -> Result<Vec<ValidValueReq>, String> {
+ debug!(typ=?ty, kind=?ty.kind(), "ty_validity_per_offset");
let layout = ty.layout().unwrap().shape();
let ty_req = || {
+ if ty.kind() == TyKind::RigidTy(RigidTy::Char) {
+ let shape = ty.layout().unwrap().shape();
+ let value_size = match shape.abi {
+ ValueAbi::Scalar(Scalar::Initialized { value, .. })
+ | ValueAbi::ScalarPair(Scalar::Initialized { value, .. }, _) => {
+ Some(value.size(machine_info))
+ }
+ _ => None,
+ };
+ return vec![ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0, end: 0xD7FF }},
+ ValidValueReq { offset: 0, size: value_size.unwrap(), valid_range: WrappingRange { start: 0xE000, end: 0x10FFFF }}]
+ };
if let Some(mut req) = ValidValueReq::try_from_ty(machine_info, ty)
However, that didn't have the intended effect as each range check is being encoded separately. In other words, this change will generate two separate checks as follows:
assert!((0..=0xD7FF).contains(value));
assert!((0xE000..0x10FFFF).contains(value));
which doesn't work well because it's requiring value
to be in BOTH ranges.
It's not clear to me that the check generation should be done this way for multiple ranges. For example, in this code
SourceOp::BytesValidity { ranges, target_ty, rvalue } => {
let value = body.new_assignment(rvalue, &mut source, InsertPosition::Before);
let rvalue_ptr = Rvalue::AddressOf(Mutability::Not, Place::from(value));
for range in ranges {
let result = build_limits(body, &range, rvalue_ptr.clone(), &mut source);
let msg =
format!("Undefined Behavior: Invalid value of type `{target_ty}`",);
body.add_check(
tcx,
&self.check_type,
&mut source,
InsertPosition::Before,
result,
&msg,
);
}
}
in which cases does it make sense to generate more than one check?
Another option is to extend ValidValueReq
to be able to work with multiple ranges. This seems to be nontracked pending work according to the comments in the source code. Maybe replacing/wrapping it with another structure like RangeSet
could save us work here.
Unfortunately, the option of extending ValidValueReq
is more involved than I initially expected for a couple reasons:
WrappingRange
which comes from the ABI API of StableMIR.WrappingRange
doesn't consider gaps within a range like the ones that we'd need for char
surrogate values.It's not clear to me such changes would pay off if char
is the only type with a gap.
Would it be OK to change the range loop above so that build_limits
constructs a single check that ensures the value falls within one of the ranges? I've just realized this wouldn't be OK if the ranges have different offsets, but it doesn't make sense to generate multiple checks for a particular offset.
I tried this code:
using the following command line invocation:
with Kani version: 0.52.0
I expected to see this happen: A valid value check failure
Instead, this happened: An unreachable block being reached due to UB