ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

@ptrCast doesn't generate runtime assertion for sentinel in slice #20586

Open Guigui220D opened 2 months ago

Guigui220D commented 2 months ago

Zig Version

0.14.0-dev.224+95d9292a7

Steps to Reproduce and Observed Behavior

As the documentation points out, this can panic at runtime, because it checks that the sentinel has the expected value:

const ret: [:0]const u32 = slice[0..(slice.len - 1) :0];

This doesn't, though (it doesn't generate the runtime check):

const ret: [:0]const u32 = @ptrCast(slice[0..(slice.len - 1)]);

I'm also wondering how compatible that is with the "only one way to do things" philosophy, since both "casts" compile, but one has the almost invisible caveat that it doesn't generate a runtime check.

Expected Behavior

const ret: [:0]const u32 = @ptrCast(slice[0..(slice.len - 1)]);

should do the same as

const ret: [:0]const u32 = slice[0..(slice.len - 1) :0];

in my opinion (but then why would the latter exist?)
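A minimal sketch of the two forms side by side, in case it helps reproduce the report. The function names `viaSlicing` and `viaPtrCast` and the sample data are made up for illustration; the slice is passed as a function parameter so that its length is only known at runtime, as in the snippets above.

```zig
const std = @import("std");

// Hypothetical helper: sentinel-terminated slicing. In safe build modes
// this checks at runtime that slice[slice.len - 1] == 0.
fn viaSlicing(slice: []const u32) [:0]const u32 {
    return slice[0 .. slice.len - 1 :0];
}

// Hypothetical helper: same resulting type, but @ptrCast only reinterprets
// the slice and, per this report, emits no sentinel check.
fn viaPtrCast(slice: []const u32) [:0]const u32 {
    return @ptrCast(slice[0 .. slice.len - 1]);
}

pub fn main() void {
    // The last element is 0, so both forms are valid for this input.
    const data = [_]u32{ 1, 2, 3, 0 };

    const checked = viaSlicing(&data);
    const unchecked = viaPtrCast(&data);

    std.debug.print("{d} {d}\n", .{ checked.len, unchecked.len });
}
```

If the last element of `data` were nonzero, `viaSlicing` would be expected to panic in a safe build mode, while `viaPtrCast` would silently return a slice whose type claims a sentinel that isn't there.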

rohlem commented 2 months ago

@ptrCast directly casts the pointer without checking the validity of the pointed-to elements ("pointees"). There are types with invalid bit representations: for instance, if a *u1 points to a byte holding a value in the range 2-255, that value is invalid. (To me this is the same error category as sentinel-termination, so imo it should be handled consistently.)

For sized pointers we could, in normal usage scenarios, add assertions verifying the validity of all pointees. However, this might be more costly than we want safety checks to be (imagine you create a slice spanning a GiB of memory). Moreover, it would be invalid for code (commonplace at the OS/kernel level) that wants to construct pointers to addresses in memory ranges which aren't currently mapped: the generated memory accesses would lead to a segfault or otherwise introduce a correctness issue. For unsized pointers ("pointers-to-many", [*]T) the check is also impossible at the time of @ptrCast, because we don't know the length, so it has to be delayed until the time an element is accessed. Asserting only on access is more consistent across pointer types, which may be seen as an advantage in that it simplifies the language.
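To illustrate the many-item pointer case, here is a small sketch. The helper name `terminatedLen` and the sample bytes are invented for the example. At the point of the cast there is no length to validate against, so the sentinel is only ever looked for later, here by `std.mem.span` walking the memory.

```zig
const std = @import("std");

// Hypothetical helper: the caller promises that a 0 byte exists somewhere
// at or after `many`; nothing at the cast site can verify that promise.
fn terminatedLen(many: [*]const u8) usize {
    // No length is known here, so @ptrCast has nothing to check.
    const terminated: [*:0]const u8 = @ptrCast(many);
    // std.mem.span scans forward until it finds the sentinel.
    return std.mem.span(terminated).len;
}

pub fn main() void {
    const bytes = [_]u8{ 'h', 'i', 0, 'x' };
    std.debug.print("{d}\n", .{terminatedLen(&bytes)});
}
```

If the promised sentinel were missing, the scan would run past the end of the array, which is the kind of error a cast-time check cannot catch for this pointer type.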

I'm also wondering how compatible that is with the "only one way to do things" philosophy

That sentence has since been changed to "Only one obvious way to do things." (not sure when). Since @ptrCast is a named builtin and longer to type than :0, imo it is the less obvious way, and should only be chosen if you are aware of this difference in behavior.

Guigui220D commented 2 months ago

Okay, your answer makes sense, thank you @rohlem

I wasn't aware of the difference in behavior (actually I wasn't aware of the :0 syntax at first), so it kinda threw me off.

Should I close the issue then?

rohlem commented 2 months ago

I think it would make sense to explicitly document, in the langref section for @ptrCast, that it does not include this assertion, and maybe also in the section on sentinel-terminated slices/slicing. We could keep this open as a tracking issue until the corresponding wording has been added.