ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.51k stars 2.52k forks

add safety checks for pointer casting #2414

Open andrewrk opened 5 years ago

andrewrk commented 5 years ago

I'm excited about this one. This connects a lot of dots and is part of the unofficial Make The Safe Build Modes More Safe project (#2301).

Here are some of the features of Zig this depends on:

The proposal is to add a secret safety field to types which have no well-defined in-memory layout, similar to how unions have a secret safety tag field. The secret safety field has an integer which denotes the type id. A unique integer id will be generated for every type across an entire compilation.

Next, augment the rules about undefined values (see #1947) with this: in safe build modes, the bit pattern of undefined shall be 0xaa (repeating) across the store size of the type, and for types which have no well-defined in-memory layout, the bit pattern 0xaa repeated across the store size shall not match a valid state.
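To make the rule concrete, here is a minimal user-space sketch (not part of the proposal) that checks whether every byte of a value is 0xaa. The helper name `looksUndefined` is hypothetical, and the test fills the memory explicitly instead of reading a real `undefined` value, so the result does not depend on the build mode. It assumes the `@memset(dest, elem)` form of the builtin (Zig 0.11 or later).

```zig
const std = @import("std");

/// Hypothetical helper: true if every byte of `ptr.*` equals 0xaa.
fn looksUndefined(comptime T: type, ptr: *const T) bool {
    return std.mem.allEqual(u8, std.mem.asBytes(ptr), 0xaa);
}

test "a 0xaa fill is detectable byte by byte" {
    var x: u32 = 0;
    try std.testing.expect(!looksUndefined(u32, &x));

    // Simulate the pattern a safe build mode would write for `undefined`.
    @memset(std.mem.asBytes(&x), 0xaa);
    try std.testing.expect(looksUndefined(u32, &x));
    try std.testing.expectEqual(@as(u32, 0xaaaaaaaa), x);
}
```

Note that 0xaaaaaaaa is a perfectly valid `u32`, so a check like this cannot distinguish "undefined" from "defined and happens to equal the pattern" for such types. That is why the rule only demands an invalid 0xaa state for types without a well-defined in-memory layout, where the compiler is free to reserve it.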

This makes it possible to add safety checks to @ptrCast, @intToPtr, and @fieldParentPtr. It will be detectable illegal behavior (see #2402) if the actual element type does not match the target type specified in the cast, or if the memory has an undefined value.

Sometimes it is desired to @ptrCast or @intToPtr when you know the memory is undefined. For these cases we introduce @ptrCastUndef and @intToPtrUndef which simultaneously cast and assign undefined to the memory. These functions allow the programmer to change the type of memory in a legal way.
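The mechanism can be approximated in user code today. The following is only a sketch under stated assumptions, not the proposed compiler implementation: `Tagged`, `typeId`, `checkedCast`, and `castUndef` are hypothetical names, the "secret" field is made explicit, the type id is a hash of the type name (which, unlike a compiler-generated id, could collide), and `extern struct` is used only so the tag is known to sit at offset 0. It assumes the result-type form of `@ptrCast` (Zig 0.11 or later).

```zig
const std = @import("std");

/// Stand-in for the compiler-generated unique id (a name hash can collide).
fn typeId(comptime T: type) u32 {
    return comptime std.hash.Fnv1a_32.hash(@typeName(T));
}

/// The tag value that results from filling the memory with 0xaa bytes.
const undefined_id: u32 = 0xaaaaaaaa;

/// Hypothetical wrapper that makes the safety field explicit.
/// `extern` pins the tag at offset 0 so it can be read before T is trusted.
fn Tagged(comptime T: type) type {
    return extern struct {
        type_id: u32 = typeId(T),
        value: T,
    };
}

const CastError = error{ TypeMismatch, UndefinedMemory };

/// Cast only if the memory is defined and was tagged with the id of T.
fn checkedCast(comptime T: type, ptr: *anyopaque) CastError!*Tagged(T) {
    const id: *const u32 = @ptrCast(@alignCast(ptr));
    if (id.* == undefined_id) return error.UndefinedMemory;
    if (id.* != typeId(T)) return error.TypeMismatch;
    const result: *Tagged(T) = @ptrCast(@alignCast(ptr));
    return result;
}

/// Rough analogue of the proposed @ptrCastUndef: cast, then mark the
/// memory as undefined so the old contents can no longer be observed.
fn castUndef(comptime T: type, ptr: *anyopaque) *Tagged(T) {
    const result: *Tagged(T) = @ptrCast(@alignCast(ptr));
    @memset(std.mem.asBytes(result), 0xaa);
    return result;
}

test "checked cast and cast-to-undefined" {
    var a = Tagged(u64){ .value = 7 };
    const erased: *anyopaque = &a;

    const ok = try checkedCast(u64, erased);
    try std.testing.expect(ok.value == 7);
    try std.testing.expectError(error.TypeMismatch, checkedCast(i64, erased));

    // Legally change the type of the memory, then give it a value.
    const b = castUndef(i64, erased);
    try std.testing.expectError(error.UndefinedMemory, checkedCast(i64, erased));
    b.* = .{ .value = -1 };
    const again = try checkedCast(i64, erased);
    try std.testing.expect(again.value == -1);
}
```

In the proposal itself the tag would be invisible to the programmer and the failures would be detectable illegal behavior (a safety panic) rather than returned errors; errors are used here only so the sketch can be tested.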

iacore commented 2 years ago

This could be done with "butterfly" data before the pointed-to address, with the allocator's help.

memory layout: v is pointer address

                  v  
other data before user data here

This is used in V8 (JS runtime) to allow fast indirection with attached metadata (infrequently accessed).

Basically, you store the type info and undefined-ness on the left side of the butterfly.
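As a rough illustration of that layout (a sketch, not how V8 or any Zig allocator actually does it), the metadata can live in a header placed immediately before the payload, and be recovered from the payload pointer by stepping left. `Header`, `Block`, and `headerOf` are hypothetical names, and `extern struct` is assumed so the field order is fixed.

```zig
const std = @import("std");

/// Hypothetical metadata kept immediately before the user data.
const Header = extern struct {
    type_id: u32,
    is_undefined: u32,
};

/// Layout an allocator could hand out: header first, payload right after.
fn Block(comptime T: type) type {
    return extern struct {
        header: Header,
        payload: T,
    };
}

/// Recover the header from a payload pointer by stepping to the left.
fn headerOf(payload: *anyopaque) *Header {
    return @ptrFromInt(@intFromPtr(payload) - @sizeOf(Header));
}

test "metadata sits to the left of the pointed-to address" {
    // The sketch only works if the payload starts right after the header.
    comptime std.debug.assert(@offsetOf(Block(u64), "payload") == @sizeOf(Header));

    var block = Block(u64){
        .header = .{ .type_id = 1, .is_undefined = 0 },
        .payload = 123,
    };
    const user_ptr: *u64 = &block.payload;

    const hdr = headerOf(user_ptr);
    try std.testing.expect(hdr == &block.header);
    try std.testing.expect(hdr.type_id == 1);

    // Writes through the recovered header are visible in the block.
    hdr.is_undefined = 1;
    try std.testing.expect(block.header.is_undefined == 1);
}
```

The user only ever sees `user_ptr`, so the payload keeps its ordinary layout; the cost is that every such pointer must come from an allocator that reserves the header space.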

Problems

Interop with C may break

What if the user only defines a struct partially? Have an undefined-bit for each field?

Existing @ptrCast with C union will break (since type info doesn't match)

It's better to have generation + memory address (to prevent use-after-free with memory reuse).
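The generation idea in the last point can be sketched as follows. This is a minimal illustration with hypothetical names (`Slot`, `Handle`, `acquire`, `release`), not an existing Zig API: each slot carries a counter that is bumped on release, and a handle is only valid while its recorded generation matches the slot's.

```zig
const std = @import("std");

/// A reusable piece of memory plus a counter bumped on every release.
const Slot = struct {
    generation: u32 = 0,
    value: u64 = 0,
};

/// Address plus the generation that was current when it was handed out.
const Handle = struct {
    slot: *Slot,
    generation: u32,

    /// Returns null if the slot has been released since this handle was made.
    fn get(self: Handle) ?*u64 {
        if (self.slot.generation != self.generation) return null;
        return &self.slot.value;
    }
};

fn acquire(slot: *Slot, value: u64) Handle {
    slot.value = value;
    return .{ .slot = slot, .generation = slot.generation };
}

fn release(slot: *Slot) void {
    slot.generation +%= 1; // wrapping add; old handles become stale
}

test "a stale handle is rejected after the memory is reused" {
    var slot = Slot{};

    const first = acquire(&slot, 10);
    try std.testing.expect(first.get().?.* == 10);

    release(&slot);
    const second = acquire(&slot, 20);

    // Same address, different generation: the old handle no longer resolves.
    try std.testing.expect(first.get() == null);
    try std.testing.expect(second.get().?.* == 20);
}
```

A bare address cannot tell the first and second use of the slot apart; the extra counter is what catches the use-after-free.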

vadim-za commented 3 weeks ago

What if I want to abuse pointer casting to temporarily cast to a wrong type (for efficiency's sake)?

E.g. imagine a sentinel-based doubly linked list (like intrusive lists in boost). Something like

const Item = struct {
    data1: Data1,
    data2: Data2,
    link: Link,
};
const Link = struct {
    next: *Item,
    prev: *Item,
};
const List = struct {
    sentinel: Link,
};

(in reality List and Link would be generic structs of course).

Instead of setting next and prev pointers to null at the ends of the list (as Zig's std implementation does), they would point to the sentinel (the main benefit is that this avoids branching in list modification operations, compared to using nulls). However, the sentinel is only a Link, but not an Item. Strictly speaking, next and prev must have type *Link in order to be able to point to the sentinel, but that would generate extra unnecessary pointer arithmetic upon list iteration and traversal, since one would need to convert from *Link to *Item in order to return *Item to the caller. So, it is potentially more efficient to pretend that sentinel is a part of a larger imagined Item object. This also drastically simplifies the list inspection in a debugger, as one can easily see the entire contents of the list items (instead of only the link data) by following the pointers.
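For comparison, here is a minimal sketch of the "strictly correct" variant mentioned above, where `next` and `prev` are `*Link` and each item is recovered with `@fieldParentPtr` during traversal. It is not the implementation described in this comment; the `value` field and the `init`, `append`, and `sum` functions are made up for illustration, and the two-argument form of `@fieldParentPtr` assumes Zig 0.12 or later (older versions take the parent type as a first argument).

```zig
const std = @import("std");

const Link = struct {
    next: *Link,
    prev: *Link,
};

const Item = struct {
    value: u32,
    link: Link = undefined,
};

const List = struct {
    sentinel: Link,

    /// An empty list is a sentinel that points at itself.
    fn init(self: *List) void {
        self.sentinel.next = &self.sentinel;
        self.sentinel.prev = &self.sentinel;
    }

    /// No null checks: the sentinel guarantees both neighbours exist.
    fn append(self: *List, item: *Item) void {
        const last = self.sentinel.prev;
        item.link.prev = last;
        item.link.next = &self.sentinel;
        last.next = &item.link;
        self.sentinel.prev = &item.link;
    }

    fn sum(self: *List) u32 {
        var total: u32 = 0;
        var it = self.sentinel.next;
        while (it != &self.sentinel) : (it = it.next) {
            // The *Link to *Item conversion discussed above.
            const item: *Item = @fieldParentPtr("link", it);
            total += item.value;
        }
        return total;
    }
};

test "sentinel-based list traversal" {
    var list: List = undefined;
    list.init();

    var a = Item{ .value = 1 };
    var b = Item{ .value = 2 };
    list.append(&a);
    list.append(&b);

    try std.testing.expect(list.sum() == 3);
}
```

In this form the sentinel is never treated as an `Item`, so no pointer ever claims a type the memory does not have; the price is the per-step conversion that the `*Item`-typed links are meant to avoid.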

This proposal seems to invalidate the respective implementation. The "workaround" function @ptrCastUndef also doesn't help here at all. Would one need to give up on tricks like that?

Edit: actually, maybe one doesn't need @ptrCast here; the implementation would mostly rely on @field and @fieldParentPtr, though I'm not sure whether those two would also be subject to runtime checks. One might, however, need to use allowzero pointers for Items, as the formal Item pointer obtained for the sentinel might be zero, in which case some @ptrCasts would be necessary.

vadim-za commented 3 weeks ago

Also, what if I want to cast without knowing in advance whether the memory is undefined or already contains precious data? What if I want a pointer to memory containing garbage (e.g. returned by an allocator), which is not equal to undefined?