ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License
34.85k stars 2.55k forks source link

Inline slices #21399

Closed Jarred-Sumner closed 1 month ago

Jarred-Sumner commented 1 month ago

In Zig, the following code is a use-after-stack:

const Wrapper = struct {
  single_value: u32,
  pub fn slice(self: @This()) []const u32 {
    return &.{self.single_value};
  }
};

pub fn doThing(wrapper: Wrapper) []const u32 {
  return wrapper.slice();
}

This is extremely easy to do.

The easiest way to fix it is something like this:

const Wrapper = struct {
  single_value: u32,
  pub fn slice(self: @This(), allocator: std.mem.Allocator) ![]u32 {
    const result = try allocator.alloc(u32, 1);
    result[0] = self.single_value;
    return result;
  }
};

pub fn doThing(wrapper: Wrapper, allocator: std.mem.Allocator) ![]const u32 {
  // allocate a new slice
  return try wrapper.slice(allocator);
}

But, now we're doing a heap allocation for a single 4 byte value that occupies less space than a slice (16 bytes on x64). This is not a good use of memory.

The 2nd-easiest way to fix it is something like this:

const Wrapper = struct {
  single_value: u32,
  pub fn slice(self: @This(), buf: []u32) usize {
    buf[0] = self.single_value;
    return 1;
  }
};

pub fn doThing(wrapper: Wrapper, buf: []u32) usize {
  // return how many were written to the slice
  return wrapper.slice(buf);
}

We can make it the caller's job to own the memory. And often times, we should do it that way.

However, sometimes we can't or shouldn't. Sometimes, passing fixed-length buffers ahead of time makes the API significantly more difficult to reason about.

Instead, what if we could do it that first way, but without a heap allocation?

const Wrapper = struct {
  single_value: u32,
  pub fn slice(self: @This()) []inline u32 {
    // return a slice with the contents passed by value to the caller 
    // if it fits in the amount of stack space allocated for the slice
    return &.{self.single_value};
  }
};

The compiler knows this is a slice of a single value. The slice itself occupies 16 bytes. Only 4 of those bytes are truly in use.

Slices do not have a defined memory layout (they're not extern or packed).

Therefore, the unused bytes of slices could be used for other things.

You could store at least 3 u32s in this inline slice, and have space for a length field to answer how many values, or if the []inline T is represented as a []T (a non-inline slice)

Then, there's no risk of use-after-stack (since it would be copied by value), and the code is as simple to reason about as a single value.

Why not just an array?

Arrays don't store a length. You can make up a convention for each usage, but slices in Zig are nice.

What about using slices and inline slices in the same code?

The []inline T syntax could also be used to point to a regular []T slice. The compiler can decide how to represent the slice in a way to disambiguate which kind it is.

InKryption commented 1 month ago

This seems like a complication of the language that offers little in comparison to simply using std.BoundedArray. It also doesn't actually address the safety concern of UAF, since it is an opt-in feature that would require the programmer to know they're returning a stack pointer in the first place, which they wouldn't if it's by mistake.

mlugg commented 1 month ago

This is a wholly unreasonable language complication over just using std.BoundedArray. As InKryption points out, it fails to actually address the UAF concern. In fact, it makes the problem much worse: syntax which is today clearly incorrect (return &.{runtime_val};) becomes ambiguous under this proposal, depending on type information, and hence masking UAF bugs. This proposal as written also has some obvious holes:

This proposal seems pretty clearly DOA.

silversquirl commented 1 month ago

There's an easier way to do this that requires no changes to the language:

pub fn slice(self: *const @This()) *const [1]u32 {
    return &self.single_value;
}

Single-pointers can be coerced to single-pointers to 1-element arrays, which of course can then be coerced to slices wherever you need them. If you want, you can also obviously use @as to keep the []const u8 return type.

castholm commented 1 month ago

You also have the option of making the function that returns the slice an inline fn:

const std = @import("std");

pub const Wrapper = struct {
    value1: u32,
    value2: u32,
    pub inline fn slice(self: @This()) []const u32 {
        return &.{ self.value1, self.value2 };
    }
};

pub fn main() void {
    const w: Wrapper = .{ .value1 = 1, .value2 = 2 };
    const slice = w.slice();
    std.debug.print("{any}\n", .{slice});
}
{ 1, 2 }

Edit: This appears to work currently, but it might not be intended to be legal, see @InKryption's comment below.

N00byEdge commented 1 month ago

I don't think this is needed nor desired, we can do the following:

pub const Wrapper = extern union {
    as_values: extern struct {
      value1: u32,
      value2: u32,
    },
    as_array: [2]u32,
    pub inline fn slice(self: *const @This()) []const u32 {
        return &self.as_array;
    }
};

or even

pub const Wrapper = extern struct {
    value1: u32,
    value2: u32,
    pub inline fn slice(self: *const @This()) []const u32 {
        return std.mem.bytesAsSlice(u32, std.mem.asBytes(self));
    }
};
InKryption commented 1 month ago

@castholm

You also have the option of making the function that returns the slice an inline fn:

Zig is lexically scoped; the fact that an inline function will share a stack frame with its caller doesn't mean returning a pointer to its locals is or should be valid.