ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.
https://ziglang.org
MIT License

Optimize tautological assignments to no-op #16448

Open matklad opened 1 year ago

matklad commented 1 year ago

Consider the following code:

const std = @import("std");

const S = struct {
    x: [1024]u8,
    y: i32,
};

export fn init_s(s: *S) void {
    @memset(&s.x, 92);
    s.* = .{
        .x = s.x,
        .y = 92,
    };
}

Currently (0.11.0-dev.4004+a57608217) it produces the following LLVM IR:

; Function Attrs: nounwind
define dso_local void @init_s(ptr nonnull align 4 %0) #0 !dbg !147 {
Entry:
  %1 = alloca ptr, align 8
  %2 = alloca ptr, align 8
  store ptr %0, ptr %2, align 8
  call void @llvm.dbg.declare(metadata ptr %2, metadata !162, metadata !DIExpression()), !dbg !163
  store ptr %0, ptr %1, align 8, !dbg !164
  %3 = load ptr, ptr %1, align 8, !dbg !166
  %4 = getelementptr inbounds %main.S, ptr %3, i32 0, i32 1, !dbg !166
  call void @llvm.memset.p0.i64(ptr align 1 %4, i8 92, i64 1024, i1 false), !dbg !166
  %5 = getelementptr inbounds %main.S, ptr %0, i32 0, i32 1, !dbg !167
  %6 = getelementptr inbounds %main.S, ptr %0, i32 0, i32 1, !dbg !168
  ;     Note this |
  ;               V
  call void @llvm.memcpy.p0.p0.i64(ptr align 1 %5, ptr align 1 %6, i64 1024, i1 false), !dbg !168

  %7 = getelementptr inbounds %main.S, ptr %0, i32 0, i32 0, !dbg !168
  store i32 92, ptr %7, align 4, !dbg !168
  ret void, !dbg !168
}

This memcpy is a no-op: it assigns x to itself, so it could be optimized out. I want to argue that it should be optimized out, specifically:

Context:

Why special-case this weird code? It turns out this is a useful pattern we use quite a bit in TigerBeetle, with the most representative example being Replica.init. The overall situation there is that the S we want to initialize has quite a few fields, with some dependencies between them. So we can't init S in one go using the .{} syntax; we need to do a piece-wise initialization. However, we also want the compiler to hold our hand here and make sure that we did indeed initialize all fields. So what we do in the end is:

self.* = .{
    // Few manually initialized fields.
    .state_machine = self.state_machine,
    ...

    // Many simple fields which are directly initialized here.
    .ping_timeout = Timeout{ ... },
    ...
};
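
For anyone who wants to try the pattern in isolation, here is a minimal self-contained sketch. The Replica struct and its fields below are hypothetical stand-ins, not TigerBeetle's actual code, and the snippet assumes the 0.11-style @memset builtin used in the example at the top of this issue.

```zig
const std = @import("std");

// Hypothetical struct, for illustration only: `buffer_len` depends on
// `buffer`, so the two are set up step by step rather than in one literal.
const Replica = struct {
    buffer: [16]u8,
    buffer_len: usize,
    ticks: u32,

    fn init(self: *Replica) void {
        // Step 1: initialize the interdependent fields in place.
        @memset(&self.buffer, 92);
        self.buffer_len = self.buffer.len;

        // Step 2: assign a full struct literal. Every field has to be
        // listed, so forgetting one is a compile error. The fields that
        // were already initialized are assigned to themselves.
        self.* = .{
            .buffer = self.buffer,
            .buffer_len = self.buffer_len,
            .ticks = 0,
        };
    }
};

pub fn main() void {
    var r: Replica = undefined;
    r.init();
    std.debug.print("buffer_len={d} ticks={d}\n", .{ r.buffer_len, r.ticks });
}
```

Removing any one of the three fields from the literal in step 2 should make the compiler reject the assignment, which is the hand-holding we are after. The self-assignments of buffer and buffer_len are the tautological copies this issue is about.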

We can imagine having some dedicated language feature for saying "I have already initialized this field", like

    s.* = .{
        .x = _,
        .y = 92,
    };

but the current pattern seems basically fine, modulo this missing optimization.

Note: this is another offshoot of https://ziggit.dev/t/how-to-avoid-implicit-memcpys/1197.

matklad commented 1 year ago

Correction:

The above LLVM IR is for debug build mode. If I actually enable ReleaseSafe, I see that LLVM is capable of optimizing out this particular memcpy. However:

kprotty commented 1 year ago

The compiler term for this seems to be Dead Store Elimination.

andrewrk commented 1 year ago

Seems like a reasonable optimization to introduce, given that this pattern is useful.