nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org

Duplicate/redundant `nimZeroMem()` in `--mm:refc` #23383

Open tersec opened 7 months ago

tersec commented 7 months ago

Description

import std/strformat
discard parseStandardFormatSpecifier("")

is one example, but browsing generated C with --mm:refc shows many of them.

C compilers can't be relied on, nor should they have to be, to detect that this operation is verifiably idempotent and that the later calls can be removed, so the duplicates create real runtime overhead (it even shows up in profiles).

Example of generated duplicate nimZeroMem for that code in @mNim@slib@spure@sstrformat.nim.c:

static N_INLINE(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw,
                dotdot___stdZenumutils_90)(NI a, NI b) {
  tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw result;
  nimZeroMem((void *)(&result),
             sizeof(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw));
  nimZeroMem((void *)(&result),
             sizeof(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw));
  result.a = a;
  result.b = b;
  popFrame();
  return result;
}

In this case, a tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw isn't huge, but for larger objects and/or in loops, this becomes more significant.
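
To put a rough number on that, here is a minimal C sketch. It is not Nim's runtime or generated code: `Big`, `counted_zero_mem`, `make_duplicated`, `make_single`, and `zeroed_in_loop` are hypothetical names made up for illustration, and the counter only tallies bytes written, it does not measure time. It compares how many bytes get zeroed in a loop with and without the duplicate call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a larger Nim object (not a real generated type). */
typedef struct {
  long a;
  long b;
  char payload[1024];
} Big;

/* Running total of bytes zeroed, for illustration only. */
static size_t bytes_zeroed = 0;

/* Hypothetical wrapper: a plain memset-to-zero that also counts bytes written. */
static void counted_zero_mem(void *p, size_t n) {
  memset(p, 0, n);
  bytes_zeroed += n;
}

/* Mirrors the shape of the generated code above: result is zeroed twice. */
static Big make_duplicated(long a, long b) {
  Big result;
  counted_zero_mem(&result, sizeof(Big));
  counted_zero_mem(&result, sizeof(Big)); /* redundant: memory is already zero */
  result.a = a;
  result.b = b;
  return result;
}

/* Same function with the redundant call removed. */
static Big make_single(long a, long b) {
  Big result;
  counted_zero_mem(&result, sizeof(Big));
  result.a = a;
  result.b = b;
  return result;
}

/* Calls make() `iterations` times and returns the total bytes zeroed. */
static size_t zeroed_in_loop(Big (*make)(long, long), long iterations) {
  bytes_zeroed = 0;
  for (long i = 0; i < iterations; i++) {
    Big tmp = make(i, i + 1);
    (void)tmp;
  }
  return bytes_zeroed;
}
```

Both variants return the same object contents; the duplicated one simply writes `sizeof(Big)` extra bytes per call, so the extra work grows with both the object size and the iteration count.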

Nim Version

Nim Compiler Version 1.6.18 [Linux: amd64]
Compiled at 2024-03-09
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: a749a8b742bd0a4272c26a65517275db4720e58a
active boot switches: -d:release

Nim Compiler Version 2.0.3 [Linux: amd64]
Compiled at 2024-03-09
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: e374759f29da733f3c404718c333f5f3cb5f332d
active boot switches: -d:release

Nim Compiler Version 2.1.1 [Linux: amd64]
Compiled at 2024-03-09
Copyright (c) 2006-2024 by Andreas Rumpf

git hash: 94c599687796f4ee3872c8aa866827b9ed33f52b
active boot switches: -d:release

Current Output

static N_INLINE(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw,
                dotdot___stdZenumutils_90)(NI a, NI b) {
  tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw result;
  nimZeroMem((void *)(&result),
             sizeof(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw));
  nimZeroMem((void *)(&result),
             sizeof(tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw));
  result.a = a;
  result.b = b;
  popFrame();
  return result;
}

Expected Output

No duplicate nimZeroMem() calls

heterodoxic commented 7 months ago

I assume that ignoring the idempotence of the zeroing memset, which is what nimZeroMem effectively boils down to in C/C++, would end up costing a redundant mov instruction at optimization levels >= /O1. (BTW, wouldn't that be mostly independent of the size of tyObject_HSlice__1F9c6PBLtnXQNAmUXyCBSBw once it exceeds register capacity, which does add some latency?) I haven't checked Compiler Explorer, but do compilers, in all their matured cleverness, actually fail to optimize away that seemingly obvious redundancy at >= /O1?

Still, even from a codegen (build time) and debug-mode perspective, I gather there is room for improvement here. If this issue is still up for grabs, I'd be willing to tackle it, since I'm hovering around that area of codegen at the moment anyway. 😉

arnetheduck commented 7 months ago

> ... would eventually come

zeroMem elision works for simple/small cases where the compiler can prove locality, but not for larger types, where it matters more. There are several problems; for example, passing things by pointer makes the C compiler conservative about what is overwritten and what is not, since pointers may alias.
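
The aliasing concern can be sketched in plain C. This illustrates the general problem only and makes no claim about what any particular compiler emits or how Nim's codegen is structured; `Big`, `g_alias`, `scribble`, and `init_via_pointer` are hypothetical names. Once the object is reached through a pointer and some other call sits between the two zeroing operations, the second `memset` can have an observable effect, so a compiler that cannot see into that call has to keep it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a larger object passed by pointer. */
typedef struct {
  long a;
  long b;
  char payload[1024];
} Big;

/* A second pointer that may refer to the same object (an alias). */
static Big *g_alias = NULL;

/* Hypothetical hook type: any call the compiler may not be able to see into. */
typedef void (*hook_fn)(void);

/* Example hook that writes to the object through the alias. */
static void scribble(void) {
  if (g_alias != NULL) {
    g_alias->payload[0] = 'x';
  }
}

/* Out-parameter style initializer: result points at caller-owned memory. */
static void init_via_pointer(Big *result, long a, long b, hook_fn hook) {
  memset(result, 0, sizeof *result);
  if (hook != NULL) {
    hook(); /* may modify *result through g_alias */
  }
  /* Only redundant if the hook provably left *result untouched. */
  memset(result, 0, sizeof *result);
  result->a = a;
  result->b = b;
}
```

With `g_alias` pointing at the same object and `scribble` as the hook, the second `memset` is what restores `payload[0]` to zero, which is why a conservative compiler keeps both calls whenever it cannot rule out such a write.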

As the issue notes, this is a pervasive problem across all kinds of codegen, where zeroMem is sprinkled liberally and redundantly.

There are probably more issues reported already; these apply to ORC/ARC as well, btw.