nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org
Other
16.62k stars 1.47k forks source link

Memory corruption/read from uninitialized memory with `{.noinit.}` in `refc` in Nim 2.0/2.2/`devel`, also triggers `gcc` UBSAN #24332

Open tersec opened 1 month ago

tersec commented 1 month ago

Description

type
  K[T] = object
    case w: bool
    of false:
      discard
    of true:
      p: T

  M[d: static int; T] = object
    e: array[d, T]
    n: array[1024, byte]

  G = object
    v: M[32, array[48, byte]]
    r: K[M[32, array[48, byte]]]

  A = object
    t: seq[byte]

  D = object
    case h: bool
    of false: w: G
    of true:  j: A

func u(f: A): D = D(h: true, j: f)

proc s(): D =
  var c {.noinit.}: D
  c = u(K[A](w: true, p: default(A)).p)
  c

for _ in 0 .. 0:
  let _ = s()

Compile with nim c -r --mm:refc -d:release --passC="-fsanitize=undefined" --passL="-fsanitize=undefined"

In particular, what appears to happen is that the {.noinit.} in a local sense as expected, but what doesn't happen is the assignment overwriting c with a completely initialized object.

From the compiled C, this manifest as this s() proc being translated into:

N_LIB_PRIVATE N_NIMCALL(void,
                        _ZN1s1sE)(tyObject_D__2OtRr1VRc9bvPV1ORq0zkTQ *Result) {
  NI T1_;
  NI T2_;
  NI T3_;
  NI T4_;
  NI T5_;
  NI T6_;
  tyObject_D__2OtRr1VRc9bvPV1ORq0zkTQ c;
  tyObject_K__0J0ZLpQFm9ai7TPbu8HDfTA T7_;
  switch ((*Result).h) {
  case NIM_FALSE:
    T1_ = (NI)0;
    for (T1_ = 0; T1_ < 32; T1_++) {
      T2_ = (NI)0;
      for (T2_ = 0; T2_ < 48; T2_++) {
        (*Result).w.v.e[T1_][T2_] = 0;
      }
    }
    T3_ = (NI)0;
    for (T3_ = 0; T3_ < 1024; T3_++) {
      (*Result).w.v.n[T3_] = 0;
    }
    switch ((*Result).w.r.w) {
    case NIM_FALSE:
      break;
    case NIM_TRUE:
      T4_ = (NI)0;
      for (T4_ = 0; T4_ < 32; T4_++) {
        T5_ = (NI)0;
        for (T5_ = 0; T5_ < 48; T5_++) {
          (*Result).w.r._w_2.p.e[T4_][T5_] = 0;
        }
      }
      T6_ = (NI)0;
      for (T6_ = 0; T6_ < 1024; T6_++) {
        (*Result).w.r._w_2.p.n[T6_] = 0;
      }
      break;
    }
    (*Result).w.r.w = 0;
    break;
  case NIM_TRUE:
    unsureAsgnRef((void **)&(*Result).j.t, NIM_NIL);
    break;
  }
  (*Result).h = 0;
  nimZeroMem((void *)(&T7_), sizeof(tyObject_K__0J0ZLpQFm9ai7TPbu8HDfTA));
  nimZeroMem((void *)(&T7_), sizeof(tyObject_K__0J0ZLpQFm9ai7TPbu8HDfTA));
  T7_.w = NIM_TRUE;
  nimZeroMem((void *)(&T7_._w_2.p), sizeof(tyObject_A__24uBp4OHiFEG2N1MZHIpKg));
  if (!(((2 & ((NU8)1 << ((NU)((T7_.w)) & 7U))) != 0))) {
    raiseFieldError2(((NimStringDesc *)&TM__fXyC5R69caUCFWgALDTmkhQ_5),
                     reprDiscriminant(((NI)(T7_.w)) + (NI)IL64(0),
                                      (&NTIbool__VaVACK0bpYmqIQ0mKcHfQQ_)));
  }
  _ZN1s1uEN1s1AE(T7_._w_2.p, (&c));
  genericAssign((void *)Result, (void *)(&c),
                (&NTId__2OtRr1VRc9bvPV1ORq0zkTQ_));
}

and u() as

N_LIB_PRIVATE
N_NIMCALL(void, _ZN1s1uEN1s1AE)(tyObject_A__24uBp4OHiFEG2N1MZHIpKg f_p0,
                                tyObject_D__2OtRr1VRc9bvPV1ORq0zkTQ *Result) {
  NI T1_;
  NI T2_;
  NI T3_;
  NI T4_;
  NI T5_;
  NI T6_;
  switch ((*Result).h) {
  case NIM_FALSE:
    T1_ = (NI)0;
    for (T1_ = 0; T1_ < 32; T1_++) {
      T2_ = (NI)0;
      for (T2_ = 0; T2_ < 48; T2_++) {
        (*Result).w.v.e[T1_][T2_] = 0;
      }
    }
    T3_ = (NI)0;
    for (T3_ = 0; T3_ < 1024; T3_++) {
      (*Result).w.v.n[T3_] = 0;
    }
    switch ((*Result).w.r.w) {
    case NIM_FALSE:
      break;
    case NIM_TRUE:
      T4_ = (NI)0;
      for (T4_ = 0; T4_ < 32; T4_++) {
        T5_ = (NI)0;
        for (T5_ = 0; T5_ < 48; T5_++) {
          (*Result).w.r._w_2.p.e[T4_][T5_] = 0;
        }
      }
      T6_ = (NI)0;
      for (T6_ = 0; T6_ < 1024; T6_++) {
        (*Result).w.r._w_2.p.n[T6_] = 0;
      }
      break;
    }
    (*Result).w.r.w = 0;
    break;
  case NIM_TRUE:
    unsureAsgnRef((void **)&(*Result).j.t, NIM_NIL);
    break;
  }
  (*Result).h = 0;
  (*Result).h = NIM_TRUE;
  genericSeqAssign((&(*Result).j.t), f_p0.t,
                   (&NTIseqLbyteT__6H5Oh5UUvVCLiakt9aTwtUQ_));
}

s() properly declares

tyObject_D__2OtRr1VRc9bvPV1ORq0zkTQ c;

but indeed does not initialize it. So far, it's correct. The error is that u(), because it uses this

N_LIB_PRIVATE
N_NIMCALL(void, _ZN1s1uEN1s1AE)(tyObject_A__24uBp4OHiFEG2N1MZHIpKg f_p0,
                                tyObject_D__2OtRr1VRc9bvPV1ORq0zkTQ *Result) {

return style where it outputs directly to Result, reads from effectively uninitialized memory when determining what to do with the case object:

switch ((*Result).w.r.w) {

rather than writing to it.

This *Result has never been initialized by s() so it's garbage.

Nim Version

Nim Compiler Version 2.0.10 [Linux: amd64]
Compiled at 2024-10-19
Copyright (c) 2006-2023 by Andreas Rumpf

git hash: e941ee15be775fe3c46db1bed9b4f41c7dfb1334
active boot switches: -d:release
Nim Compiler Version 2.2.0 [Linux: amd64]
Compiled at 2024-10-17
Copyright (c) 2006-2024 by Andreas Rumpf

git hash: 78983f1876726a49c69d65629ab433ea1310ece1
active boot switches: -d:release
Nim Compiler Version 2.2.1 [Linux: amd64]
Compiled at 2024-10-20
Copyright (c) 2006-2024 by Andreas Rumpf

git hash: e69eb99a159859338dbcd9606cf31a6c4700d90b
active boot switches: -d:release

Current Output

.cache/nim/s_r/@ms.nim.c:204:23: runtime error: load of value 82, which is not a valid value for type '_Bool'

(with UBSAN)

Expected Output

No UBSAN issues and no memory corruption

Known Workarounds

No response

Additional Information

Example gdb session (without UBSAN, i.e. it's not a UBSAN artifact, but with --debuginfo:on):

(gdb) b 204
Breakpoint 1 at 0x106da: file $HOME/.cache/nim/s_r/@ms.nim.c, line 204.
(gdb) r
Starting program: /tmp/s 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, _ZN1s1uEN1s1AE (f_p0=..., f_p0@entry=..., Result=Result@entry=0x7fffffffcc00) at $HOME/.cache/nim/s_r/@ms.nim.c:204
204     switch ((*Result).w.r.w) {
(gdb) p Result->w.r.w
$1 = 136

The exact alignment varies a bit depending on how it's run, but this Result->w.r.w is from u() per the description: s.nim.c.txt i.e. it's supposed to be one of NIM_TRUE or NIM_FALSE because it's a bool discriminator for the K[T] type.

Araq commented 1 month ago

It's not the compiler's fault if you get noinit wrong. What can it do about it anyway.

tersec commented 1 month ago

How is this wrong?

{.noinit.} is just conceptually incompatible with case objects, presumably.

arnetheduck commented 1 month ago

What can it do about it anyway.

Notably, this has nothing to do with the case object feature - it applies equally to all types - var x: ref int {.noinit.} is invalid in the exact same way as a case object with a ref member in one of its branches would be

ringabout commented 1 month ago

I cannot reproduce the failure: nim c -r --mm:refc -d:release --passC="-fsanitize=undefined" --passL="-fsanitize=undefined" test23.nim

devel/2.2.0 and gcc 14 on Linux

tersec commented 1 month ago

It depends a lot on the specific constants. The basic issue is still there in the code structure -- it's reading garbage/unitialized memory.

For reference I'm using Debian sid with

gcc (Debian 14.2.0-7) 14.2.0
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Linux version 6.11.2-amd64 (debian-kernel@lists.debian.org) (x86_64-linux-gnu-gcc-14 (Debian 14.2.0-6) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.1) #1 SMP PREEMPT_DYNAMIC Debian 6.11.2-1 (2024-10-05)

The entire type section is there to try to align things in memory for maximum effect, but it's not fundamentally the issue, it only makes it more visible. This is why I included the source code generated -- the combination of {.noinit.} and certain types, including case objects, creates UB.

Araq commented 1 month ago

in the case of a non-trivial type, emit a warning or an error

And then you can disable the warning message effectively introducing .noinit and .reallyNoInit but this is a stupid design as .noinit already means "I know better than the compiler". Sometimes sharp features can cut.