nim-lang / threading

New atomics, thread primitives, atomic refcounting for --gc:arc/orc.
MIT License
73 stars 12 forks source link

SharedPtr isn't correct? #45

Closed elcritch closed 1 year ago

elcritch commented 1 year ago

I'm not sure but the destructor for SharedPtr[T] seems off:

  proc `=destroy`*[T](p: SharedPtr[T]) =
    if p.val != nil:
      if p.val.counter.load(Acquire) == 0:
        `=destroy`(p.val.value)
        deallocShared(p.val)
      else:
        discard fetchSub(p.val.counter, 1, Release)

Here's an example failure mode I think:

  1. You have three threads.
  2. thrA calls fetchSub setting the count to zero.
  3. Then thrB and thrC both do a load(Acquire) at the same time.
  4. Both thrB and thrC proceed to free the shared pointer.

Using a p.val.counter.fetchSub(1, Release) == 0 seems to work and match various examples. Though you need to set the initial count to 1. The C++ shared_ptr's all appear to use swap or compareSwap though.

Also, I was having failures with fetchSub(1, Aquire) giving me inconsistent results on my M1. From what examples and docs I've you'd need to use fetchSub(1, Release) instead. I haven't read up enough on the memory semantics to confirm it. However, I was looking into it because I kept getting what looked like incorrect atomic counts on my M1. Switching to Release made the issue go away.

Here's what I ended up using in my own implementation:

  if x.container != nil:
    let res = atomicSubFetch(addr x.container.cnt, 1, ATOMIC_RELEASE)
    if res == 0:
      when compiles(`=destroy`(x[])):
        `=destroy`(x[])
      deallocShared(x.container)
Araq commented 1 year ago

Here's an example failure mode I think

But how could 3 threads access the same object with an RC == 1? Your example makes no sense.

elcritch commented 1 year ago

D'oh! That's why I shouldn't try creating an example late at night. I'll try and make another. Or maybe just add a test to verify it works.

elcritch commented 1 year ago

Ok, I re-worked an example and remembered to keep the total count 😆

  1. Threads A & B both use a SharePtr, so count = 2
  2. A & B both hit destructors for their copies
  3. A & B both do p.val.counter.load(Acquire) in parallel
  4. A & B both get count == 2
  5. A & B go on to fetchSub(p.val.counter, 1, Release)
  6. fetchSub will be strictly ordered so count will be 1, then 0
  7. Total count == 0, but neither A & B will free the ptr

So not a double free, but no one ever frees the SharePtr.

Araq commented 1 year ago

Remember the count is stored as "natural count - 1" so if both threads use a sharedptr the count should be 1, not 2. So no wonder it leaks in your scenario as you still don't count like the code does.

elcritch commented 1 year ago

It doesn't matter if it's the natural count - 1, as I used a "logical count". In the actual base they'll both still just get count == 1 and still leak.

Using natural count - 1:

  1. Threads A & B both use a SharePtr, so count = 1
  2. A & B both hit destructors for their copies
  3. A & B both do p.val.counter.load(Acquire) in parallel
  4. A & B both get count == 1
  5. A & B go on to fetchSub(p.val.counter, 1, Release)
  6. fetchSub will be strictly ordered so count will be 1, then -1
  7. Total count == -1, but neither A & B will free the ptr
elcritch commented 1 year ago

Ok, created an example (#46) that on my machine gives a few cases where the SmartPtr isn't freed. You can run it without the os.sleep but it requires higher numbers, but still happens 1 out of a million or so.

└─(15:53:58 on test-smartptrsleak ✭)──> nim c -r "/Users/jaremycreechley/pr
ojs/nims/figuro/vendor/threading/tests/tsmartptrsleak.nim"
Hint: used config file '/Users/jaremycreechley/.asdf/installs/nim/2.0.0/config/nim.cfg' [Conf]
Hint: used config file '/Users/jaremycreechley/.asdf/installs/nim/2.0.0/config/config.nims' [Conf]
Hint: used config file '/Users/jaremycreechley/projs/nims/nim.cfg' [Conf]
Hint: used config file '/Users/jaremycreechley/projs/nims/figuro/nim.cfg' [Conf]
Hint: used config file '/Users/jaremycreechley/projs/nims/figuro/config.nims' [Conf]
Hint: used config file '/Users/jaremycreechley/projs/nims/figuro/vendor/nim.cfg' [Conf]
Hint: used config file '/Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/nim.cfg' [Conf]
Hint: mm: arc; threads: on; opt: none (DEBUG BUILD, `-d:release` generates faster code)
38527 lines; 0.022s; 46.832MiB peakmem; proj: /Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak.nim; out: /Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak [SuccessX]
Hint: /Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak [Exec]
freeCounts: got: 9990 expected: 10000

/Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak.nim(51) tsmartptrsleak
/Users/jaremycreechley/.asdf/installs/nim/2.0.0/lib/std/assertions.nim(41) failedAssertImpl
/Users/jaremycreechley/.asdf/installs/nim/2.0.0/lib/std/assertions.nim(36) raiseAssert
/Users/jaremycreechley/.asdf/installs/nim/2.0.0/lib/system/fatal.nim(53) sysFatal
Error: unhandled exception: /Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak.nim(51, 1) `freeCounts.load(Acquire) == N`  [AssertionDefect]
Error: execution of an external program failed: '/Users/jaremycreechley/projs/nims/figuro/vendor/threading/tests/tsmartptrsleak'
Araq commented 1 year ago

This bug also affects mm:atomicArc!

elcritch commented 1 year ago

This bug also affects mm:atomicArc!

That's a great memory leak to squash!

Should I create a PR with a fix for it?

elcritch commented 1 year ago

Ok, updated my PR to include the fix. I'll checkout the Nim atomicArc too.

edit: Ok jk I'll leave the atomicArc to the compiler folks -- I'd need to understand the "destructorLift" stuff it seems. 😓

ringabout commented 1 year ago

@elcritch hello, is it fixed by https://github.com/nim-lang/threading/pull/46?

elcritch commented 1 year ago

Yep!