llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
29.44k stars 12.17k forks source link

std::unique_ptr destructor behaviour differs from (naïve?) expectations #108149

Closed zeule closed 2 months ago

zeule commented 2 months ago

The unique_ptr is widely advertised as a drop-in replacement for raw pointers. In this regard, libc++ implementation of the unique_ptr destructor differs in behaviour from delete <raw-pointer> and what GCC and MSVC libraries do: libc++ calls reset() in the destructor, which first clears the pointer and then calls deleter. Which means that when the destructor of the contained object is executed, the link between the owner of the unique_ptr object and the contained object is broken. That is not the case for raw pointers and unique pointers from GCC and MSVC STLs.

zeule commented 2 months ago

To my understanding of N4950, 3.21 clarifies that 20.3.1.3.3 [unique.ptr.single.dtor] says that the destructor may not call reset(), because the reset() call is not equivalent to if (get()) get_deleter()(get()) as explained in 4.1.2 (there is an observable difference between effects).

zwuis commented 2 months ago
#include <cassert>
#include <memory>

struct S {
  std::unique_ptr<S> &ref;
  S(std::unique_ptr<S> &ref_) : ref(ref_) {}
  ~S() { assert(this == ref.get()); }
};

int main() {
  std::unique_ptr<S> uptr;
  uptr = std::make_unique<S>(uptr);
}

Assertion failed with libc++.

https://godbolt.org/z/eenP1WGnT

MitalAshok commented 2 months ago

[res.on.objects]p2:

If an object of a standard library type is accessed, and [...] the access does not happen before the end of the object's lifetime, the behavior is undefined unless otherwise specified.

[basic.life]p1:

[...] The lifetime of an object o of type T ends when:

  • [...]
  • if T is a class type, the destructor call starts, [...]

There is no well-defined way to observe that the unique_ptr being destroyed is empty while the deleter is being called. @zwuis that ref.get() is UB because ref is no longer in its lifetime.

But you can otherwise observe that ~unique_ptr is incorrectly defined, but only when the deleter has a fancy pointer that isn't trivially default constructible and not trivially copy assignable. A fancy pointer might not appreciate the extra default construct / assign.

zeule commented 2 months ago

Gentlemen,

my question is about the pointer inside the unique_ptr, not the unique_ptr object itself.

zwuis commented 2 months ago

there is an observable difference between effects

This can be optimized.

PS: Sorry for misunderstanding the description.

philnik777 commented 2 months ago

@zeule Could you demonstrate how that is observable?

frederick-vs-ja commented 2 months ago

Hmm, this example can reveal the implementation divergence (Godbolt link):

#include <memory>
#include <print>

template<class T>
struct SideEffectfulPtr {
    SideEffectfulPtr() noexcept {
        std::println("Default construction");
    }
    SideEffectfulPtr(std::nullptr_t) noexcept : SideEffectfulPtr() {}

    explicit SideEffectfulPtr(T* p) noexcept : raw_{p}  {
        std::println("From raw");
    }

    SideEffectfulPtr(const SideEffectfulPtr& other) noexcept : raw_{other.raw_} {
        std::println("Copy construction");
    }

    SideEffectfulPtr& operator=(const SideEffectfulPtr& other) noexcept {
        raw_ = other.raw_;
        std::println("Copy construction");
        return *this;
    }

    ~SideEffectfulPtr() {
        std::println("Destruction");
    }

    T operator*() const noexcept {
        return raw_;
    }

    explicit operator bool() const noexcept {
        return raw_ != nullptr;
    }

    friend bool operator==(const SideEffectfulPtr&, const SideEffectfulPtr&) = default;
    friend bool operator==(const SideEffectfulPtr& lhs, std::nullptr_t) noexcept {
        return lhs.raw_ == nullptr;
    }

    T* raw_{};
};

template<class T>
struct SideEffectfulPtrDeleter {
    using pointer = SideEffectfulPtr<T>;

    void operator()(const pointer& p) const noexcept {
        delete p.raw_;
    }
};

template<class T>
using SideEffectfulUniquePtr = std::unique_ptr<T, SideEffectfulPtrDeleter<T>>;

int main()
{
    SideEffectfulUniquePtr<int> p{};
    std::println("----");
    p.reset();
    std::println("----");
}
Output

libstdc++ ``` Default construction ---- Default construction Copy construction Copy construction Copy construction Destruction Destruction Destruction ---- Default construction Copy construction Destruction Destruction ``` libc++ ``` Default construction ---- Default construction Copy construction Copy construction Destruction Destruction ---- Default construction Copy construction Copy construction Destruction Destruction Destruction ``` MSVC STL ``` Default construction ---- Default construction Copy construction Copy construction Destruction Destruction ---- Destruction ```

No implementation seems strictly conforming here currently.

I guess such implementation divergence should be permitted, but there should be an LWG issue for this.

zeule commented 2 months ago

Thank you, @frederick-vs-ja , for providing the example, which is ways nicer than what I had in mind. Although let me change it a bit, adding move constructor and assignment (Godbolt), so that now the destructor parts of the outputs look like:

libstdc++

Default construction
Move assignment
Destruction
Destruction

libc++

Default construction
Copy construction
Copy assignment
Destruction
Destruction
Destruction

MSVC

Destruction

While I can relate the outputs from libstdc++ and MSVC to if (get()) get_deleter()(get()), it's not the case with libc++.

philnik777 commented 2 months ago

I believe http://eel.is/c++draft/library#structure.requirements-7 covers this. Our implementation has the effect described. That is, it calls the deleter with the pointer if the pointer is non-null. There are a few more side effects, but everything done has to be supported by the pointer type and I don't think we care to optimize this case.

While I can relate the outputs from libstdc++ and MSVC to if (get()) get_deleter()(get()), it's not the case with libc++.

Under your definition of equivalent, how is it allowed to remove side effects but not add side effects?

zeule commented 2 months ago

My basic definition of equivalence is that there is a link between the entity that owns an std::unique_ptr instance and the T instance, which is not to be broken by std::unique_ptr destructor, which is written in 20.3.1.3.3. On the other hand, it is to be broken by the std::unique_ptr::reset(). MSVC STL and libstdc++ respect that, while libc++ does not.

-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() { reset(); }
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 ~unique_ptr() {
+    pointer __tmp  = __ptr_.first();
+    if (__tmp)
+      __ptr_.second()(__tmp);
+  }

The change I ask for is 4 lines long and brings no risk, while this topic is much much longer already.

philnik777 commented 2 months ago

I have honestly no idea what link you're talking about. The compiler is perfectly able to remove any stores from the destructor. Also, while there is probably not much of a risk, there doesn't seem to be any real-world benefit. It's just more code to get wrong.

zeule commented 2 months ago

what link you're talking about.

Given

struct T {
    ~T();
    void f();     
};

struct U {
    void g()
    {
      t_->f(); // (1)
    }
    std::unique_ptr<T> t_;
};

I want (1) to be able to work when U::g() is called from T::~T(). Yes, I know it's a loop, but I don't want to use raw pointer or custom smart pointer or boost::scoped_ptr when the std:: thing should work just fine.

there doesn't seem to be any real-world benefit

There is real-world problem when libc++ behavior differs from the other two for no reason.

zwuis commented 2 months ago

I want (1) to be able to work when U::g() is called from T::~T().

Is T::~T() called from the destructor of std::unique_ptr<T>? If yes, t_-> is UB. Please see https://github.com/llvm/llvm-project/issues/108149#issuecomment-2343252164 for reason.

zeule commented 2 months ago

Yes, I understand that.

philnik777 commented 2 months ago

Given that this seems to catch some UB I'm even more convinced that we don't want to change this. Anything I've seen here so far seems to either not be a use-case that seems to exist in the wild (heavy fancy pointers) or stuff that is outright UB, which we definitely don't want to encourage. Given that, I'm closing this as invalid. If there is anything new to add to this discussion, feel free to comment.