wjakob / nanobind

nanobind: tiny and efficient C++/Python bindings
BSD 3-Clause "New" or "Revised" License

[BUG]: Data race reported by TSAN with free-threading #740

Closed vfdev-5 closed 3 weeks ago

vfdev-5 commented 3 weeks ago

Problem description

Hi @wjakob, I still sometimes see data race reports from TSAN that mention keep_alive, nb_type_put_common, or inst_new_ext. For example, here is a report I got using master (c1bab7e4207566b75bbc51c35079f66ae6f0afc0):

WARNING: ThreadSanitizer: data race (pid=217755)
  Read of size 8 at 0x7b8c000001c0 by thread T3:
    #0 tsl::rh::power_of_two_growth_policy<2ul>::bucket_for_hash(unsigned long) const /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_growth_policy.h:125 (example1.cpython-313t-x86_64-linux-gnu.so+0x2759a)
    #1 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::bucket_for_hash(unsigned long) const /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1133 (example1.cpython-313t-x86_64-linux-gnu.so+0x362b5)
    #2 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<true> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::find_impl<void*>(void* const&, unsigned long) const /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1167 (example1.cpython-313t-x86_64-linux-gnu.so+0x393f6)
    #3 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<true> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::find<void*>(void* const&, unsigned long) const /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1014 (example1.cpython-313t-x86_64-linux-gnu.so+0x36c9b)
    #4 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<false> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::find_impl<void*>(void* const&, unsigned long) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1161 (example1.cpython-313t-x86_64-linux-gnu.so+0x35119)
    #5 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<false> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::find<void*>(void* const&) <null> (example1.cpython-313t-x86_64-linux-gnu.so+0x34719)
    #6 tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::find(void* const&) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_map.h:496 (example1.cpython-313t-x86_64-linux-gnu.so+0x33aaf)
    #7 nanobind::detail::nb_type_put(std::type_info const*, void*, nanobind::rv_policy, nanobind::detail::cleanup_list*, bool*) /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_type.cpp:1749 (example1.cpython-313t-x86_64-linux-gnu.so+0x2fc4b)
    #8 nanobind::detail::type_caster<std::shared_ptr<SomeClass>, int>::from_cpp(std::shared_ptr<SomeClass> const&, nanobind::rv_policy, nanobind::detail::cleanup_list*) <null> (example1.cpython-313t-x86_64-linux-gnu.so+0x14f96)
    #9 operator() /usr/local/lib/python3.13t/dist-packages/nanobind/include/nanobind/nb_func.h:269 (example1.cpython-313t-x86_64-linux-gnu.so+0x11e6f)
    #10 _FUN /usr/local/lib/python3.13t/dist-packages/nanobind/include/nanobind/nb_func.h:216 (example1.cpython-313t-x86_64-linux-gnu.so+0x11f24)
    #11 nb_func_vectorcall_simple /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_func.cpp:892 (example1.cpython-313t-x86_64-linux-gnu.so+0x22af5)
    #12 PyObject_Vectorcall <null> (python3.13t+0x487e8c)

  Previous write of size 8 at 0x7b8c000001c0 by thread T1:
    #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<tsl::rh::power_of_two_growth_policy<2ul> > >, std::is_move_constructible<tsl::rh::power_of_two_growth_policy<2ul> >, std::is_move_assignable<tsl::rh::power_of_two_growth_policy<2ul> > >::value, void>::type std::swap<tsl::rh::power_of_two_growth_policy<2ul> >(tsl::rh::power_of_two_growth_policy<2ul>&, tsl::rh::power_of_two_growth_policy<2ul>&) /usr/include/c++/11/bits/move.h:206 (example1.cpython-313t-x86_64-linux-gnu.so+0x3c997)
    #1 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::swap(tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >&) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:932 (example1.cpython-313t-x86_64-linux-gnu.so+0x3af10)
    #2 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::rehash_impl(unsigned long) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1343 (example1.cpython-313t-x86_64-linux-gnu.so+0x38c8f)
    #3 tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::rehash_on_extreme_load(short) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1390 (example1.cpython-313t-x86_64-linux-gnu.so+0x36500)
    #4 std::pair<tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<false>, bool> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::insert_impl<void*, std::piecewise_construct_t const&, std::tuple<void* const&>, std::tuple<nanobind::detail::nb_inst*&> >(void* const&, std::piecewise_construct_t const&, std::tuple<void* const&>&&, std::tuple<nanobind::detail::nb_inst*&>&&) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:1237 (example1.cpython-313t-x86_64-linux-gnu.so+0x34ec8)
    #5 std::pair<tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<false>, bool> tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::try_emplace<void* const&, nanobind::detail::nb_inst*&>(void* const&, nanobind::detail::nb_inst*&) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_hash.h:809 (example1.cpython-313t-x86_64-linux-gnu.so+0x33cf8)
    #6 std::pair<tsl::detail_robin_hash::robin_hash<std::pair<void*, void*>, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::KeySelect, tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::ValueSelect, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::robin_iterator<false>, bool> tsl::robin_map<void*, void*, nanobind::detail::ptr_hash, std::equal_to<void*>, std::allocator<std::pair<void*, void*> >, false, tsl::rh::power_of_two_growth_policy<2ul> >::try_emplace<nanobind::detail::nb_inst*&>(void* const&, nanobind::detail::nb_inst*&) /usr/local/lib/python3.13t/dist-packages/nanobind/ext/robin_map/include/tsl/robin_map.h:316 (example1.cpython-313t-x86_64-linux-gnu.so+0x330ae)
    #7 nanobind::detail::inst_new_ext(_typeobject*, void*) /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_type.cpp:172 (example1.cpython-313t-x86_64-linux-gnu.so+0x29b5f)
    #8 nb_type_put_common /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_type.cpp:1649 (example1.cpython-313t-x86_64-linux-gnu.so+0x2f553)
    #9 nanobind::detail::nb_type_put(std::type_info const*, void*, nanobind::rv_policy, nanobind::detail::cleanup_list*, bool*) /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_type.cpp:1792 (example1.cpython-313t-x86_64-linux-gnu.so+0x30060)
    #10 nanobind::detail::type_caster<std::shared_ptr<SomeClass>, int>::from_cpp(std::shared_ptr<SomeClass> const&, nanobind::rv_policy, nanobind::detail::cleanup_list*) <null> (example1.cpython-313t-x86_64-linux-gnu.so+0x14f96)
    #11 operator() /usr/local/lib/python3.13t/dist-packages/nanobind/include/nanobind/nb_func.h:269 (example1.cpython-313t-x86_64-linux-gnu.so+0x11e6f)
    #12 _FUN /usr/local/lib/python3.13t/dist-packages/nanobind/include/nanobind/nb_func.h:216 (example1.cpython-313t-x86_64-linux-gnu.so+0x11f24)
    #13 nb_func_vectorcall_simple /usr/local/lib/python3.13t/dist-packages/nanobind/src/nb_func.cpp:892 (example1.cpython-313t-x86_64-linux-gnu.so+0x22af5)
    #14 PyObject_Vectorcall <null> (python3.13t+0x487e8c)
...

I was able to create a reproducible example, shown below:

C++ extension

#include <nanobind/nanobind.h>
#include "nanobind/stl/shared_ptr.h"  // IWYU pragma: keep
#include "nanobind/stl/string.h"  // IWYU pragma: keep
#include "nanobind/stl/vector.h"  // IWYU pragma: keep
#include <memory>

namespace nb = nanobind;

class SomeClass : public std::enable_shared_from_this<SomeClass> {
public:
    SomeClass() {}
    nb::object call(nb::object obj, nb::args args, nb::kwargs kwargs) {
        return obj;
    }

};

NB_MODULE(example1, m) {
    auto some_class =
        nb::class_<SomeClass>(m, "SomeClass", nb::is_weak_referenceable())
            .def("__call__", &SomeClass::call);

    m.def("some_class", []() { return std::make_shared<SomeClass>(); });
}

CMakeLists.txt:

cmake_minimum_required(VERSION 3.16)

project(example1)

# https://nanobind.readthedocs.io/en/latest/building.html

if (CMAKE_VERSION VERSION_LESS 3.18)
  set(DEV_MODULE Development)
else()
  set(DEV_MODULE Development.Module)
endif()

find_package(Python 3.8 COMPONENTS Interpreter ${DEV_MODULE} REQUIRED)

if (NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Choose the type of build." FORCE)
  set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
endif()

option(USE_TSAN "Compile with TSAN" OFF)

function(append value)
  foreach(variable ${ARGN})
    set(${variable} "${${variable}} ${value}" PARENT_SCOPE)
  endforeach(variable)
endfunction()

if (USE_TSAN)
  append("-fsanitize=thread" CMAKE_C_FLAGS CMAKE_CXX_FLAGS)
  append("-fsanitize=thread" CMAKE_EXE_LINKER_FLAGS CMAKE_MODULE_LINKER_FLAGS CMAKE_SHARED_LINKER_FLAGS)
endif()

# Detect the installed nanobind package and import it into CMake
execute_process(
  COMMAND "${Python_EXECUTABLE}" -m nanobind --cmake_dir
  OUTPUT_STRIP_TRAILING_WHITESPACE OUTPUT_VARIABLE nanobind_ROOT)

find_package(nanobind CONFIG REQUIRED)

# Compile the 'example1' extension module with free-threading support
nanobind_add_module(
    example1
    FREE_THREADED
    binding.cpp
)
set_target_properties(example1 PROPERTIES POSITION_INDEPENDENT_CODE ON)

Could you please confirm the issue and, if possible, hint at whether the problem lies in nanobind or in my usage? Thanks!

Reproducible example code

# How to build and run:
# mkdir build && cd build
# cmake .. -DCMAKE_BUILD_TYPE=Debug -DPython_EXECUTABLE=/usr/bin/python3.13t -DUSE_TSAN=ON
# cmake --build .
# cd ../
# export TSAN_SYMBOLIZER_PATH=$(which llvm-symbolizer)
# PYTHON_GIL=0 PYTHONPATH=build LD_PRELOAD=/lib/x86_64-linux-gnu/libtsan.so.0 python test.py

def func():
    from example1 import some_class

    class WRKey:
        pass

    c = some_class()
    wrkey = WRKey()
    c(wrkey, 0)

def run_multi_threaded(test_fn, num_workers: int):
    import threading
    import concurrent.futures

    barrier = threading.Barrier(num_workers)

    def closure():
        barrier.wait()
        test_fn()

    with concurrent.futures.ThreadPoolExecutor(
        max_workers=num_workers
    ) as executor:
        futures = []
        for _ in range(num_workers):
            futures.append(executor.submit(closure))
        # We should call future.result() to re-raise an exception if the test
        # has failed
        list(f.result() for f in futures)

if __name__ == "__main__":
    # print("-- Exec func()")
    # func()

    print("-- Exec multi_threaded func()")
    run_multi_threaded(func, num_workers=10)
    # for _ in range(10):
        # run_multi_threaded(func, num_workers=10)

wjakob commented 3 weeks ago

Hi @vfdev-5,

I tried your example and also ran the test suite with TSAN. At first I got a huge number of warnings about races in inst_new_int and inst_new_ext (which makes no sense, because these functions acquire a lock). Switching the associated mutex from a Python mutex to a std::mutex made the warnings go away, which made me suspect that Python itself probably needs to be built with TSAN as well.

Compiling Python with ./configure --disable-gil --with-thread-sanitizer resolved these warnings for me.
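
Before interpreting a TSAN report, it can help to confirm how the interpreter that runs the test was built. The following is a minimal sketch, not part of nanobind: the function name interpreter_build_info is hypothetical, and the check on CONFIG_ARGS is only a heuristic, since that variable records the ./configure command line on POSIX builds and may be absent elsewhere (an interpreter instrumented purely through CFLAGS would not be detected).

```python
import sys
import sysconfig


def interpreter_build_info():
    """Report whether this interpreter looks free-threaded and TSAN-built.

    Hypothetical helper for illustration; the TSAN check is a heuristic.
    """
    # CONFIG_ARGS holds the ./configure arguments on POSIX builds and may
    # be None on other platforms.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    # Py_GIL_DISABLED is 1 on free-threaded builds (Python 3.13+).
    free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on 3.13+; assume the GIL is on otherwise.
    gil_check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(gil_check()) if gil_check is not None else True
    return {
        "free_threaded_build": free_threaded,
        "gil_enabled_now": gil_enabled,
        "configured_with_tsan": "--with-thread-sanitizer" in config_args,
    }


if __name__ == "__main__":
    for key, value in interpreter_build_info().items():
        print(f"{key}: {value}")
```

If configured_with_tsan is False while the extension itself is built with -fsanitize=thread, reports involving locks implemented inside the interpreter should be treated with suspicion.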

A question to @colesbury: I suspect that TSAN gets confused by the non-standard Python mutex unless both Python and the extension are compiled with TSAN enabled. Is this possible?

wjakob commented 3 weeks ago

@colesbury: While debugging, I also noticed the following:

We currently immortalize type objects with the following assignments:

o->ob_tid = _Py_UNOWNED_TID;
o->ob_ref_local = _Py_IMMORTAL_REFCNT_LOCAL;
o->ob_ref_shared = 0;

However, these values don't all stay frozen. In one debugging session, I noticed an immortal type object with an o->ob_ref_shared value as large as 3200000. Is this expected behavior?

This is on Python 3.14 master.

hawkinsp commented 3 weeks ago

I suspect this isn't going to work unless the Python interpreter is built with tsan, given the atomics in the PyMutex implementation won't be subject to tsan instrumentation without it.

I also suspect that PyMutex would probably benefit from tsan annotations: https://github.com/llvm-mirror/compiler-rt/blob/69445f095c22aac2388f939bedebf224a6efcdaf/include/sanitizer/tsan_interface.h#L27 but I don't think that's mandatory.
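
The access pattern in the report (a lookup in one thread overlapping an insert that triggers a rehash in another) is only safe when both paths hold the same lock. Below is a rough Python analogue of that pattern, offered as a sketch: InstanceRegistry, put, and find are hypothetical names, and this is not nanobind's implementation. The sanitizer can only credit such a lock with ordering the two accesses if it can observe the lock's own operations, which is the point made above about the uninstrumented PyMutex.

```python
import threading


class InstanceRegistry:
    """Toy pointer -> instance map; every access holds the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._map = {}

    def put(self, ptr, inst):
        # An insertion may resize the table, so it must exclude readers.
        with self._lock:
            self._map.setdefault(ptr, inst)

    def find(self, ptr):
        # Lookups take the same lock, so they never overlap a resize.
        with self._lock:
            return self._map.get(ptr)

    def size(self):
        with self._lock:
            return len(self._map)


def worker(registry, base, count):
    # Each thread inserts its own key range and immediately looks it up.
    for i in range(count):
        registry.put(base + i, f"inst-{base + i}")
        assert registry.find(base + i) == f"inst-{base + i}"


def run(num_threads=8, count=1000):
    registry = InstanceRegistry()
    threads = [
        threading.Thread(target=worker, args=(registry, t * count, count))
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return registry


if __name__ == "__main__":
    print(run().size())
```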

colesbury commented 3 weeks ago

Yeah, in general you want your whole application compiled with tsan or it will miss synchronizations in the non-instrumented portion. (I don't think the mutex-specific warnings are particularly useful, which is why we don't use them.)

I'm not sure what would cause ob_ref_shared to be in the 3200000 range. Objects that use deferred reference counting have a large constant value (2^62-1, I think) added to ob_ref_shared, but that's much larger than 3200000.

wjakob commented 3 weeks ago

@colesbury: I tracked it to _PyType_MergeThreadLocalRefcounts. It seems to be called even for the immortal objects and increases ob_ref_shared. This is the backtrace:

    frame #1: 0x00000001003350bc python3` _PyType_MergeThreadLocalRefcounts(tstate=0x000000010611c000)  + 340 at typeid.c:159
    frame #2: 0x0000000100335130 python3` _PyType_FinalizeThreadLocalRefcounts(tstate=0x000000010611c000)  + 36 at typeid.c:169
    frame #3: 0x0000000100308444 python3` PyThreadState_Clear(tstate=0x000000010611c000)  + 2044 at pystate.c:1748
    frame #4: 0x00000001003c68dc python3` thread_run(boot_raw=0x0000000104c011a0)  + 468
    frame #5: 0x000000010032df80 python3` pythread_wrapper(arg=0x0000000104a00050)  + 68 at thread_pthread.h:243
    frame #6: 0x0000000100bc45b0 libclang_rt.tsan_osx_dynamic.dylib` __tsan_thread_start_func  + 144
    frame #7: 0x000000018525bfa8 libsystem_pthread.dylib` _pthread_start  + 148
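
The effect described here can be pictured with a toy model: each thread keeps a private count and folds it into a shared field when it finishes, without checking whether the object is immortal. This is only an illustrative sketch. SharedCounterObject, merge, and simulate are hypothetical names, the numbers are arbitrary, and the code mirrors neither CPython's field layout nor the real _PyType_MergeThreadLocalRefcounts logic.

```python
import threading


class SharedCounterObject:
    """Toy stand-in for a type object with a shared refcount field."""

    def __init__(self, immortal):
        self.immortal = immortal
        self.ref_shared = 0
        self._lock = threading.Lock()

    def merge(self, local_count):
        # Models the observation in this thread: the merge is applied
        # without consulting the immortal flag.
        with self._lock:
            self.ref_shared += local_count


def thread_body(obj, increments):
    local = 0
    for _ in range(increments):
        local += 1  # thread-local bookkeeping, no synchronization needed
    obj.merge(local)  # folded into the shared field when the thread ends


def simulate(num_threads=4, increments=1000):
    obj = SharedCounterObject(immortal=True)
    threads = [
        threading.Thread(target=thread_body, args=(obj, increments))
        for _ in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return obj.ref_shared


if __name__ == "__main__":
    print(simulate())
```

In this model the shared field of an "immortal" object grows with every finished thread even though its immortality never changes.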

vfdev-5 commented 3 weeks ago

Thanks a lot @wjakob for checking this, and sorry for the noise. Compiling python3.13t with TSAN does indeed remove a lot of the reported data races in the repro code and in my other tests.

wjakob commented 3 weeks ago

@vfdev-5 Can the issue be closed? Or do you still see races that could be a problem in nanobind?

colesbury commented 3 weeks ago

> It seems to be called even for the immortal objects and increases ob_ref_shared

Ah, that makes sense. That's confusing, but I don't think it really matters. Once PyUnstable_Object_EnableDeferredRefcount is made public, we should update nanobind to use that instead of making heap types immortal (for 3.14+).

vfdev-5 commented 3 weeks ago

@wjakob yes, we can close this issue. No problems on the nanobind side.