mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License

[bug] heap-use-after-free / python Scene object freed before kernel runs #208

Closed tomas16 closed 8 months ago

tomas16 commented 9 months ago

This issue replaces #204.

Save the following code as segfault.py:

import drjit as dr
import mitsuba as mi
import numpy as np

def hit_test(mesh):
    sampler: mi.Sampler = mi.load_dict({'type': 'stratified'})
    sampler.seed(0, wavefront_size=1024)
    rays = mi.Ray3f(
        o=[0, 0, 0],
        d=mi.warp.square_to_uniform_sphere(sampler.next_2d())
    )
    scene = mi.load_dict({'type': 'scene', 'mesh': mesh})
    si = scene.ray_intersect(rays, mi.RayFlags.Empty, coherent=False)
    was_hit = dr.zeros(mi.Bool, mesh.face_count())
    assert np.all((si.prim_index < dr.width(was_hit)) | ~si.is_valid())
    dr.scatter(was_hit, True, si.prim_index, si.is_valid())
    return was_hit

def main():
    mi.set_variant("llvm_ad_rgb")
    # dr.set_log_level(dr.LogLevel.Info)
    dr.set_thread_count(1)
    mesh = mi.load_dict({'type': 'cube'})
    was_hit = hit_test(mesh)
    # trigger execution
    was_hit = np.asarray(was_hit)
    print(f"Done - {was_hit.sum()} triangles were intersected")

if __name__ == '__main__':
    main()

Run it with AddressSanitizer:

2023-12-04 14:40:45 DEBUG main  [PluginManager] Loading plugin "plugins/cube.dylib" ..
2023-12-04 14:40:45 DEBUG main  [PluginManager] Loading plugin "plugins/diffuse.dylib" ..
2023-12-04 14:40:45 DEBUG main  [PluginManager] Loading plugin "plugins/uniform.dylib" ..
2023-12-04 14:40:45 DEBUG main  [PluginManager] Loading plugin "plugins/stratified.dylib" ..
2023-12-04 14:40:45 INFO  main  [Scene] Embree ready. (took 0ms)
2023-12-04 14:40:45 DEBUG main  [Scene] Free Embree scene state..
=================================================================

==11247==ERROR: AddressSanitizer: heap-use-after-free on address 0x619002247c20 at pc 0x0001316c09c6 bp 0x700016830f70 sp 0x700016830f68
READ of size 16 at 0x619002247c20 thread T28
    #0 0x1316c09c5 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect1(embree::Accel::Intersectors*, embree::BVHN<4> const*, embree::NodeRefPtr<4>, unsigned long, embree::avx2::MoellerTrumboreIntersectorK<4, 8>&, embree::RayHitK<8>&, embree::avx2::TravRayK<8, false> const&, embree::IntersectContext*) bvh_intersector_hybrid.cpp:92
    #1 0x131654b28 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect(embree::vint_impl<8>*, embree::Accel::Intersectors*, embree::RayHitK<8>&, embree::IntersectContext*) bvh_intersector_hybrid.cpp:158
    #2 0x12ed8ee1d in rtcIntersect8 rtcore.cpp:523
    #3 0x1194e3bbe  (<unknown module>)
    #4 0x114382ee3 in jitc_run(ThreadState*, ScheduledGroup)::$_0::operator()(unsigned int, void*) const eval.cpp:509
    #5 0x114382bca in jitc_run(ThreadState*, ScheduledGroup)::$_0::__invoke(unsigned int, void*) eval.cpp:495
    #6 0x113493fd1 in pool_execute_task(Pool*, bool (*)(void*), void*) nanothread.cpp:296
    #7 0x113494dbd in Worker::run() nanothread.cpp:430
    #8 0x11349ca64 in decltype(*std::declval<Worker*>().*std::declval<void (Worker::*)()>()()) std::__1::__invoke[abi:v15006]<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) invoke.h:359
    #9 0x11349c91d in void std::__1::__thread_execute[abi:v15006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*>&, std::__1::__tuple_indices<2ul>) thread:290
    #10 0x11349ba3f in void* std::__1::__thread_proxy[abi:v15006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*>>(void*) thread:301
    #11 0x7ff8009511d2 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x61d2) (BuildId: 86dfa54395fa36b483c6bf03d01b2aad240000001000000000030d0000030d00)
    #12 0x7ff80094cbd2 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x1bd2) (BuildId: 86dfa54395fa36b483c6bf03d01b2aad240000001000000000030d0000030d00)
0x619002247c20 is located 416 bytes inside of 1088-byte region [0x619002247a80,0x619002247ec0)
freed by thread T0 here:
    #0 0x10ec9bee9 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x48ee9) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x12ee0446c in embree::FastAllocator::Block::clear_list(embree::MemoryMonitorInterface*) alloc.h:820
    #2 0x12ee03cd5 in embree::FastAllocator::~FastAllocator() alloc.h:200
    #3 0x12f29cded in embree::BVHN<4>::~BVHN() bvh.cpp:19
    #4 0x12f3451a2 in embree::AccelInstance::~AccelInstance() accelinstance.h:11
    #5 0x12ed0db1d in embree::AccelN::~AccelN() acceln.cpp:17
    #6 0x12ee9bc1d in embree::Scene::~Scene() scene.cpp:40
    #7 0x12edbd553 in rtcReleaseScene rtcore.cpp:960
    #8 0x121925217 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(unsigned int, int, void*)::operator()(unsigned int, int, void*) const+0x277 (libmitsuba.dylib:x86_64+0x6b5217) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #9 0x121924f90 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(unsigned int, int, void*)::__invoke(unsigned int, int, void*)+0x20 (libmitsuba.dylib:x86_64+0x6b4f90) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #10 0x1141ab241 in jitc_var_free(unsigned int, Variable*) var.cpp:220
    #11 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #12 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #13 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #14 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #15 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #16 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #17 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #18 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #19 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #20 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #21 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #22 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #23 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #24 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #25 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #26 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #27 0x1141ac674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #28 0x1141ab754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #29 0x1141ad4d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
previously allocated by thread T0 here:
    #0 0x10ec9c4b3 in wrap_posix_memalign+0xb3 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x494b3) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x13272797b in embree::alignedMalloc(unsigned long, unsigned long) alloc.cpp:21
    #2 0x12ee06b19 in embree::FastAllocator::Block::create(embree::MemoryMonitorInterface*, unsigned long, unsigned long, embree::FastAllocator::Block*, embree::FastAllocator::AllocationType) alloc.h:783
    #3 0x12ee04e95 in embree::FastAllocator::malloc(unsigned long&, unsigned long, bool) alloc.h:528
    #4 0x1303a75a3 in embree::NodeRefPtr<4> embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>::operator()<embree::FastAllocator::CachedAllocator>(embree::PrimRef*, embree::range<unsigned long> const&, embree::FastAllocator::CachedAllocator) const bvh_builder_sah.h:548
    #5 0x13038c479 in embree::avx::GeneralBVHBuilder::BuilderT<embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>>, embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>, embree::avx::PrimInfoExtRange, embree::PrimRef, embree::NodeRefPtr<4>, embree::FastAllocator::CachedAllocator, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafSplitFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::Scene::BuildProgressMonitorInterface>::createLargeLeaf(embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>> const&, embree::FastAllocator::CachedAllocator) bvh_builder_sah.h:163
    #6 0x1303869e3 in embree::avx::GeneralBVHBuilder::BuilderT<embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>>, embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>, embree::avx::PrimInfoExtRange, embree::PrimRef, embree::NodeRefPtr<4>, embree::FastAllocator::CachedAllocator, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafSplitFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::Scene::BuildProgressMonitorInterface>::recurse(embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>>&, embree::FastAllocator::CachedAllocator, bool) bvh_builder_sah.h:243
    #7 0x130381842 in embree::NodeRefPtr<4> embree::avx::GeneralBVHBuilder::build<embree::NodeRefPtr<4>, embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>, embree::avx::PrimInfoExtRange, embree::PrimRef, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>, embree::Scene::BuildProgressMonitorInterface>(embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>&, embree::PrimRef*, embree::avx::PrimInfoExtRange const&, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>> const&, embree::Scene::BuildProgressMonitorInterface const&, embree::avx::GeneralBVHBuilder::Settings const&) bvh_builder_sah.h:385
    #8 0x130355074 in embree::NodeRefPtr<4> embree::avx::BVHBuilderBinnedFastSpatialSAH::build<embree::NodeRefPtr<4>, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>, embree::avx::TriangleSplitterFactory, embree::Scene::BuildProgressMonitorInterface>(embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>> const&, embree::avx::TriangleSplitterFactory, embree::Scene::BuildProgressMonitorInterface, embree::PrimRef*, unsigned long, embree::PrimInfoT<embree::BBox<embree::Vec3fa>> const&, embree::avx::GeneralBVHBuilder::Settings const&) bvh_builder_sah.h:609
    #9 0x13034e45f in embree::avx::BVHNBuilderFastSpatialSAH<4, embree::TriangleMesh, embree::TriangleM<4>, embree::avx::TriangleSplitterFactory>::build() bvh_builder_sah_spatial.cpp:144
    #10 0x12f3454f4 in embree::AccelInstance::build() accelinstance.h:23
    #11 0x13273dad0 in embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&) taskschedulerinternal.cpp:53
    #12 0x13273e42b in embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*) taskschedulerinternal.cpp:85
    #13 0x13274319c in embree::TaskScheduler::wait() taskschedulerinternal.cpp:323
    #14 0x12ed10e8c in embree::AccelN::accels_build() acceln.cpp:175
    #15 0x12eea9364 in embree::Scene::commit_task() scene.cpp:714
    #16 0x12eeb7a42 in embree::TaskScheduler::ClosureTaskFunction<embree::Scene::commit(bool)::$_3>::execute() taskschedulerinternal.h:47
    #17 0x13273dad0 in embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&) taskschedulerinternal.cpp:53
    #18 0x13273e42b in embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*) taskschedulerinternal.cpp:85
    #19 0x12eeaad6c in embree::Scene::commit(bool) scene.cpp:794
    #20 0x12ed6c429 in rtcJoinCommitScene rtcore.cpp:243
    #21 0x1219cd33e in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)::operator()(drjit::blocked_range<unsigned long> const&) const+0x5e (libmitsuba.dylib:x86_64+0x75d33e) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #22 0x1219cd259 in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)::'lambda'(unsigned int, void*)::operator()(unsigned int, void*) const+0x209 (libmitsuba.dylib:x86_64+0x75d259) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #23 0x1219cd03a in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)::'lambda'(unsigned int, void*)::__invoke(unsigned int, void*)+0x1a (libmitsuba.dylib:x86_64+0x75d03a) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #24 0x113492bac in task_submit_dep nanothread.cpp:177
    #25 0x1219b443c in task_submit(Pool*, unsigned int, void (*)(unsigned int, void*), void*, unsigned int, void (*)(void*), int)+0x5c (libmitsuba.dylib:x86_64+0x74443c) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #26 0x1219b43ba in task_submit_and_wait(Pool*, unsigned int, void (*)(unsigned int, void*), void*)+0x2a (libmitsuba.dylib:x86_64+0x7443ba) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #27 0x12182dd4b in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)+0x27b (libmitsuba.dylib:x86_64+0x5bdd4b) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #28 0x12182c582 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()+0x552 (libmitsuba.dylib:x86_64+0x5bc582) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #29 0x12180eefa in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_init_cpu(mitsuba::Properties const&)+0x65a (libmitsuba.dylib:x86_64+0x59eefa) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)

Thread T28 created by T0 here:
    #0 0x10ec9583c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4283c) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x11349b918 in std::__1::__libcpp_thread_create[abi:v15006](_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:376
    #2 0x11349b683 in std::__1::thread::thread<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) thread:317
    #3 0x113494fb4 in std::__1::thread::thread<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) thread:309
    #4 0x113494922 in Worker::Worker(Pool*, unsigned int, bool) nanothread.cpp:404
    #5 0x1134927b1 in Worker::Worker(Pool*, unsigned int, bool) nanothread.cpp:403
    #6 0x113492107 in pool_set_size nanothread.cpp:131
    #7 0x114468866 in jit_llvm_set_thread_count api.cpp:274
    #8 0x119dcd9b4 in void pybind11::detail::argument_loader<unsigned int>::call_impl<void, void (*&)(unsigned int), 0ul, pybind11::detail::void_type>(void (*&)(unsigned int), std::__1::integer_sequence<unsigned long, 0ul>, pybind11::detail::void_type&&) && cast.h:1439
    #9 0x119dcd4d7 in std::__1::enable_if<std::is_void<void>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<unsigned int>::call<void, pybind11::detail::void_type, void (*&)(unsigned int)>(void (*&)(unsigned int)) && cast.h:1413
    #10 0x119dcd1a2 in void pybind11::cpp_function::initialize<void (*&)(unsigned int), void, unsigned int, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(unsigned int), void (*)(unsigned int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const pybind11.h:249
    #11 0x119dcce14 in void pybind11::cpp_function::initialize<void (*&)(unsigned int), void, unsigned int, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(unsigned int), void (*)(unsigned int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) pybind11.h:224
    #12 0x119c3bab7 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) pybind11.h:929
    #13 0x10e5d4297 in cfunction_call+0x37 (python3.10:x86_64+0x1000c4297) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #14 0x10e5746f7 in _PyObject_MakeTpCall+0x137 (python3.10:x86_64+0x1000646f7) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #15 0x10e6b54f3 in _PyEval_EvalFrameDefault+0x29ab3 (python3.10:x86_64+0x1001a54f3) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #16 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #17 0x10e6b9106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #18 0x10e6b981f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #19 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #20 0x10e689bff in _PyEval_Vector+0x21f (python3.10:x86_64+0x100179bff) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #21 0x10e684902 in builtin_exec+0x152 (python3.10:x86_64+0x100174902) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #22 0x10e5d4fc6 in cfunction_vectorcall_FASTCALL+0x66 (python3.10:x86_64+0x1000c4fc6) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #23 0x10e6b97a2 in call_function+0x662 (python3.10:x86_64+0x1001a97a2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #24 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #25 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #26 0x10e6b9106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #27 0x10e6b981f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #28 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #29 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #30 0x10e6b9106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #31 0x10e6b981f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #32 0x10e6946de in _PyEval_EvalFrameDefault+0x8c9e (python3.10:x86_64+0x1001846de) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #33 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #34 0x10e6b91f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #35 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #36 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #37 0x10e696ce8 in _PyEval_EvalFrameDefault+0xb2a8 (python3.10:x86_64+0x100186ce8) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #38 0x10e689bff in _PyEval_Vector+0x21f (python3.10:x86_64+0x100179bff) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #39 0x10e684902 in builtin_exec+0x152 (python3.10:x86_64+0x100174902) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #40 0x10e5d4fc6 in cfunction_vectorcall_FASTCALL+0x66 (python3.10:x86_64+0x1000c4fc6) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #41 0x10e6b91f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #42 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #43 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #44 0x10e6b91f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #45 0x10e69463d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #46 0x10e575b4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #47 0x10e732bdd in pymain_run_module+0xdd (python3.10:x86_64+0x100222bdd) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #48 0x10e7326b0 in pymain_run_python+0x1e0 (python3.10:x86_64+0x1002226b0) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #49 0x10e732484 in Py_RunMain+0x24 (python3.10:x86_64+0x100222484) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #50 0x10e511707 in main+0x37 (python3.10:x86_64+0x100001707) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #51 0x7ff8005f741e in start+0x76e (dyld:x86_64+0xfffffffffff6e41e) (BuildId: f22a114397323e23a8b7cbade6bb830132000000200000000100000000030d00)
SUMMARY: AddressSanitizer: heap-use-after-free bvh_intersector_hybrid.cpp:92 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect1(embree::Accel::Intersectors*, embree::BVHN<4> const*, embree::NodeRefPtr<4>, unsigned long, embree::avx2::MoellerTrumboreIntersectorK<4, 8>&, embree::RayHitK<8>&, embree::avx2::TravRayK<8, false> const&, embree::IntersectContext*)
Shadow bytes around the buggy address:
  0x1c3200448f30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c3200448f40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c3200448f50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448f60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448f70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x1c3200448f80: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448f90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448fa0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448fb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448fc0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c3200448fd0: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==11247==ABORTING

It looks like Mitsuba calls Embree for intersection testing after the entire Embree acceleration structure has already been deallocated by a different thread. The heap-use-after-free doesn't always occur on the exact same line, but that doesn't matter: the real issue seems to be that the refcount of some (all?) variables is decremented erroneously.

For smaller wavefront sizes, the heap-use-after-free doesn't always occur.

If you run this without AddressSanitizer, you will likely get the correct result. However, this is all derived from issue #204, where the behavior is one of: segfault, correct result, or incorrect result. In all 3 cases, AddressSanitizer catches this heap-use-after-free. So while this issue doesn't cause this toy example to fail, it does cause trouble in real code.

Versions

tomas16 commented 9 months ago

I figured out the problem: when scene goes out of scope in the Python function hit_test, the underlying C++ objects are cleaned up. However, execution of the kernel is only triggered by the np.asarray() call in main, so by the time the kernel runs, the memory for scene has already been freed.

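There are 2 workarounds that I've tested successfully:

  • Put the np.asarray call inside the hit_test function, so the last line becomes return np.asarray(was_hit)
  • Create the scene object in main and pass it in as an argument to hit_test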

I'm not familiar enough with the code to quickly fix this myself. However, if kernels keep a reference to their dependencies, it may be a simple matter of incrementing the refcount of the scene object when the kernel is recorded and decrementing it in the kernel's destructor.
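The lifetime problem and the suggested fix can be modelled in plain Python. This is only a toy sketch, not Dr.Jit's or Mitsuba's actual implementation: ToyScene, LazyKernel, trace and run_traced are hypothetical names, and a weak reference stands in for the raw pointer that a recorded kernel would hold to the acceleration structure.

```python
import gc
import weakref


class ToyScene:
    """Hypothetical stand-in for an object owning an acceleration structure."""

    def __init__(self):
        self.triangles = 12


class LazyKernel:
    """Toy model of a recorded kernel: created now, executed later."""

    def __init__(self, scene, keep_alive):
        # The weak reference models a raw pointer baked into the kernel.
        self._scene_ref = weakref.ref(scene)
        # The proposed fix: hold a strong reference until the kernel has run.
        self._owner = scene if keep_alive else None

    def run(self):
        scene = self._scene_ref()
        if scene is None:
            raise RuntimeError("scene was destroyed before the kernel ran")
        result = scene.triangles
        self._owner = None  # release the dependency after execution
        return result


def trace(keep_alive):
    scene = ToyScene()  # local variable, like `scene` in hit_test
    return LazyKernel(scene, keep_alive)


def run_traced(keep_alive):
    kernel = trace(keep_alive)  # the local scene has gone out of scope here
    gc.collect()                # make collection explicit on any interpreter
    try:
        return kernel.run()
    except RuntimeError as exc:
        return str(exc)


if __name__ == "__main__":
    print(run_traced(keep_alive=False))
    print(run_traced(keep_alive=True))
```

Without the strong reference, the deferred run finds that its scene is gone, which mirrors the use-after-free in the report; with it, the scene survives until the kernel has executed.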

rtabbara commented 9 months ago

Hi @tomas16,

I was also able to reproduce the heap-use-after-free errors after enabling AddressSanitizer. Thanks for the reproducer!

There are 2 workarounds that I've tested successfully:

  • Put the np.asarray call inside the hit_test function, so the last line becomes return np.asarray(was_hit)
  • Create the scene object in main and pass it in as an argument to hit_test

That makes sense. I think adding dr.eval(was_hit) just before returning from hit_test would be another workaround. But all of these just mask what I think is a legitimate bug. I think I know what's going on, but I want to investigate a bit further to confirm my suspicions.
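A sketch of the dr.eval variant, condensed from the reproducer (the assert is dropped for brevity). The function name hit_test_evaluated is made up for illustration, and the thread only suggests that this workaround would help; it is not confirmed there. The imports sit inside the function purely so the snippet can be loaded without Mitsuba installed.

```python
def hit_test_evaluated(mesh):
    # Assumes mi.set_variant("llvm_ad_rgb") has already been called, as in main().
    import drjit as dr
    import mitsuba as mi

    sampler = mi.load_dict({'type': 'stratified'})
    sampler.seed(0, wavefront_size=1024)
    rays = mi.Ray3f(
        o=[0, 0, 0],
        d=mi.warp.square_to_uniform_sphere(sampler.next_2d())
    )
    scene = mi.load_dict({'type': 'scene', 'mesh': mesh})
    si = scene.ray_intersect(rays, mi.RayFlags.Empty, coherent=False)
    was_hit = dr.zeros(mi.Bool, mesh.face_count())
    dr.scatter(was_hit, True, si.prim_index, si.is_valid())
    # Force the pending kernel to run while `scene` is still in scope.
    dr.eval(was_hit)
    return was_hit
```

The caller stays the same as in the reproducer: np.asarray(was_hit) in main then only copies data that has already been computed.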

rtabbara commented 8 months ago

Hi @tomas16,

I've pushed a commit into Mitsuba 3 here and confirmed no more ASan errors are reported for the reproducer you've provided.