mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License
583 stars 42 forks source link

intermittent segfault #204

Closed tomas16 closed 10 months ago

tomas16 commented 10 months ago

I'm using the "llvm_ad_rgb" variant of mitsuba. I have some code that's roughly organized as:

def helper(...):
    # a bunch of mitsuba / drjit code
    print("HELPER IS DONE")
    return value

def func(...):
    a = helper(...)
    print("WE MADE IT")

Here's example output:

HELPER IS DONE
jit_eval(): launching 1 kernel.
  -> launching 832020c8aa132839 (n=9216, in=8, out=0, se=1, ops=229, jit=436 us):
     cache hit, load: 1.038 ms, 4.438 KiB.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching e58bde7ab3d38952 (n=49120, in=1, out=1, ops=4, jit=11 us):
     cache hit, load: 719 us, 128 B.
jit_eval(): done.

Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

I'm guessing that some of the code in the helper function is executed at the return statement, given that 2 kernels are launched after the "HELPER IS DONE" message, but before returning to the caller.

When I run this code, what happens is roughly:

This is all without making any changes to the code, simply re-running the same thing.

Versions

I realize this isn't enough information to be able to debug this, so please let me know what I can do on my end. I can't share the full code, but I'm not sure how to create a self-contained reproducible example.

tomas16 commented 10 months ago

Some additional thoughts:

rtabbara commented 10 months ago

Hi @tomas16,

It could be related to the issues you linked, however one other guess is there may be an out-of-bounds write being performed. One common way that this can occur is if you're using dr.scatter with invalid indices. That potentially could explain your non-deterministic behaviour - sometimes an out-of-bounds write is stomping on a valid memory address, other times it could result in a segfault. If a memory stomp is occurring, then that could corrupt neighbouring data resulting in garbage output.

If you're compiling Mitsuba 3 from source, you can enable address sanitizer (using the cmake option DMI_SANITIZE_ADDRESS=ON) and potentially that could give you some leads. But this is all merely speculation, and without a reproducer it's a bit more difficult to provide some concrete advice.

tomas16 commented 10 months ago

Hi @rtabbara, thanks, that was useful feedback.

I can confirm I don't have any out-of-bounds writes on my end. The helper function is a sort of initialization stage of the algorithm, and what it basically does is to create an array of ints with the number of elements equal to the number of faces in the mesh. Then I have some seed points that each have 3D coordinates and a label, and from each seed point I spawn some rays. The output stores for every triangle by which of the seed points it was hit (if it was hit at all). There are no complex indexing operations going on, I'm basically doing something like: dr.scatter(labels, seed_labels, si.prim_index, si.is_valid()) where labels is definitely large enough. I also added an assert to make sure I have no out-of-bounds indices.

I did try address sanitizer and got some interesting results:

PID = 95270
Num threads = 20
2023-12-01 17:04:04 DEBUG main  [PluginManager] Loading plugin "plugins/diffuse.dylib" ..
2023-12-01 17:04:04 DEBUG main  [PluginManager] Loading plugin "plugins/uniform.dylib" ..
jit_eval(): launching 1 kernel.
  -> launching 99b1ca165ba1ffe3 (n=24560, in=5, out=0, se=9, ops=238, jit=5.935 ms):
     cache hit, load: 5.025 ms, 3.124 KiB.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 4539baf240848f31 (n=11968, in=4, out=0, se=3, ops=29, jit=307 us):
     cache hit, load: 566 us, 1.312 KiB.
jit_eval(): done.
Loaded mesh
Num threads = 21
2023-12-01 17:04:05 INFO  main  [Scene] Embree ready. (took 141ms)
jit_eval(): launching 1 kernel.
  -> launching 8431a7ea6fcc3bb5 (n=9216, in=6, out=1, ops=259, jit=2.452 ms):
     cache hit, load: 1.044 ms, 4.875 KiB.
jit_eval(): done.
jit_eval(): launching 1 kernel.
  -> launching 130132f75e13d30c (n=9216, in=9, out=0, se=1, ops=152, jit=8 us):
     cache hit, load: 928 us, 2.776 KiB.
2023-12-01 17:04:05 DEBUG main  [Scene] Free Embree scene state..

=================================================================
jit_eval(): done.
==95270==ERROR: AddressSanitizer: heap-use-after-free on address 0x631000b47460 at pc 0x0001314e5974 bp 0x700018cd9090 sp 0x700018cd9088
READ of size 16 at 0x631000b47460 thread T28
jit_eval(): launching 1 kernel.
  -> launching e58bde7ab3d38952 (n=49120, in=1, out=1, ops=4, jit=39 us):
     cache hit, load: 481 us, 128 B.
jit_eval(): done.
    #0 0x1314e5973 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect1(embree::Accel::Intersectors*, embree::BVHN<4> const*, embree::NodeRefPtr<4>, unsigned long, embree::avx2::MoellerTrumboreIntersectorK<4, 8>&, embree::RayHitK<8>&, embree::avx2::TravRayK<8, false> const&, embree::IntersectContext*) bvh_intersector_hybrid.cpp:75
    #1 0x131479b28 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect(embree::vint_impl<8>*, embree::Accel::Intersectors*, embree::RayHitK<8>&, embree::IntersectContext*) bvh_intersector_hybrid.cpp:158
    #2 0x12ebb3e1d in rtcIntersect8 rtcore.cpp:523
    #3 0x124e6059c  (<unknown module>)
    #4 0x1141a7ee3 in jitc_run(ThreadState*, ScheduledGroup)::$_0::operator()(unsigned int, void*) const eval.cpp:509
    #5 0x1141a7bca in jitc_run(ThreadState*, ScheduledGroup)::$_0::__invoke(unsigned int, void*) eval.cpp:495
    #6 0x1132b8fd1 in pool_execute_task(Pool*, bool (*)(void*), void*) nanothread.cpp:296
    #7 0x1132b9dbd in Worker::run() nanothread.cpp:430
    #8 0x1132c1a64 in decltype(*std::declval<Worker*>().*std::declval<void (Worker::*)()>()()) std::__1::__invoke[abi:v15006]<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) invoke.h:359
    #9 0x1132c191d in void std::__1::__thread_execute[abi:v15006]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*>&, std::__1::__tuple_indices<2ul>) thread:290
    #10 0x1132c0a3f in void* std::__1::__thread_proxy[abi:v15006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (Worker::*)(), Worker*>>(void*) thread:301
    #11 0x7ff8159d41d2 in _pthread_start+0x7c (libsystem_pthread.dylib:x86_64+0x61d2) (BuildId: 86dfa54395fa36b483c6bf03d01b2aad240000001000000000030d0000030d00)
    #12 0x7ff8159cfbd2 in thread_start+0xe (libsystem_pthread.dylib:x86_64+0x1bd2) (BuildId: 86dfa54395fa36b483c6bf03d01b2aad240000001000000000030d0000030d00)
0x631000b47460 is located 27744 bytes inside of 74816-byte region [0x631000b40800,0x631000b52c40)
freed by thread T0 here:
    #0 0x10eaa3ee9 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x48ee9) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x12ec2946c in embree::FastAllocator::Block::clear_list(embree::MemoryMonitorInterface*) alloc.h:820
    #2 0x12ec28cd5 in embree::FastAllocator::~FastAllocator() alloc.h:200
    #3 0x12f0c1ded in embree::BVHN<4>::~BVHN() bvh.cpp:19
    #4 0x12f16a1a2 in embree::AccelInstance::~AccelInstance() accelinstance.h:11
    #5 0x12eb32b1d in embree::AccelN::~AccelN() acceln.cpp:17
    #6 0x12ecc0c1d in embree::Scene::~Scene() scene.cpp:40
    #7 0x12ebe2553 in rtcReleaseScene rtcore.cpp:960
    #8 0x12174a217 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(unsigned int, int, void*)::operator()(unsigned int, int, void*) const+0x277 (libmitsuba.dylib:x86_64+0x6b5217) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #9 0x121749f90 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(unsigned int, int, void*)::__invoke(unsigned int, int, void*)+0x20 (libmitsuba.dylib:x86_64+0x6b4f90) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #10 0x113fd0241 in jitc_var_free(unsigned int, Variable*) var.cpp:220
    #11 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #12 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #13 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #14 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #15 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #16 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #17 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #18 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #19 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #20 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #21 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #22 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #23 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #24 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #25 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #26 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
    #27 0x113fd1674 in jitc_var_dec_ref(unsigned int) var.cpp:308
    #28 0x113fd0754 in jitc_var_free(unsigned int, Variable*) var.cpp:253
    #29 0x113fd24d7 in jitc_var_dec_ref(unsigned int, Variable*) var.cpp:302
previously allocated by thread T0 here:
    #0 0x10eaa44b3 in wrap_posix_memalign+0xb3 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x494b3) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x13254c97b in embree::alignedMalloc(unsigned long, unsigned long) alloc.cpp:21
    #2 0x12ec2bb19 in embree::FastAllocator::Block::create(embree::MemoryMonitorInterface*, unsigned long, unsigned long, embree::FastAllocator::Block*, embree::FastAllocator::AllocationType) alloc.h:783
    #3 0x12ec29e95 in embree::FastAllocator::malloc(unsigned long&, unsigned long, bool) alloc.h:528
    #4 0x1301adb8d in embree::avx::GeneralBVHBuilder::BuilderT<embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>>, embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>, embree::avx::PrimInfoExtRange, embree::PrimRef, embree::NodeRefPtr<4>, embree::FastAllocator::CachedAllocator, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::avx::GeneralBVHBuilder::DefaultCanCreateLeafSplitFunc<embree::PrimRef, embree::avx::PrimInfoExtRange>, embree::Scene::BuildProgressMonitorInterface>::recurse(embree::avx::GeneralBVHBuilder::BuildRecordT<embree::avx::PrimInfoExtRange, embree::avx::Split2<embree::avx::BinSplit<32ul>, embree::avx::SpatialBinSplit<16ul>>>&, embree::FastAllocator::CachedAllocator, bool) bvh_builder_sah.h:296
    #5 0x1301a6842 in embree::NodeRefPtr<4> embree::avx::GeneralBVHBuilder::build<embree::NodeRefPtr<4>, embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>, embree::avx::PrimInfoExtRange, embree::PrimRef, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>>, embree::Scene::BuildProgressMonitorInterface>(embree::avx::HeuristicArraySpatialSAH<embree::avx::TriangleSplitterFactory, embree::PrimRef, 32ul, 16ul>&, embree::PrimRef*, embree::avx::PrimInfoExtRange const&, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::BVHBuilderBinnedFastSpatialSAH::CreateLeafExt<embree::NodeRefPtr<4>, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>> const&, embree::Scene::BuildProgressMonitorInterface const&, embree::avx::GeneralBVHBuilder::Settings const&) bvh_builder_sah.h:385
    #6 0x13017a074 in embree::NodeRefPtr<4> embree::avx::BVHBuilderBinnedFastSpatialSAH::build<embree::NodeRefPtr<4>, embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>>, embree::avx::TriangleSplitterFactory, embree::Scene::BuildProgressMonitorInterface>(embree::BVHN<4>::CreateAlloc, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Create2, embree::AABBNode_t<embree::NodeRefPtr<4>, 4>::Set2, embree::avx::CreateLeafSpatial<4, embree::TriangleM<4>> const&, embree::avx::TriangleSplitterFactory, embree::Scene::BuildProgressMonitorInterface, embree::PrimRef*, unsigned long, embree::PrimInfoT<embree::BBox<embree::Vec3fa>> const&, embree::avx::GeneralBVHBuilder::Settings const&) bvh_builder_sah.h:609
    #7 0x13017345f in embree::avx::BVHNBuilderFastSpatialSAH<4, embree::TriangleMesh, embree::TriangleM<4>, embree::avx::TriangleSplitterFactory>::build() bvh_builder_sah_spatial.cpp:144
    #8 0x12f16a4f4 in embree::AccelInstance::build() accelinstance.h:23
    #9 0x132562ad0 in embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&) taskschedulerinternal.cpp:53
    #10 0x13256342b in embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*) taskschedulerinternal.cpp:85
    #11 0x13256819c in embree::TaskScheduler::wait() taskschedulerinternal.cpp:323
    #12 0x12eb35e8c in embree::AccelN::accels_build() acceln.cpp:175
    #13 0x12ecce364 in embree::Scene::commit_task() scene.cpp:714
    #14 0x12ecdca42 in embree::TaskScheduler::ClosureTaskFunction<embree::Scene::commit(bool)::$_3>::execute() taskschedulerinternal.h:47
    #15 0x132562ad0 in embree::TaskScheduler::Task::run_internal(embree::TaskScheduler::Thread&) taskschedulerinternal.cpp:53
    #16 0x13256342b in embree::TaskScheduler::TaskQueue::execute_local_internal(embree::TaskScheduler::Thread&, embree::TaskScheduler::Task*) taskschedulerinternal.cpp:85
    #17 0x12eccfd6c in embree::Scene::commit(bool) scene.cpp:794
    #18 0x12eb91429 in rtcJoinCommitScene rtcore.cpp:243
    #19 0x1217f233e in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)::operator()(drjit::blocked_range<unsigned long> const&) const+0x5e (libmitsuba.dylib:x86_64+0x75d33e) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #20 0x1217f2259 in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)::'lambda'(unsigned int, void*)::operator()(unsigned int, void*) const+0x209 (libmitsuba.dylib:x86_64+0x75d259) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #21 0x1217f203a in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)::'lambda'(unsigned int, void*)::__invoke(unsigned int, void*)+0x1a (libmitsuba.dylib:x86_64+0x75d03a) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #22 0x1132b7bac in task_submit_dep nanothread.cpp:177
    #23 0x1217d943c in task_submit(Pool*, unsigned int, void (*)(unsigned int, void*), void*, unsigned int, void (*)(void*), int)+0x5c (libmitsuba.dylib:x86_64+0x74443c) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #24 0x1217d93ba in task_submit_and_wait(Pool*, unsigned int, void (*)(unsigned int, void*), void*)+0x2a (libmitsuba.dylib:x86_64+0x7443ba) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #25 0x121652d4b in void drjit::parallel_for<unsigned long, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)>(drjit::blocked_range<unsigned long> const&, mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()::'lambda'(drjit::blocked_range<unsigned long> const&)&&, Pool*)+0x27b (libmitsuba.dylib:x86_64+0x5bdd4b) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #26 0x121651582 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_parameters_changed_cpu()+0x552 (libmitsuba.dylib:x86_64+0x5bc582) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #27 0x121633efa in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::accel_init_cpu(mitsuba::Properties const&)+0x65a (libmitsuba.dylib:x86_64+0x59eefa) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #28 0x1216325e9 in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::Scene(mitsuba::Properties const&)+0x1179 (libmitsuba.dylib:x86_64+0x59d5e9) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)
    #29 0x1216351ac in mitsuba::Scene<drjit::DiffArray<drjit::LLVMArray<float>>, mitsuba::Color<drjit::DiffArray<drjit::LLVMArray<float>>, 3ul>>::Scene(mitsuba::Properties const&)+0x1c (libmitsuba.dylib:x86_64+0x5a01ac) (BuildId: daf5d67e721b3de2a50c671c4436e4d032000000200000000100000000000d00)

Thread T28 created by T0 here:
    #0 0x10ea9d83c in wrap_pthread_create+0x5c (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x4283c) (BuildId: 756bb7515781379f84412f22c4274ffd2400000010000000000a0a0000030d00)
    #1 0x1132c0918 in std::__1::__libcpp_thread_create[abi:v15006](_opaque_pthread_t**, void* (*)(void*), void*) __threading_support:376
    #2 0x1132c0683 in std::__1::thread::thread<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) thread:317
    #3 0x1132b9fb4 in std::__1::thread::thread<void (Worker::*)(), Worker*, void>(void (Worker::*&&)(), Worker*&&) thread:309
    #4 0x1132b9922 in Worker::Worker(Pool*, unsigned int, bool) nanothread.cpp:404
    #5 0x1132b77b1 in Worker::Worker(Pool*, unsigned int, bool) nanothread.cpp:403
    #6 0x1132b7107 in pool_set_size nanothread.cpp:131
    #7 0x11428d866 in jit_llvm_set_thread_count api.cpp:274
    #8 0x119bf29b4 in void pybind11::detail::argument_loader<unsigned int>::call_impl<void, void (*&)(unsigned int), 0ul, pybind11::detail::void_type>(void (*&)(unsigned int), std::__1::integer_sequence<unsigned long, 0ul>, pybind11::detail::void_type&&) && cast.h:1439
    #9 0x119bf24d7 in std::__1::enable_if<std::is_void<void>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<unsigned int>::call<void, pybind11::detail::void_type, void (*&)(unsigned int)>(void (*&)(unsigned int)) && cast.h:1413
    #10 0x119bf21a2 in void pybind11::cpp_function::initialize<void (*&)(unsigned int), void, unsigned int, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(unsigned int), void (*)(unsigned int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const pybind11.h:249
    #11 0x119bf1e14 in void pybind11::cpp_function::initialize<void (*&)(unsigned int), void, unsigned int, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(unsigned int), void (*)(unsigned int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) pybind11.h:224
    #12 0x119a60ab7 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) pybind11.h:929
    #13 0x10e3dc297 in cfunction_call+0x37 (python3.10:x86_64+0x1000c4297) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #14 0x10e37c6f7 in _PyObject_MakeTpCall+0x137 (python3.10:x86_64+0x1000646f7) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #15 0x10e4bd4f3 in _PyEval_EvalFrameDefault+0x29ab3 (python3.10:x86_64+0x1001a54f3) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #16 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #17 0x10e4c1106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #18 0x10e4c181f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #19 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #20 0x10e491bff in _PyEval_Vector+0x21f (python3.10:x86_64+0x100179bff) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #21 0x10e48c902 in builtin_exec+0x152 (python3.10:x86_64+0x100174902) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #22 0x10e3dcfc6 in cfunction_vectorcall_FASTCALL+0x66 (python3.10:x86_64+0x1000c4fc6) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #23 0x10e4c17a2 in call_function+0x662 (python3.10:x86_64+0x1001a97a2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #24 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #25 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #26 0x10e4c1106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #27 0x10e4c181f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #28 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #29 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #30 0x10e4c1106 in PyObject_Vectorcall.4459+0x46 (python3.10:x86_64+0x1001a9106) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #31 0x10e4c181f in call_function+0x6df (python3.10:x86_64+0x1001a981f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #32 0x10e49c6de in _PyEval_EvalFrameDefault+0x8c9e (python3.10:x86_64+0x1001846de) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #33 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #34 0x10e4c11f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #35 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #36 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #37 0x10e49ece8 in _PyEval_EvalFrameDefault+0xb2a8 (python3.10:x86_64+0x100186ce8) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #38 0x10e491bff in _PyEval_Vector+0x21f (python3.10:x86_64+0x100179bff) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #39 0x10e48c902 in builtin_exec+0x152 (python3.10:x86_64+0x100174902) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #40 0x10e3dcfc6 in cfunction_vectorcall_FASTCALL+0x66 (python3.10:x86_64+0x1000c4fc6) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #41 0x10e4c11f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #42 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #43 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #44 0x10e4c11f2 in call_function+0xb2 (python3.10:x86_64+0x1001a91f2) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #45 0x10e49c63d in _PyEval_EvalFrameDefault+0x8bfd (python3.10:x86_64+0x10018463d) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #46 0x10e37db4f in _PyFunction_Vectorcall+0x22f (python3.10:x86_64+0x100065b4f) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #47 0x10e53abdd in pymain_run_module+0xdd (python3.10:x86_64+0x100222bdd) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #48 0x10e53a6b0 in pymain_run_python+0x1e0 (python3.10:x86_64+0x1002226b0) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #49 0x10e53a484 in Py_RunMain+0x24 (python3.10:x86_64+0x100222484) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #50 0x10e319707 in main+0x37 (python3.10:x86_64+0x100001707) (BuildId: 0d38421cb5bb36118609f1eb115ced8b240000001000000000090a0000000b00)
    #51 0x7ff81567a41e in start+0x76e (dyld:x86_64+0xfffffffffff6e41e) (BuildId: f22a114397323e23a8b7cbade6bb830132000000200000000100000000030d00)
SUMMARY: AddressSanitizer: heap-use-after-free bvh_intersector_hybrid.cpp:75 in embree::avx2::BVHNIntersectorKHybrid<4, 8, 1, false, embree::avx2::ArrayIntersectorK_1<8, embree::avx2::TriangleMIntersectorKMoeller<4, 8, false>>, true>::intersect1(embree::Accel::Intersectors*, embree::BVHN<4> const*, embree::NodeRefPtr<4>, unsigned long, embree::avx2::MoellerTrumboreIntersectorK<4, 8>&, embree::RayHitK<8>&, embree::avx2::TravRayK<8, false> const&, embree::IntersectContext*)
Shadow bytes around the buggy address:
  0x1c6200168e30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168e40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168e50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168e60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168e70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x1c6200168e80: fd fd fd fd fd fd fd fd fd fd fd fd[fd]fd fd fd
  0x1c6200168e90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168ea0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168eb0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168ec0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c6200168ed0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==95270==ABORTING

I'm not 100% sure how to interpret this, but it looks like mitsuba is calling embree for intersection testing while the whole embree acceleration structure has already been deallocated by a different thread. Address sanitizer catches this on every run, so perhaps it's like you said and what's going on is deterministic, but the consequence isn't always a segfault.

I haven't been able to create a reproducible example, because as soon as I do that, the problem disappears. To give people a sense at least, this is a slightly simplified version of the "helper" function:

def initialize_from_seed_labels(mesh, seed_labels, rays_per_seed):
    # The output 'labels' starts by holding the value -1 for every triangle of the mesh. 
    # By the end some of the triangles were hit by rays emanating from our seed coordinates
    # (seed_labels.keys()) and we keep track of which seed (seed_labels.values()) hit each
    # triangle.
    num_triangles = mesh.face_count()
    labels = dr.full(mi.Int, -1, num_triangles)

    # Convert seed_labels to mitsuba types
    scoords = np.asarray(list(seed_labels.keys()))
    scoords = np2point(scoords[:, 0], scoords[:, 1], scoords[:, 2])
    slabels = mi.Int(list(seed_labels.values()))
    num_labels = len(seed_labels)

    # We sample the same directions from each seed
    directions = uniform_sphere_directions(rays_per_seed)
    num_directions = dr.width(directions)

    scoords_r = dr.repeat(scoords, num_directions)
    slabels_r = dr.repeat(slabels, num_directions)
    directions_r = dr.tile(directions, num_labels)
    rays = mi.Ray3f(o=scoords_r, d=directions_r)
    scene = mi.load_dict({'type': 'scene', 'mesh': mesh})
    si = scene.ray_intersect(rays, mi.RayFlags.Empty, coherent=False)
    idx = si.prim_index
    dr.scatter(labels, slabels_r, idx, si.is_valid())

    return labels

As you can see I only have one call to ray_intersect and one to dr.scatter.

What I've noticed is that I get the problem when the structure of the code is something like this:

def helper(...):
    # a bunch of mitsuba / drjit code
    return value

def func(...):
    a = helper(...)
    # more mitsuba code utilizing the variable 'a'

However when I just call the helper from my main function (and call np.asarray() on its output to make sure everything gets evaluated), address sanitizer doesn't report any problem.

It looks like the issue is related to the interaction between different kernels that depend on each other.

Lastly, I've been wondering if there are any flags I could try to potentially alter the behavior and work around the problem for now. I tried dr.set_flag(dr.JitFlag.LaunchBlocking, True) but that resulted in SIGILL.

Any more insights/help would be much appreciated here. I can spend some more time digging, but I probably don't have enough insight into the internals of the system to be able to fix this in a reasonable timeframe by myself.

tomas16 commented 10 months ago

replacing this issue with #208, which is more to the point.