mitsuba-renderer / drjit

Dr.Jit — A Just-In-Time-Compiler for Differentiable Rendering
BSD 3-Clause "New" or "Revised" License

Random assertion error: queue.cpp:354 #180

Closed sagesimhon closed 1 year ago

sagesimhon commented 1 year ago

Hi,

I am running various rendering jobs via Mitsuba 3, and the process throws an assertion error followed by a core dump at random points in my processing stack. This happens on a Linux machine with a large number of CPUs; it never happens when running identical code on a Mac.

"Assertion failed in /project/ext/drjit-core/ext/nanothread/src/queue.cpp:354: remain == 1"

I have no clue what the cause of this issue is. Any ideas on where to start?

njroussel commented 1 year ago

Hi @sagesimhon

This is an assertion in our thread/worker pool job submission system. It's typically indicative of a race condition in higher-level code. Are you using a vanilla build of Mitsuba/Dr.Jit? These types of issues are hard to track down and fix without a consistent reproducer.

dvicini commented 1 year ago

I also ran into this a few months ago, but I don't recall what caused it or how I got rid of it (hopefully for good?), and I don't have a reproducer either.

sagesimhon commented 1 year ago

Yes, it's quite random, and it seems to happen more frequently with higher CPU counts. I am using a vanilla build, installed via conda. Any suggestions on where to start? Is there a way to get more debugging info on the root cause, or at least print the stack trace after the assertion fails? I do eventually see a Python core dump message, but I have no idea where to find the dump or how to use it.
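
One possible starting point for the stack trace question, as a minimal sketch rather than a confirmed fix: Python's standard `faulthandler` module can print the Python-level stack of every thread when the process receives a fatal signal. This assumes the failed assertion ends in `abort()` (SIGABRT), which the core dump message suggests but the thread does not confirm. The `render_frame` function and the `os.abort()` call below are hypothetical stand-ins for the real rendering code and the native assertion; no Mitsuba or Dr.Jit API is involved.

```python
import subprocess
import sys
import textwrap


def demo_abort_trace():
    # Run a child interpreter that enables faulthandler and then aborts,
    # and return whatever the child wrote to stderr.
    child = textwrap.dedent('''
        import faulthandler
        import os

        # Dump the Python stack of all threads on SIGSEGV, SIGFPE,
        # SIGABRT, SIGBUS and SIGILL.
        faulthandler.enable(all_threads=True)

        def render_frame():
            # Hypothetical stand-in for the failed native assertion.
            os.abort()

        render_frame()
    ''')
    result = subprocess.run(
        [sys.executable, "-c", child],
        capture_output=True,
        text=True,
    )
    return result.stderr


if __name__ == "__main__":
    print(demo_abort_trace())
```

In a real script, the single `faulthandler.enable(all_threads=True)` call at the top would be enough; running with `python -X faulthandler script.py` or setting `PYTHONFAULTHANDLER=1` has the same effect without code changes. This only shows which Python call was active when the process died. For the native side, the core dump itself would be needed: on Linux its location depends on `ulimit -c` and `/proc/sys/kernel/core_pattern`, and it can be opened with `gdb`.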

isolin commented 1 year ago

Same here. I'm using the pip package of Mitsuba. During an animation it usually takes 500-1000 frames to crash. My machine is a Threadripper with 64 cores. This happens only with Mitsuba 3.3.0 but not with 3.2.1.

sagesimhon commented 1 year ago

I tried:

mitsuba 3.3.0 with drjit 0.4.2

and mitsuba 3.2.1 with drjit 0.4.1

Both fail, with libLLVM-15.so and libLLVM-10.so.

njroussel commented 1 year ago

Let me close this to keep things tidy.

For anyone who finds this issue: we're tracking it over here: https://github.com/mitsuba-renderer/mitsuba3/issues/849