halide / Halide

a language for fast, portable data-parallel computation
https://halide-lang.org
Other
5.84k stars 1.07k forks source link

Profiler-by-timer concurrency bug. #7727

Open mcourteaux opened 1 year ago

mcourteaux commented 1 year ago

I have found several times that my application freezes when I use the builtin profiler-by-timer in Halide.

My last instance was plugged into the debugger, and I analyzed it. It seems that upon the freeze, the loops to iterate over the linked lists do an infinite loop:

https://github.com/halide/Halide/blob/fca8d9606145d0c42caac1954c37d86fa291fea0/src/runtime/profiler_common.cpp#L50-L58

The linked list has a loop in it! For my case, it seems that the length of the loop was 4, yet I have many more pipelines in the application.

I am currently using "profiler_by_timer" as a Generator feature, so I'm not using sampling. I checked the active threads, and all are waiting, and no halide-sampling threads are running. Upon a second run, the loop is of length one (meaning: p->next == p).

Doing some extra debugging, I figured out that the "by-timer" is referring to a hardware timer, rather than what I thought it was: inline instrumentation of the pipeline using RDTSC-like clock measurements.

Doing some more analysis work, I have a hypothesis:

bill_func performs some "bubbling up" (it reorders some elements in the linked list): https://github.com/halide/Halide/blob/fca8d9606145d0c42caac1954c37d86fa291fea0/src/runtime/profiler_common.cpp#L104-L109 From the sampling-thread, this is called while holding a lock, but from the hardware interrupt timer, the interrupt handler profiler_handler() doesn't acquire the lock, but also eventually goes to bill_func (via L34): https://github.com/halide/Halide/blob/fca8d9606145d0c42caac1954c37d86fa291fea0/src/runtime/posix_timer_profiler.cpp#L31-L44 I have no clue if this interrupt timer triggers all cores, or just one. If it does all, then for sure this seems like a bad idea.

This hypothesis is somewhat-ish confirmed by the fact that changing to the profiler feature (as opposed to profiler_by_timer), doesn't trigger this issue.

abadams commented 1 year ago

IIRC, profiler_by_timer was intended for embedded platforms without threading support, which probably explains the thread-unsafety. We should probably throw a spinlock in there.

mcourteaux commented 1 year ago

Ah, okay. I was initially incorrectly assuming that it was gonna use RDTSC to profile the pipeline. Is there a generator feature documentation page somewhere?

abadams commented 1 year ago

The description of each target flag is hiding in HalideRuntime.h. We should probably surface it better:

https://halide-lang.org/docs/_halide_runtime_8h.html#a2ccb96b3d427fff8f1d68cc5f1e92f3a