ledatelescope / bifrost

A stream processing framework for high-throughput applications.
BSD 3-Clause "New" or "Revised" License

memory problems with CUDA-based rings #138

Open jaycedowell opened 4 years ago

jaycedowell commented 4 years ago

A couple of times now I have run into problems passing data between blocks using CUDA-based rings. If I don't force a bifrost.device.synchronize_stream() within the reserve context for the ring, I get inconsistent results when reading from the ring in another block. I think what is happening is that the ring doesn't know about the asynchronous copies and happily marks the reserved segment as good to go when the reserve is released, even though the copies may still be in flight. Is there a better way to deal with this than sprinkling synchronize_stream() calls around?

benbarsdell commented 4 years ago

Bifrost asynchronicity is based around CPU threads each having their own CUDA stream. All GPU work in a CPU thread must be synchronous with respect to that thread, so it must be followed by a stream synchronize before anything is released to other threads. (Using async CUDA APIs and then synchronizing on a per-CPU-thread stream ensures that GPU work is synchronous within the CPU thread but asynchronous between threads.)

E.g., the pipeline infrastructure does this for all blocks here: https://github.com/ledatelescope/bifrost/blob/8a059b3/python/bifrost/pipeline.py#L462
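To illustrate the ordering described above, here is a minimal pure-Python sketch. It does not use Bifrost or CUDA: `FakeStream`, `FakeRing`, `async_copy`, and `write_block` are hypothetical stand-ins invented for this example. The stream only queues work until `synchronize()` is called, which is a simplified model of asynchronous copies on a per-thread stream. In real code the synchronize step would correspond to the bifrost.device.synchronize_stream() call mentioned in the question.

```python
from collections import deque
from contextlib import contextmanager


class FakeStream:
    """Hypothetical stand-in for a per-thread CUDA stream.

    Work is queued by enqueue() and only runs when synchronize() is called.
    """

    def __init__(self):
        self._pending = deque()

    def enqueue(self, fn, *args):
        # Models an async copy or kernel launch: returns immediately.
        self._pending.append((fn, args))

    def synchronize(self):
        # Models a stream synchronize: finish all queued work before returning.
        while self._pending:
            fn, args = self._pending.popleft()
            fn(*args)


class FakeRing:
    """Hypothetical stand-in for a ring with a single reservable span."""

    def __init__(self, nslot):
        self.data = [0] * nslot
        self.committed = False

    @contextmanager
    def reserve(self):
        try:
            yield self.data
        finally:
            # Leaving the reserve context marks the span as readable,
            # whether or not queued copies have actually finished.
            self.committed = True


def async_copy(dst, src):
    dst[:] = src


def write_block(ring, stream, src, sync):
    """Write src into the ring; return what a reader sees right after commit."""
    with ring.reserve() as span:
        stream.enqueue(async_copy, span, src)
        if sync:
            # Synchronize inside the reserve context, before the release.
            stream.synchronize()
    return list(ring.data)


if __name__ == "__main__":
    src = [1, 2, 3, 4]
    print(write_block(FakeRing(4), FakeStream(), src, sync=False))
    print(write_block(FakeRing(4), FakeStream(), src, sync=True))
```

In this model, the span is committed in both cases, but only the synchronized writer guarantees that the copy has landed before a reader in another thread can see the span.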

jaycedowell commented 4 years ago

Ok, thanks.