jlowe closed this issue 2 years ago
Yowza. That function should take an explicit stream.
This issue has been labeled `inactive-30d` due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled `inactive-90d` if there is no activity in the next 60 days.
Still relevant and would like to see this fixed.
Describe the bug
While examining a recent trace I noticed that within the libcudf `aggregate` range there are calls to `cudaMalloc` and `cudaFree`, the latter of which causes a synchronization on the default stream. I attached gdb, put a breakpoint on `cudaMalloc`, and found it was being triggered by `cudf::detail::is_relationally_comparable<cudf::table_device_view>` because it calls `thrust::all_of` without passing an execution policy. Without the RMM policy, it uses the default CUDA allocator. Ideally it should be using `rmm::exec_policy(stream)`, but the stream is not available to this method and would need to be passed.

Steps/Code to reproduce bug
Attach a debugger to a query that uses the RMM arena allocator and executes an aggregation. Place a breakpoint on `cudaMalloc`, execute the query, and observe that the breakpoint is hit in a call stack that derives from `cudf::detail::is_relationally_comparable`.

Expected behavior
libcudf should not trigger calls to `cudaMalloc` or `cudaFree`.
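
For illustration, here is a minimal sketch of the pattern being asked for. It is not the actual libcudf code: `is_non_negative` and `all_non_negative` are hypothetical names, and the predicate is a stand-in for the per-column check. The only point is that the caller passes a stream and the algorithm receives `rmm::exec_policy(stream)`, so it should run on that stream and take any temporary storage from the current RMM memory resource.

```cuda
#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>

#include <thrust/iterator/counting_iterator.h>
#include <thrust/logical.h>

// Hypothetical predicate standing in for the real per-column check.
struct is_non_negative {
  int const* data;  // device pointer
  __device__ bool operator()(int i) const { return data[i] >= 0; }
};

// Hypothetical helper: the stream is an explicit parameter so that it can be
// forwarded to Thrust through the RMM execution policy.
bool all_non_negative(int const* d_data, int size, rmm::cuda_stream_view stream)
{
  // Problematic form (no execution policy, so Thrust falls back to its defaults):
  //   thrust::all_of(first, last, is_non_negative{d_data});
  return thrust::all_of(rmm::exec_policy(stream),
                        thrust::counting_iterator<int>(0),
                        thrust::counting_iterator<int>(size),
                        is_non_negative{d_data});
}
```

Applying the same idea to `is_relationally_comparable` would mean adding a stream parameter to it and to its callers.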