nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: DOCA Source Stage CUDA persistent kernel #1557

Open e-ago opened 7 months ago

e-ago commented 7 months ago

Version

24.03

Which installation method(s) does this occur on?

Docker

Describe the bug.

The DOCA CUDA receiver kernel cannot run in persistent mode because of unexpected behavior in the rxcpp/MRC framework. Something in the framework schedules a new thread, which creates context lock contention with the persistent kernel, even when the kernel is launched on a dedicated context/stream.

Minimum reproducible example

Move the `morpheus::doca::packet_receive_kernel` launch before the `while (output.is_subscribed())` loop and keep the kernel running until `gpu_exit_condition` becomes non-zero.
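
For readers unfamiliar with the pattern, here is a minimal host-only sketch of the control flow described above: a long-running worker is started once, before the loop, and is stopped through a shared exit flag. It is an analogy, not the Morpheus code. A `std::thread` stands in for the persistent kernel, and `persistent_worker`, `run_source`, and the `batches` counter are hypothetical names introduced only for illustration. It does not reproduce the context lock contention itself.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical stand-in for the persistent receive kernel:
// it keeps polling until the exit flag becomes non-zero.
std::uint64_t persistent_worker(const std::atomic<int>& exit_condition)
{
    std::uint64_t polls = 0;
    while (exit_condition.load(std::memory_order_acquire) == 0)
    {
        ++polls;
        std::this_thread::yield();
    }
    return polls;
}

// Hypothetical stand-in for the source stage: the worker is launched once,
// before the loop, instead of once per loop iteration.
int run_source(int batches)
{
    std::atomic<int> exit_condition{0};
    std::uint64_t polls = 0;

    // Launch the long-running worker before entering the loop.
    std::thread worker([&] { polls = persistent_worker(exit_condition); });

    // Stands in for `while (output.is_subscribed())`.
    int emitted = 0;
    while (emitted < batches)
    {
        ++emitted;
    }

    // Signal the worker to stop, then wait for it to finish.
    exit_condition.store(1, std::memory_order_release);
    worker.join();
    return emitted;
}
```

In the real stage the flag would need to be visible to the GPU (the issue calls it `gpu_exit_condition`), and the worker would be a kernel launch on its own stream rather than a host thread.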

Relevant log output


Full env printout

Part of the stack trace generated when I force a cuContext inside, for example, the gather_payload() function.

_ZZN8morpheus15DocaSourceStage5buildEvENKUlN5rxcpp10subscriberISt10shared_ptrINS_11MessageMetaEENS1_8observerIS5_vvvvEEEEE_clES8_
_ZNSt17_Function_handlerIFvRN5rxcpp10subscriberISt10shared_ptrIN8morpheus11MessageMetaEENS0_8observerIS5_vvvvEEEEEZNS3_15DocaSourceStage5buildEvEUlS8_E_E9_M_invokeERKSt9_Any_dataS9_
_ZNSt17_Function_handlerIFvN5rxcpp10subscriberISt10shared_ptrIN8morpheus11MessageMetaEENS0_8observerIS5_vvvvEEEEEZNS0_18dynamic_observableIS5_E9constructINS0_7sources6detail6createIS5_ZN3mrc5pymrc12PythonSourceIS5_NSG_8runnable7ContextEEC4ERKSt8functionIFvRS8_EEEUlSN_E_EEEEvOT_ONSD_10tag_sourceEEUlS8_E_E9_M_invokeERKSt9_Any_dataOS8_
rxcpp::detail::safe_subscriber<>::subscribe()
std::_Function_handler<>::_M_invoke()
rxcpp::schedulers::current_thread::current_worker::schedule()
rxcpp::schedulers::worker::schedule<>()
mrc::node::RxSource<>::do_subscribe()
mrc::node::RxRunnable<>::run()
mrc::runnable::RunnableWithContext<>::main()
_ZNSt17_Function_handlerIFvvEZN3mrc8runnable6Runner7enqueueESt10shared_ptrINS2_8IEnginesEEOSt6vectorIS4_INS2_7ContextEESaIS9_EEEUlvE_E9_M_invokeERKSt9_Any_data
_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZNK3mrc6system15ThreadResources11make_threadIN5boost6fibers13packaged_taskIFvvEEEEENS4_6ThreadENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEENS3_6CpuSetEOT_EUlvE_EEEEE6_M_runEv
execute_native_thread_routine

Other/Misc.

No response


jarmak-nv commented 7 months ago

Hi @e-ago!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can! In the meantime, feel free to add any relevant information to this issue.

efajardo-nv commented 1 month ago

Hi @e-ago. Can this issue be closed? Not sure if it was resolved by PR #1475.