nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: TrainAEStage fails with a Segmentation fault #1641

Open dagardner-nv opened 4 months ago

dagardner-nv commented 4 months ago

Version

24.03

Which installation method(s) does this occur on?

Source

Describe the bug.

The validation script fails with a segmentation fault, even though the equivalent unit test passes.

Minimum reproducible example

./scripts/validation/hammah/val-hammah-all.sh

Relevant log output

====Building Segment Complete!====    
Inference Rate: 0 inf [00:00, ? inf/s]PC: @                0x0 (unknown)
*** SIGSEGV (@0x4) received by PID 1751190 (TID 0x7f6207fff6c0) from PID 4; stack trace: ***
    @     0x7f6335d6e197 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f6336b88050 (unknown)
    @     0x7f632b18da1e boost::fibers::wait_queue::notify_all()
    @     0x7f632b18b3c3 boost::fibers::condition_variable_any::notify_all()
    @     0x7f631ecd9431 _ZN5boost6fibers6detail11task_objectIZN3mrc4core14FiberTaskQueue7enqueueIZNS3_7segment15SegmentInstanceC4ESt10shared_ptrIKNS7_17SegmentDefinitionEEtRNS3_8pipeline17PipelineResourcesEmEUlvE_JEEENS0_6futureINSt9result_ofIFT_DpT0_EE4typeEEEONS3_13FiberMetaDataEOSJ_DpOSK_EUlvE_SaINS0_13packaged_taskIFvvEEEEvJEE3runEv
    @     0x7f631ecea9c6 boost::fibers::worker_context<>::run_()
    @     0x7f631ece86dc boost::context::detail::fiber_entry<>()
    @     0x7f632bffc11f make_fcontext
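
For triage, the frames in a trace like the one above can be pulled out programmatically. The following is a minimal sketch, not part of Morpheus or MRC: the names `FRAME_RE`, `parse_frames`, and `is_mangled` are hypothetical, and it assumes each frame line has the `@ <hex address> <symbol>` layout seen in this log.

```python
import re

# Assumed frame layout, taken from the log above: "@", a hex address, then the symbol.
FRAME_RE = re.compile(r"^\s*@\s+(0x[0-9a-fA-F]+)\s+(.*)$")


def parse_frames(trace: str):
    """Return (address, symbol) pairs for each line that looks like a stack frame."""
    frames = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            frames.append((int(match.group(1), 16), match.group(2).strip()))
    return frames


def is_mangled(symbol: str) -> bool:
    """Itanium C++ ABI mangled names begin with '_Z'."""
    return symbol.startswith("_Z")


# A few frames copied from the log above.
sample = """\
    @     0x7f6335d6e197 google::(anonymous namespace)::FailureSignalHandler()
    @     0x7f6336b88050 (unknown)
    @     0x7f632b18da1e boost::fibers::wait_queue::notify_all()
"""

frames = parse_frames(sample)
for address, symbol in frames:
    marker = " (mangled)" if is_mangled(symbol) else ""
    print(f"{address:#x} {symbol}{marker}")
```

Frames flagged as mangled, such as the long `_ZN5boost6fibers6detail11task_object...` entry, can be piped through `c++filt` from GNU binutils to get a readable name.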

dagardner-nv commented 4 months ago

The same bug affects the pipelines documented in examples/digital_fingerprinting/starter/README.md; the problem appears to be in the TrainAEStage.