nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: main CLI crashes around monitor_stage.py (24.03.01 runtime image) #1626

Closed: pdmack closed this issue 4 months ago

pdmack commented 4 months ago

Version

24.03.01

Which installation method(s) does this occur on?

Docker, Kubernetes

Describe the bug.

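A test pipeline that worked with previous Morpheus releases now crashes while the pipeline is being built. The traceback points at the monitor stage: `builder.make_edge(input_node, node)` in `monitor_stage.py`, line 131.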

Minimum reproducible example

```bash
morpheus --log_level=DEBUG run --num_threads=2 --edge_buffer_size=4 \
  --pipeline_batch_size=8196 --model_max_batch_size=32 --use_cpp=True \
  pipeline-nlp --model_seq_length=128 --labels_file=data/labels_phishing.txt \
  from-file --filename=/common/data/email.jsonlines \
  monitor --description 'FromFile Rate' --smoothing=0.001 \
  deserialize \
  preprocess --vocab_hash_file=data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
  monitor --description 'Preprocess Rate' \
  inf-triton --model_name=phishing-bert-onnx --server_url=ai-engine:8000 --force_convert_inputs=True \
  monitor --description 'Inference Rate' --smoothing=0.001 --unit inf \
  add-class --label=is_phishing --threshold=0.7 \
  serialize \
  to-file --filename=/common/data/output/phishing-bert-onnx-output.jsonlines --overwrite
```

Relevant log output

```
Parameter, 'labels_file', with relative path, 'data/labels_phishing.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/data/labels_phishing.txt'
Configuring Pipeline via CLI
Loaded labels file. Current labels: [['not_phishing', 'is_phishing']]
Module 'FileBatcher' was successfully registered with 'morpheus' namespace.
Module 'FileToDF' was successfully registered with 'morpheus' namespace.
Module 'FilterCmFailed' was successfully registered with 'morpheus' namespace.
Module 'FilterControlMessage' was successfully registered with 'morpheus' namespace.
Module 'FilterDetections' was successfully registered with 'morpheus' namespace.
Module 'FromControlMessage' was successfully registered with 'morpheus' namespace.
Module 'MLFlowModelWriter' was successfully registered with 'morpheus' namespace.
Module 'PayloadBatcher' was successfully registered with 'morpheus' namespace.
Module 'Serialize' was successfully registered with 'morpheus' namespace.
Module 'ToControlMessage' was successfully registered with 'morpheus' namespace.
Module 'WriteToElasticsearch' was successfully registered with 'morpheus' namespace.
Module 'WriteToFile' was successfully registered with 'morpheus' namespace.
Module 'deserialize' was successfully registered with 'morpheus' namespace.
Parameter, 'vocab_hash_file', with relative path, 'data/bert-base-uncased-hash.txt', does not exist. Using package relative location: '/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/data/bert-base-uncased-hash.txt'
====Pipeline Pre-build====
====Pre-Building Segment: linear_segment_0====
====Pre-Building Segment Complete!====
====Pipeline Pre-build Complete!====
====Registering Pipeline====
Starting pipeline via CLI... Ctrl+C to Quit
====Building Pipeline====
====Building Pipeline Complete!====
====Registering Pipeline Complete!====
Config:
{
  "ae": null,
  "class_labels": [
    "not_phishing",
    "is_phishing"
  ],
  "debug": false,
  "edge_buffer_size": 4,
  "feature_length": 128,
  "fil": null,
  "log_config_file": null,
  "log_level": 10,
  "mode": "NLP",
  "model_max_batch_size": 32,
  "num_threads": 2,
  "pipeline_batch_size": 8196,
  "plugins": []
}
E20240415 16:16:06.705006    46 builder_definition.cpp:283] Exception during segment initializer. Segment name: linear_segment_0, Segment Rank: 0. Exception message:
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
At:
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
CPP Enabled: True
====Starting Pipeline====
E20240415 16:16:06.706972    46 service.cpp:40] Must call Service::call_in_destructor to ensure service is cleaned up before being destroyed
E20240415 16:16:06.707026    46 controller.cpp:62] exception caught while performing update - this is fatal - issuing kill
====Pipeline Started====
====Building Segment: linear_segment_0====
E20240415 16:16:06.707924    46 context.cpp:124] rank: 0; size: 1; tid: 140427135669824; fid: 0x7fb7b8040f00: set_exception issued; issuing kill to current runnable. Exception msg: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
At:
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
E20240415 16:16:06.707991    46 manager.cpp:87] error detected on controller
E20240415 16:16:06.708143    39 runner.cpp:189] Runner::await_join - an exception was caught while awaiting on one or more contexts/instances - rethrowing
Added source: 
  └─> morpheus.MessageMeta
E20240415 16:16:06.708204    39 service.cpp:224] Service[pipeline::Manager]: caught exception in service_await_join: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
At:
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
E20240415 16:16:06.708288    39 service.cpp:224] Service[ExecutorDefinition]: caught exception in service_await_join: RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
At:
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py(131): _build_single
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py(81): _build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(391): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py(413): build
  /opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py(317): inner_build
Added stage: 
  └─ morpheus.MessageMeta -> morpheus.MessageMeta
Module 'deserialize' with namespace 'morpheus' is successfully loaded.
Added stage: , task_type=None, task_payload=None)>
  └─ morpheus.MessageMeta -> morpheus.MultiMessage
Added stage: 
  └─ morpheus.MultiMessage -> morpheus.MultiInferenceMessage
Added stage: 
  └─ morpheus.MultiInferenceMessage -> morpheus.MultiInferenceMessage
Added stage: 
  └─ morpheus.MultiInferenceMessage -> morpheus.MultiResponseMessage
Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
    await executor.join_async()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 317, in inner_build
    stage.build(builder)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  [Previous line repeated 3 more times]
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 391, in build
    out_ports_nodes = self._build(builder=builder, input_nodes=in_ports_nodes)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py", line 81, in _build
    return [self._build_single(builder, input_nodes[0])]
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py", line 131, in _build_single
    builder.make_edge(input_node, node)
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/bin/morpheus", line 11, in <module>
    sys.exit(run_cli())
====Pipeline Complete====
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/cli/run.py", line 20, in run_cli
    cli(obj={}, auto_envvar_prefix='MORPHEUS', show_default=True, prog_name="morpheus")
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1720, in invoke
    return _process_result(rv)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 1657, in _process_result
    value = ctx.invoke(self._result_callback, value, **ctx.params)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/cli/commands.py", line 644, in post_pipeline
    pipeline.run()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 651, in run
    asyncio.run(self.run_async())
  File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/envs/morpheus/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 632, in run_async
    await self.join()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 449, in join
    await self._post_start_future
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 405, in post_start
    await executor.join_async()
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/pipeline.py", line 317, in inner_build
    stage.build(builder)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 413, in build
    dep.build(builder, do_propagate=do_propagate)
  [Previous line repeated 3 more times]
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/stage_base.py", line 391, in build
    out_ports_nodes = self._build(builder=builder, input_nodes=in_ports_nodes)
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/pipeline/single_port_stage.py", line 81, in _build
    return [self._build_single(builder, input_nodes[0])]
  File "/opt/conda/envs/morpheus/lib/python3.10/site-packages/morpheus/stages/general/monitor_stage.py", line 131, in _build_single
    builder.make_edge(input_node, node)
RuntimeError: No conversion found from mrc::pymrc::PyObjectHolder to std::shared_ptr
```

Full env printout

Not provided (the `print_env.sh` output placeholder from the issue template was left unfilled).

Other/Misc.

@drobison00 commented:

The error is caused by trying to create an edge between the generic Python wrapper type that Morpheus uses (`mrc::pymrc::PyObjectHolder`) and a C++ pointer (`std::shared_ptr`). If this is the conversion we actually want, you can add a C++ declaration in `messages/module.cpp`, something like the following but with `MultiResponseMessage` instead of `MessageMeta`:

    // Register converters in both directions between the C++ message pointer and the generic Python object holder
    mrc::edge::EdgeConnector<std::shared_ptr<morpheus::MessageMeta>, mrc::pymrc::PyObjectHolder>::register_converter();
    mrc::edge::EdgeConnector<mrc::pymrc::PyObjectHolder, std::shared_ptr<morpheus::MessageMeta>>::register_converter();
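The converter explanation can be illustrated with a toy Python model of a typed-edge converter registry. This is a sketch under stated assumptions, not MRC's implementation: `register_converter`, `make_edge`, and `edge_error` below are hypothetical stand-ins, and the C++ type names are plain strings. In the model, an edge between two different port types is allowed only if a converter for that exact (source, sink) pair has been registered, so registering `MessageMeta` does nothing for an edge that carries `MultiResponseMessage`. That fits the log: assuming the `Added stage` lines reflect build order, the last stage added before the failure is the one producing `morpheus.MultiResponseMessage` (Triton inference), and the next stage in the command is `monitor --description 'Inference Rate'`.

```python
from typing import Optional, Set, Tuple

# Hypothetical registry of (source_type, sink_type) pairs that have a converter.
# This models the idea behind EdgeConnector::register_converter; it is not MRC code.
_CONVERTERS: Set[Tuple[str, str]] = set()


def register_converter(src: str, dst: str) -> None:
    """Allow values of type `src` to flow into a port that expects `dst`."""
    _CONVERTERS.add((src, dst))


def make_edge(src: str, dst: str) -> None:
    """Connect two ports; raise an error shaped like the one in the issue if no converter exists."""
    if src != dst and (src, dst) not in _CONVERTERS:
        raise RuntimeError(f"No conversion found from {src} to {dst}")


def edge_error(src: str, dst: str) -> Optional[str]:
    """Return the error make_edge would raise, or None if the edge is allowed."""
    try:
        make_edge(src, dst)
    except RuntimeError as exc:
        return str(exc)
    return None


# Illustrative type names (strings only).
PY_HOLDER = "mrc::pymrc::PyObjectHolder"
META_PTR = "std::shared_ptr<morpheus::MessageMeta>"
RESPONSE_PTR = "std::shared_ptr<morpheus::MultiResponseMessage>"

# Registering MessageMeta in both directions, as in the C++ snippet above ...
register_converter(META_PTR, PY_HOLDER)
register_converter(PY_HOLDER, META_PTR)
print(edge_error(PY_HOLDER, META_PTR))      # None
# ... does not cover an edge that carries MultiResponseMessage.
print(edge_error(PY_HOLDER, RESPONSE_PTR))  # prints a "No conversion found ..." message

# Registering the MultiResponseMessage pair closes the gap in this model.
register_converter(RESPONSE_PTR, PY_HOLDER)
register_converter(PY_HOLDER, RESPONSE_PTR)
print(edge_error(PY_HOLDER, RESPONSE_PTR))  # None
```

The model only checks the direction reported in the traceback (from `PyObjectHolder` to a `std::shared_ptr`); registering both directions, as the C++ snippet does, would also cover edges running the other way.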
