insight-platform / Savant

Python Computer Vision & Video Analytics Framework With Batteries Included
https://savant-ai.io
Apache License 2.0

PanicException: Failed to install Jaeger tracer globally #503

Closed udayzee05 closed 10 months ago

udayzee05 commented 11 months ago

root@3b9d4277941b:/opt/savant/src# python module/run.py

INFO insight::savant::config::module_config > Configure module...
WARN insight::savant::config::json_resolver > JSON loads fail, returning None for "None".
WARN insight::savant::config::json_resolver > JSON loads fail, returning None for "None".
INFO insight::savant::config::module_config > Configure pipeline elements...
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Path to the model files has been set to "/models/peoplenet".
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Model engine file has been set to "resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine".
WARN insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Key for the TAO encoded model (model.tlt_model_key) has been set to "tlt_encode".
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Resulting configuration file "/models/peoplenet/resnet34_peoplenet_pruned_config_savant.txt" has been saved.
INFO insight::savant::config::module_config > Pipeline batch size is set to 1.
INFO insight::savant::config::module_config > Module configuration is complete.
INFO insight::savant::healthcheck::server > Starting healthcheck server at :8888. HTTP path: /healthcheck, status filepath: /opt/savant/status.txt.
INFO insight::savant::deepstream::utils::pipeline > Initializing Jaeger tracer with service name 'demo-pipeline' and endpoint 'jaeger:6831'.
thread '' panicked at 'Failed to install Jaeger tracer globally: ExportFailed(ConfigError { pipeline_name: "agent", config_name: "endpoint", reason: "failed to lookup address information: Temporary failure in name resolution" })', savant_core/src/telemetry.rs:19:10
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
/opt/savant/src/module/run.py:5 main('/opt/savant/src/module/module.yml')
/usr/local/lib/python3.8/dist-packages/savant/entrypoint/main.py:71 main pipeline = NvDsPipeline(
/usr/local/lib/python3.8/dist-packages/savant/deepstream/pipeline.py:108 init init_telemetry(name, telemetry)
/usr/local/lib/python3.8/dist-packages/savant/deepstream/utils/pipeline.py:162 init_telemetry init_jaeger_tracer(service_name, endpoint)
PanicException: Failed to install Jaeger tracer globally: ExportFailed(ConfigError { pipeline_name: "agent", config_name: "endpoint", reason: "failed to lookup address information: Temporary failure in name resolution" })

bwsw commented 11 months ago

@udayzee05 Hi, please provide more context: what you are launching, how you are launching it, etc.

From what I see, no container with the 'jaeger' service name is running, so the name cannot be resolved. However, Jaeger is configured either in the environment or in module.yml.
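
To confirm that diagnosis from inside the container before starting the module, a small standard-library check can help. This is only a sketch: `endpoint_resolves` is a hypothetical helper, not part of Savant, and it assumes the endpoint is written as `host:port`, as in the `jaeger:6831` value from the log.

```python
import socket


def endpoint_resolves(endpoint: str) -> bool:
    """Return True if the host part of a 'host:port' endpoint can be resolved.

    Hypothetical helper for illustration; it is not part of the Savant API.
    """
    host, _, port = endpoint.rpartition(":")
    if not host or not port.isdigit():
        # Not in the assumed 'host:port' form.
        return False
    try:
        # The same kind of lookup that fails in the log with
        # "Temporary failure in name resolution".
        socket.getaddrinfo(host, int(port), type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # 'jaeger:6831' is the endpoint reported in the log above.
    print(endpoint_resolves("jaeger:6831"))
```

If this prints `False`, the 'jaeger' name is not reachable from the container, which matches the panic message.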

bwsw commented 11 months ago

This is how it must be: https://github.com/insight-platform/Savant/blob/develop/samples/telemetry/docker-compose.x86.yml

I launched the above pipeline several minutes ago, it works normally.

udayzee05 commented 11 months ago

I was following the instructions from the docs and trying to launch the module using run.py, but I am facing this issue.

bwsw commented 11 months ago

I see. Comment out the telemetry section (lines 11-17) in module.yml.

udayzee05 commented 11 months ago

These are the logs after making the changes and running run.py:

root@b580e136db32:/opt/savant/src# python module/run.py
INFO insight::savant::config::module_config > Configure module...
WARN insight::savant::config::json_resolver > JSON loads fail, returning None for "None".
WARN insight::savant::config::json_resolver > JSON loads fail, returning None for "None".
WARN insight::savant::config::json_resolver > JSON loads fail, returning None for "None".
INFO insight::savant::config::module_config > Configure pipeline elements...
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Path to the model files has been set to "/models/peoplenet".
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Model engine file has been set to "resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine".
WARN insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Key for the TAO encoded model (model.tlt_model_key) has been set to "tlt_encode".
INFO insight::savant::deepstream::nvinfer::element_config > Element nvinfer@detector:v1(name=peoplenet): Resulting configuration file "/models/peoplenet/resnet34_peoplenet_pruned_config_savant.txt" has been saved.
INFO insight::savant::config::module_config > Pipeline batch size is set to 1.
INFO insight::savant::config::module_config > Module configuration is complete.
INFO insight::savant::healthcheck::server > Starting healthcheck server at :8888. HTTP path: /healthcheck, status filepath: /opt/savant/status.txt.
INFO insight::savant::deepstream::utils::pipeline > No telemetry provider specified. Using noop tracer.
INFO insight::savant::template > Pipeline frame processing parameters: {'width': 1280, 'height': 720, 'batch-size': 1, 'buffer-pool-size': 4, 'batched-push-timeout': 2000, 'live-source': False, 'interpolation-method': 6, 'drop-pipeline-eos': True, 'nvbuf-memory-type': 3}.
127.0.0.1 - - [17/Oct/2023 09:20:03] "GET / HTTP/1.1" 404 -
INFO insight::savant::gstreamer::runner > Starting pipeline template<NvDsPipeline>: zeromq_source_bin:v1(name=source) -> nvstreammux:v1(name=muxer) -> nvinfer@detector:v1(name=peoplenet) -> pyfunc:v1(name=pyfunc+gstpluginpyfunc0) -> pyfunc:v1(name=pyfunc+gstpluginpyfunc1) -> nvstreamdemux:v1(name=demuxer)...
INFO insight::savant::healthcheck::status > Setting module status to ModuleStatus.STARTING.
127.0.0.1 - - [17/Oct/2023 09:20:04] "GET / HTTP/1.1" 404 -
0:00:07.971333954 1707 0x55b8640 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::deserializeEngineAndBackend() [UID = 1]: deserialized trt engine from :/models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
INFO: ../nvdsinfer/nvdsinfer_model_builder.cpp:610 [Implicit Engine Info]: layers num: 3
0 INPUT kFLOAT input_1 3x544x960
1 OUTPUT kFLOAT output_bbox/BiasAdd 12x34x60
2 OUTPUT kFLOAT output_cov/Sigmoid 3x34x60

0:00:08.033920526 1707 0x55b8640 INFO nvinfer gstnvinfer.cpp:682:gst_nvinfer_logger: NvDsInferContext[UID 1]: Info from NvDsInferContextImpl::generateBackendContext() [UID = 1]: Use deserialized engine model: /models/peoplenet/resnet34_peoplenet_pruned.etlt_b1_gpu0_fp16.engine
0:00:08.038821033 1707 0x55b8640 INFO nvinfer gstnvinfer_impl.cpp:328:notifyLoadModelStatus: [UID 1]: Load new model:/models/peoplenet/resnet34_peoplenet_pruned_config_savant.txt sucessfully
INFO insight::savant::utils::zeromq > Starting ZMQ source: socket ipc:///tmp/zmq-sockets/input-video.ipc, type ReceiverSocketTypes.SUB, bind True.
INFO insight::savant::gstreamer::runner > The pipeline is initialized and ready to process data. Initialization took 0:00:01.657518.
INFO insight::savant::healthcheck::status > Setting module status to ModuleStatus.RUNNING.
127.0.0.1 - - [17/Oct/2023 09:20:10] "GET / HTTP/1.1" 404 -
127.0.0.1 - - [17/Oct/2023 09:20:11] "GET / HTTP/1.1" 404 -
^C
INFO insight::savant::main > Shutting down.
INFO insight::savant::healthcheck::status > Setting module status to ModuleStatus.STOPPING.
INFO insight::savant::zeromq_src::zeromq_src+zeromqsrc0 > Returning
INFO insight::savant::utils::zeromq > Closing ZeroMQ socket
INFO insight::savant::utils::zeromq > Terminating ZeroMQ context.
INFO insight::savant::utils::zeromq > ZeroMQ context terminated
INFO insight::savant::gstreamer::runner > The pipeline is about to stop. Operation took 0:00:52.066760.
INFO insight::savant::template > Processed 0 frames, 0.00 FPS.
INFO insight::savant::healthcheck::status > Setting module status to ModuleStatus.STOPPED.
done

bwsw commented 11 months ago

Now everything looks just fine.

udayzee05 commented 11 months ago

But now, without Jaeger, I am not able to run the sample client/run.py script:

root@b580e136db32:/opt/savant/src# python client/run.py
Starting Savant client...
thread '' panicked at 'Failed to install Jaeger tracer globally: ExportFailed(ConfigError { pipeline_name: "agent", config_name: "endpoint", reason: "failed to lookup address information: Temporary failure in name resolution" })', savant_core/src/telemetry.rs:19:10
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Traceback (most recent call last):
File "client/run.py", line 15, in init_jaeger_tracer('savant-client', 'jaeger:6831')
pyo3_runtime.PanicException: Failed to install Jaeger tracer globally: ExportFailed(ConfigError { pipeline_name: "agent", config_name: "endpoint", reason: "failed to lookup address information: Temporary failure in name resolution" })
root@b580e136db32:/opt/savant/src#

bwsw commented 11 months ago

Hmm, I see. Then you need to start Jaeger first.

@abramov-oleg please comment.

abramov-oleg commented 11 months ago

@udayzee05

I was following the instructions from the docs and trying to launch the module using run.py, but I am facing this issue.

Unfortunately, the text steps on the linked doc page are a bit outdated.

To learn how to start the telemetry service, check out the "Create module project" video embedded at the end of the "Reopen in Container" section on that page, or follow the steps described in the template sample README.

Alternatively, since you've already disabled the telemetry for the module, you can also remove Jaeger usage from the client script:

  1. comment out the init_jaeger_tracer('savant-client', 'jaeger:6831') line
  2. remove the .with_log_provider(JaegerLogProvider(jaeger_endpoint)) instructions from the source and sink build chains
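
Instead of deleting the Jaeger calls outright, the client script could make them conditional. The sketch below is only one way to do it, under stated assumptions: the `JAEGER_ENDPOINT` environment variable and the helper names `telemetry_endpoint` and `setup_tracing` are hypothetical, and the tracer initializer is passed in as an argument so the snippet does not depend on Savant being importable.

```python
import os
from typing import Callable, Mapping, Optional


def telemetry_endpoint(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the Jaeger endpoint, or None when telemetry is disabled.

    JAEGER_ENDPOINT is a hypothetical variable name chosen for this sketch.
    """
    value = env.get("JAEGER_ENDPOINT", "").strip()
    return value or None


def setup_tracing(
    endpoint: Optional[str],
    init_tracer: Callable[[str, str], None],
) -> bool:
    """Initialize tracing only when an endpoint is configured.

    init_tracer stands in for init_jaeger_tracer from the client script.
    Returns True if the tracer was initialized, False otherwise.
    """
    if endpoint is None:
        return False
    init_tracer("savant-client", endpoint)
    return True


# Intended use in client/run.py (assumption, shown as comments only):
#   endpoint = telemetry_endpoint()
#   setup_tracing(endpoint, init_jaeger_tracer)
#   if endpoint is not None:
#       builder = builder.with_log_provider(JaegerLogProvider(endpoint))
```

With the variable unset, the script skips both the tracer initialization and the log provider. Setting it to `jaeger:6831` restores the original behavior once the Jaeger container is running.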

I'll open a ticket to update the doc page.