nv-morpheus / Morpheus

Morpheus SDK
Apache License 2.0

[BUG]: Kafka Source stage - High CPU usage (>= 100%) - idle consumer #1587

Open nuxwin opened 3 months ago

nuxwin commented 3 months ago

Version

24.3

Which installation method(s) does this occur on?

Docker, Source

Describe the bug.

The Kafka source stage keeps one CPU core at 100% (or more) while the consumer is idle, where normal (low) CPU usage is expected.

Increasing the value of poll_interval doesn't change anything, even when it is set to 2s.
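To make the symptom concrete, here is a minimal, standard-library-only sketch of the difference between a poll loop that spins while idle and one that sleeps for the poll interval. It is not the Morpheus implementation and makes no claim about the root cause of this issue. `fake_poll`, `busy_iteration`, `sleeping_iteration` and `cpu_fraction` are hypothetical names introduced only for illustration, and the 10 ms sleep mirrors the default `poll_interval=10millis` shown in the log output.

```python
import time


def cpu_fraction(loop_fn, duration_s=0.5):
    """Call loop_fn repeatedly for about duration_s; return process CPU time / wall time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    deadline = wall_start + duration_s
    while time.perf_counter() < deadline:
        loop_fn()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used


def fake_poll():
    """Hypothetical stand-in for a consumer poll that returns at once with no message."""
    return None


def busy_iteration():
    # Polls again immediately when nothing arrived, so the loop spins while idle.
    fake_poll()


def sleeping_iteration(poll_interval_s=0.01):
    # Sleeps for the poll interval when nothing arrived, so the loop yields the CPU.
    if fake_poll() is None:
        time.sleep(poll_interval_s)


busy = cpu_fraction(busy_iteration)
idle = cpu_fraction(sleeping_iteration)
print(f"busy loop: {busy:.0%} of one core, sleeping loop: {idle:.0%} of one core")
```

On an otherwise idle machine the first figure should be close to a full core and the second close to zero. An idle consumer whose CPU usage does not drop when the poll interval is raised behaves like the first loop.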

Minimum reproducible example

The following pipeline reproduces the bug:

#!/opt/conda/envs/morpheus/bin/python

import logging
import click

from morpheus.config import Config
from morpheus.config import CppConfig
from morpheus.messages.message_meta import MessageMeta
from morpheus.pipeline import LinearPipeline
from morpheus.stages.general.monitor_stage import MonitorStage
from morpheus.stages.input.kafka_source_stage import KafkaSourceStage
from morpheus.utils.logger import configure_logging

logger = logging.getLogger(f"morpheus.{__name__}")

@click.command()
@click.option(
    "--num_threads",
    default=1,
    type=click.IntRange(min=1),
    help="Number of internal pipeline threads to use.",
)
@click.option(
    "--pipeline_batch_size",
    default=1,
    type=click.IntRange(min=1),
    help=("Internal batch size for the pipeline. Can be much larger than the model batch size. "
          "Also used for Kafka consumers."),
)
@click.option(
    '--bootstrap_servers',
    default='ai-pf-kafka-server:9092',
    help="Comma-separated list of bootstrap servers."
)
@click.option(
    '--input_topic',
    type=str,
    default='ai-pf-input',
    help="Name of the Kafka topic from which messages will be consumed."
)
@click.option(
    '--group_id',
    type=str,
    default='ai-pf',
    help="Kafka input data consumer group identifier."
)
def run_pipeline(num_threads, pipeline_batch_size, bootstrap_servers, input_topic, group_id):
    configure_logging(log_level=logging.DEBUG)

    CppConfig.set_should_use_cpp(False)

    config = Config()
    config.num_threads = num_threads
    config.pipeline_batch_size = pipeline_batch_size

    pipeline = LinearPipeline(config)

    pipeline.set_source(KafkaSourceStage(
        config,
        bootstrap_servers=bootstrap_servers,
        input_topic=input_topic,
        group_id=group_id
    ))

    pipeline.add_stage(MonitorStage(config, description="Source rate"))

    pipeline.run()

if __name__ == "__main__":
    run_pipeline()

Relevant log output

(morpheus) nuxwin@morpheus-konzeptplus:~/projects/git/konzeptplus/nvidia/ai-pf-m-ml-001/.docker$ docker compose run ai-pf-morpheus-pipeline bash
[+] Creating 1/1
 ✔ Container ai-pf-triton-server  Created  0.1s
[+] Running 1/1
 ✔ Container ai-pf-triton-server  Started  0.3s
(morpheus) root@dfaa48a1f8db:/workspace# hight_cpu_usage_pipeline.py 
====Pipeline Pre-build====
====Pre-Building Segment: linear_segment_0====
====Pre-Building Segment Complete!====
====Pipeline Pre-build Complete!====
====Registering Pipeline====
====Building Pipeline====
====Building Pipeline Complete!====
Source rate: 0 messages [00:00, ? messages/s]====Registering Pipeline Complete!====
Source rate: 0 messages [00:00, ? messages/s]====Starting Pipeline====
====Pipeline Started====
====Building Segment: linear_segment_0====
Added source: <from-kafka-0; KafkaSourceStage(bootstrap_servers=ai-pf-kafka-server:9092, input_topic=ai-pf-input, group_id=ai-pf, client_id=None, poll_interval=10millis, disable_commit=False, disable_pre_filtering=False, auto_offset_reset=AutoOffsetReset.LATEST, stop_after=0, async_commits=True)>
  └─> morpheus.MessageMeta
Added stage: <monitor-1; MonitorStage(description=Source rate, smoothing=0.05, unit=messages, delayed_start=False, determine_count_fn=None, log_level=LogLevels.INFO)>
  └─ morpheus.MessageMeta -> morpheus.MessageMeta
====Building Segment Complete!====
Source rate: 0 messages [00:08, ? messages/s]

Full env printout


     ***git***
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=22.04
     DISTRIB_CODENAME=jammy
     DISTRIB_DESCRIPTION="Ubuntu 22.04.4 LTS"
     PRETTY_NAME="Ubuntu 22.04.4 LTS"
     NAME="Ubuntu"
     VERSION_ID="22.04"
     VERSION="22.04.4 LTS (Jammy Jellyfish)"
     VERSION_CODENAME=jammy
     ID=ubuntu
     ID_LIKE=debian
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     UBUNTU_CODENAME=jammy
     Linux dfaa48a1f8db 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Tue Apr  2 05:03:51 2024
     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
     |-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA GeForce RTX 4070        On  |   00000000:01:00.0 Off |                  N/A |
     |  0%   42C    P8             13W /  200W |     224MiB /  12282MiB |      0%      Default |
     |                                         |                        |                  N/A |
     +-----------------------------------------+------------------------+----------------------+

     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |    0   N/A  N/A    654033      C   tritonserver                                  218MiB |
     +-----------------------------------------------------------------------------------------+

     ***CPU***
     Architecture:                       x86_64
     CPU op-mode(s):                     32-bit, 64-bit
     Address sizes:                      39 bits physical, 48 bits virtual
     Byte Order:                         Little Endian
     CPU(s):                             10
     On-line CPU(s) list:                0-9
     Vendor ID:                          GenuineIntel
     Model name:                         13th Gen Intel(R) Core(TM) i5-13600KF
     CPU family:                         6
     Model:                              183
     Thread(s) per core:                 1
     Core(s) per socket:                 10
     Socket(s):                          1
     Stepping:                           1
     BogoMIPS:                           6988.20
     Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni arat umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
     Virtualization:                     VT-x
     L1d cache:                          320 KiB (10 instances)
     L1i cache:                          320 KiB (10 instances)
     L2 cache:                           40 MiB (10 instances)
     L3 cache:                           16 MiB (1 instance)
     NUMA node(s):                       1
     NUMA node0 CPU(s):                  0-9
     Vulnerability Gather data sampling: Not affected
     Vulnerability Itlb multihit:        Not affected
     Vulnerability L1tf:                 Not affected
     Vulnerability Mds:                  Not affected
     Vulnerability Meltdown:             Not affected
     Vulnerability Mmio stale data:      Not affected
     Vulnerability Retbleed:             Not affected
     Vulnerability Spec rstack overflow: Not affected
     Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
     Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:           Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
     Vulnerability Srbds:                Not affected
     Vulnerability Tsx async abort:      Not affected

     ***CMake***

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
     Copyright (C) 2021 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***
     /opt/conda/envs/morpheus/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2023 NVIDIA Corporation
     Built on Mon_Apr__3_17:16:06_PDT_2023
     Cuda compilation tools, release 12.1, V12.1.105
     Build cuda_12.1.r12.1/compiler.32688072_0

     ***Python***
     /opt/conda/envs/morpheus/bin/python
     Python 3.10.14

     ***Environment Variables***
     PATH                            : /opt/conda/envs/morpheus/bin:/opt/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin:/opt/konzeptplus/morpheus/pipelines/inference:/opt/konzeptplus/morpheus/pipelines/mlflow:/opt/konzeptplus/morpheus/pipelines/training
     LD_LIBRARY_PATH                 : /usr/local/nvidia/lib:/usr/local/nvidia/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /opt/conda/envs/morpheus
     PYTHON_PATH                     :

     ***conda packages***
     /opt/conda/condabin/conda
     # packages in environment at /opt/conda/envs/morpheus:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                 conda_forge    conda-forge
     _openmp_mutex             4.5                       2_gnu    conda-forge
     absl-py                   1.4.0              pyhd8ed1ab_0    conda-forge
     adagio                    0.2.4              pyhd8ed1ab_0    conda-forge
     alabaster                 0.7.16             pyhd8ed1ab_0    conda-forge
     alembic                   1.13.1             pyhd8ed1ab_1    conda-forge
     aniso8601                 9.0.1              pyhd8ed1ab_0    conda-forge
     annotated-types           0.6.0              pyhd8ed1ab_0    conda-forge
     antlr-python-runtime      4.11.1             pyhd8ed1ab_0    conda-forge
     antlr4-python3-runtime    4.11.1             pyh1a96a4e_0    conda-forge
     appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
     argon2-cffi               23.1.0                   pypi_0    pypi
     argon2-cffi-bindings      21.2.0                   pypi_0    pypi
     asn1crypto                1.5.1              pyhd8ed1ab_0    conda-forge
     astunparse                1.6.3                    pypi_0    pypi
     atk-1.0                   2.38.0               hd4edc92_1    conda-forge
     attrs                     23.2.0             pyh71513ae_0    conda-forge
     aws-c-auth                0.7.11               h0b4cabd_1    conda-forge
     aws-c-cal                 0.6.9                h14ec70c_3    conda-forge
     aws-c-common              0.9.12               hd590300_0    conda-forge
     aws-c-compression         0.2.17               h572eabf_8    conda-forge
     aws-c-event-stream        0.4.1                h97bb272_2    conda-forge
     aws-c-http                0.8.0                h9129f04_2    conda-forge
     aws-c-io                  0.14.0               hf8f278a_1    conda-forge
     aws-c-mqtt                0.10.1               h2b97f5f_0    conda-forge
     aws-c-s3                  0.4.9                hca09fc5_0    conda-forge
     aws-c-sdkutils            0.1.13               h572eabf_1    conda-forge
     aws-checksums             0.1.17               h572eabf_7    conda-forge
     aws-crt-cpp               0.26.0               h04327c0_8    conda-forge
     aws-sdk-cpp               1.11.210            hba3e011_10    conda-forge
     babel                     2.14.0             pyhd8ed1ab_0    conda-forge
     betterproto               1.2.5              pyhd3deb0d_1    conda-forge
     blas                      1.0                         mkl    conda-forge
     blinker                   1.7.0              pyhd8ed1ab_0    conda-forge
     bokeh                     3.4.0              pyhd8ed1ab_0    conda-forge
     boost-cpp                 1.84.0               h44aadfe_1    conda-forge
     brotli                    1.1.0                hd590300_1    conda-forge
     brotli-bin                1.1.0                hd590300_1    conda-forge
     brotli-python             1.1.0           py310hc6cd4ac_1    conda-forge
     bzip2                     1.0.8                hd590300_5    conda-forge
     c-ares                    1.27.0               hd590300_0    conda-forge
     ca-certificates           2024.2.2             hbcca054_0    conda-forge
     cachetools                5.3.3              pyhd8ed1ab_0    conda-forge
     cairo                     1.18.0               h3faef2a_0    conda-forge
     cattrs                    23.2.3             pyhd8ed1ab_0    conda-forge
     certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
     cffi                      1.16.0          py310h2fee648_0    conda-forge
     charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
     click                     8.1.7           unix_pyh707e725_0    conda-forge
     cloudpickle               3.0.0              pyhd8ed1ab_0    conda-forge
     colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
     configargparse            1.5.5              pyhd8ed1ab_0    conda-forge
     configparser              6.0.1              pyhd8ed1ab_0    conda-forge
     contourpy                 1.2.0           py310hd41b1e2_0    conda-forge
     cryptography              42.0.5          py310h75e40e8_0    conda-forge
     cuda-cccl_linux-64        12.1.109             ha770c72_0    conda-forge
     cuda-cudart               12.1.105             hd3aeb46_0    conda-forge
     cuda-cudart-dev           12.1.105             hd3aeb46_0    conda-forge
     cuda-cudart-dev_linux-64  12.1.105             h59595ed_0    conda-forge
     cuda-cudart-static        12.1.105             hd3aeb46_0    conda-forge
     cuda-cudart-static_linux-64 12.1.105             h59595ed_0    conda-forge
     cuda-cudart_linux-64      12.1.105             h59595ed_0    conda-forge
     cuda-cupti                12.1.105             h59595ed_0    conda-forge
     cuda-libraries            12.1.0                        0    nvidia
     cuda-nvcc-dev_linux-64    12.1.105             ha770c72_0    conda-forge
     cuda-nvcc-impl            12.1.105             hd3aeb46_0    conda-forge
     cuda-nvcc-tools           12.1.105             hd3aeb46_0    conda-forge
     cuda-nvrtc                12.1.105             hd3aeb46_0    conda-forge
     cuda-nvtx                 12.1.105             h59595ed_0    conda-forge
     cuda-opencl               12.1.105             h59595ed_0    conda-forge
     cuda-python               12.4.0          py310h9f9f131_1    conda-forge
     cuda-runtime              12.1.0                        0    nvidia
     cuda-version              12.1                 h1d6eff3_3    conda-forge
     cudf                      24.02.02        cuda12_py310_240227_gdd34fdbe35_0    rapidsai
     cupy                      13.0.0          py310h7aad9d2_3    conda-forge
     cupy-core                 13.0.0          py310had4011e_3    conda-forge
     cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
     cyrus-sasl                2.1.27               h54b06d7_7    conda-forge
     cytoolz                   0.12.3          py310h2372a71_0    conda-forge
     dask                      2024.2.1           pyhd8ed1ab_0    conda-forge
     dask-core                 2024.2.1           pyhd8ed1ab_0    conda-forge
     dask-cuda                 24.02.00        py310_240212_g96bedbc_0    rapidsai
     dask-cudf                 24.02.02        cuda12_py310_240227_gdd34fdbe35_0    rapidsai
     databricks-cli            0.18.0             pyhd8ed1ab_0    conda-forge
     databricks-connect        14.3.1                   pypi_0    pypi
     databricks-sdk            0.23.0                   pypi_0    pypi
     dataclasses               0.8                pyhc8e2a94_3    conda-forge
     datacompy                 0.10.5             pyhd8ed1ab_0    conda-forge
     dill                      0.3.7              pyhd8ed1ab_0    conda-forge
     distributed               2024.2.1           pyhd8ed1ab_0    conda-forge
     dlpack                    0.5                  h9c3ff4c_0    conda-forge
     docker-py                 5.0.3              pyhd8ed1ab_4    conda-forge
     docker-pycreds            0.4.0                      py_0    conda-forge
     docutils                  0.20.1          py310hff52083_3    conda-forge
     elastic-transport         8.12.0             pyhd8ed1ab_0    conda-forge
     elasticsearch             8.9.0              pyhd8ed1ab_0    conda-forge
     entrypoints               0.4                pyhd8ed1ab_0    conda-forge
     environs                  9.5.0                    pypi_0    pypi
     exceptiongroup            1.2.0              pyhd8ed1ab_2    conda-forge
     expat                     2.6.2                h59595ed_0    conda-forge
     fastrlock                 0.8.2           py310hc6cd4ac_2    conda-forge
     feedparser                6.0.10             pyhd8ed1ab_0    conda-forge
     filelock                  3.13.1             pyhd8ed1ab_0    conda-forge
     flask                     3.0.2              pyhd8ed1ab_0    conda-forge
     flatbuffers               24.3.7                   pypi_0    pypi
     fmt                       10.2.1               h00ab1b0_0    conda-forge
     font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
     font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
     font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
     font-ttf-ubuntu           0.83                 h77eed37_1    conda-forge
     fontconfig                2.14.2               h14ed4e7_0    conda-forge
     fonts-conda-ecosystem     1                             0    conda-forge
     fonts-conda-forge         1                             0    conda-forge
     fonttools                 4.50.0          py310h2372a71_0    conda-forge
     freetype                  2.12.1               h267a509_2    conda-forge
     fribidi                   1.0.10               h36c2ea0_0    conda-forge
     fs                        2.4.16             pyhd8ed1ab_0    conda-forge
     fsspec                    2024.3.1           pyhca7485f_0    conda-forge
     fugue                     0.8.7              pyhd8ed1ab_0    conda-forge
     fugue-sql-antlr           0.2.0              pyhd8ed1ab_0    conda-forge
     gast                      0.5.4                    pypi_0    pypi
     gdk-pixbuf                2.42.10              h829c605_5    conda-forge
     gettext                   0.21.1               h27087fc_0    conda-forge
     gflags                    2.2.2             he1b5a44_1004    conda-forge
     giflib                    5.2.1                h0b41bf4_3    conda-forge
     gitdb                     4.0.11             pyhd8ed1ab_0    conda-forge
     gitpython                 3.1.42             pyhd8ed1ab_0    conda-forge
     glog                      0.6.0                h6f12383_0    conda-forge
     gmock                     1.14.0               ha770c72_1    conda-forge
     gmp                       6.3.0                h59595ed_1    conda-forge
     gmpy2                     2.1.2           py310h3ec546c_1    conda-forge
     google-auth               2.29.0                   pypi_0    pypi
     google-auth-oauthlib      1.2.0                    pypi_0    pypi
     google-pasta              0.2.0                    pypi_0    pypi
     googleapis-common-protos  1.63.0             pyhd8ed1ab_0    conda-forge
     graphene                  3.3                pyhd8ed1ab_0    conda-forge
     graphite2                 1.3.13            h58526e2_1001    conda-forge
     graphql-core              3.2.3              pyhd8ed1ab_0    conda-forge
     graphql-relay             3.2.0              pyhd8ed1ab_0    conda-forge
     graphviz                  9.0.0                h78e8752_1    conda-forge
     greenlet                  3.0.3           py310hc6cd4ac_0    conda-forge
     grpcio                    1.60.0                   pypi_0    pypi
     grpcio-status             1.60.0                   pypi_0    pypi
     grpclib                   0.4.7              pyhd8ed1ab_0    conda-forge
     gtest                     1.14.0               h00ab1b0_1    conda-forge
     gtk2                      2.24.33              h280cfa0_4    conda-forge
     gts                       0.7.6                h977cf35_4    conda-forge
     gunicorn                  21.2.0          py310hff52083_1    conda-forge
     h2                        4.1.0              pyhd8ed1ab_0    conda-forge
     h5py                      3.10.0                   pypi_0    pypi
     harfbuzz                  8.3.0                h3d44ed6_0    conda-forge
     hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
     hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
     icu                       73.2                 h59595ed_0    conda-forge
     idna                      3.6                pyhd8ed1ab_0    conda-forge
     imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
     importlib-metadata        7.1.0              pyha770c72_0    conda-forge
     importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
     importlib_resources       6.3.2              pyhd8ed1ab_0    conda-forge
     intel-openmp              2022.1.0          h9e868ea_3769
     itsdangerous              2.1.2              pyhd8ed1ab_0    conda-forge
     jinja2                    3.1.3              pyhd8ed1ab_0    conda-forge
     joblib                    1.3.2              pyhd8ed1ab_0    conda-forge
     keras                     2.15.0                   pypi_0    pypi
     keyutils                  1.6.1                h166bdaf_0    conda-forge
     kiwisolver                1.4.5           py310hd41b1e2_1    conda-forge
     krb5                      1.21.2               h659d440_0    conda-forge
     lcms2                     2.16                 hb7c19ff_0    conda-forge
     ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
     lerc                      4.0.0                h27087fc_0    conda-forge
     libabseil                 20230802.1      cxx17_h59595ed_0    conda-forge
     libarrow                  14.0.2          h7303f25_3_cuda    conda-forge
     libarrow-acero            14.0.2          h27087fc_3_cuda    conda-forge
     libarrow-dataset          14.0.2          h27087fc_3_cuda    conda-forge
     libarrow-flight           14.0.2          hc63cbfb_3_cuda    conda-forge
     libarrow-flight-sql       14.0.2          hd924b76_3_cuda    conda-forge
     libarrow-gandiva          14.0.2          hfa8be3f_3_cuda    conda-forge
     libarrow-substrait        14.0.2          hd924b76_3_cuda    conda-forge
     libblas                   3.9.0            16_linux64_mkl    conda-forge
     libboost                  1.84.0               h8013b2b_1    conda-forge
     libboost-devel            1.84.0               h00ab1b0_1    conda-forge
     libboost-headers          1.84.0               ha770c72_1    conda-forge
     libbrotlicommon           1.1.0                hd590300_1    conda-forge
     libbrotlidec              1.1.0                hd590300_1    conda-forge
     libbrotlienc              1.1.0                hd590300_1    conda-forge
     libcblas                  3.9.0            16_linux64_mkl    conda-forge
     libclang                  18.1.1                   pypi_0    pypi
     libcrc32c                 1.1.2                h9c3ff4c_0    conda-forge
     libcublas                 12.1.0.26                     0    nvidia
     libcudf                   24.02.02        cuda12_240227_gdd34fdbe35_0    rapidsai
     libcufft                  11.0.2.4                      0    nvidia
     libcufile                 1.6.1.9              hd3aeb46_0    conda-forge
     libcufile-dev             1.6.1.9              hd3aeb46_0    conda-forge
     libcurand                 10.3.2.106           hd3aeb46_0    conda-forge
     libcurl                   8.6.0                hca28451_0    conda-forge
     libcusolver               11.4.4.55                     0    nvidia
     libcusparse               12.0.2.55                     0    nvidia
     libdeflate                1.19                 hd590300_0    conda-forge
     libedit                   3.1.20191231         he28a2e2_2    conda-forge
     libev                     4.33                 hd590300_2    conda-forge
     libevent                  2.1.12               hf998b51_1    conda-forge
     libexpat                  2.6.2                h59595ed_0    conda-forge
     libffi                    3.4.2                h7f98852_5    conda-forge
     libgcc-ng                 13.2.0               h807b86a_5    conda-forge
     libgd                     2.3.3                h119a65a_9    conda-forge
     libgfortran-ng            13.2.0               h69a702a_5    conda-forge
     libgfortran5              13.2.0               ha4646dd_5    conda-forge
     libglib                   2.80.0               hf2295e7_1    conda-forge
     libgomp                   13.2.0               h807b86a_5    conda-forge
     libgoogle-cloud           2.12.0               h5206363_4    conda-forge
     libgrpc                   1.59.3               hd6c4280_0    conda-forge
     libhwloc                  2.9.2           default_h554bfaf_1009    conda-forge
     libiconv                  1.17                 hd590300_2    conda-forge
     libjpeg-turbo             3.0.0                hd590300_1    conda-forge
     libkvikio                 24.02.01        cuda12_240226_gfe01c15_0    rapidsai
     liblapack                 3.9.0            16_linux64_mkl    conda-forge
     libllvm14                 14.0.6               hcd5def8_4    conda-forge
     libllvm15                 15.0.7               hb3ce162_4    conda-forge
     libmrc                    24.03.00a       cuda_12.1_h0dae25b_20    nvidia/label/dev
     libnghttp2                1.58.0               h47da74e_1    conda-forge
     libnl                     3.9.0                hd590300_0    conda-forge
     libnpp                    12.0.2.50                     0    nvidia
     libnsl                    2.0.1                hd590300_0    conda-forge
     libntlm                   1.4               h7f98852_1002    conda-forge
     libnvjitlink              12.1.105             hd3aeb46_0    conda-forge
     libnvjpeg                 12.1.1.14                     0    nvidia
     libopenblas               0.3.26          pthreads_h413a1c8_0    conda-forge
     libparquet                14.0.2          habd00f8_3_cuda    conda-forge
     libpng                    1.6.43               h2797004_0    conda-forge
     libprotobuf               4.24.4               hf27288f_0    conda-forge
     librdkafka                1.9.2                ha5a0de0_2    conda-forge
     libre2-11                 2023.09.01           h7a70373_1    conda-forge
     librmm                    24.02.00             h82930bc_1    conda-forge
     librsvg                   2.56.3               he3f83f7_1    conda-forge
     libsqlite                 3.45.2               h2797004_0    conda-forge
     libssh2                   1.11.0               h0841786_0    conda-forge
     libstdcxx-ng              13.2.0               h7e041cc_5    conda-forge
     libthrift                 0.19.0               hb90f79a_1    conda-forge
     libtiff                   4.6.0                ha9c0a0a_2    conda-forge
     libutf8proc               2.8.0                h166bdaf_0    conda-forge
     libuuid                   2.38.1               h0b41bf4_0    conda-forge
     libuv                     1.48.0               hd590300_0    conda-forge
     libwebp                   1.3.2                h658648e_1    conda-forge
     libwebp-base              1.3.2                hd590300_0    conda-forge
     libxcb                    1.15                 h0b41bf4_0    conda-forge
     libxcrypt                 4.4.36               hd590300_1    conda-forge
     libxml2                   2.12.6               h232c23b_0    conda-forge
     libzlib                   1.2.13               hd590300_5    conda-forge
     llvm-openmp               15.0.7               h0cdce71_0    conda-forge
     llvmlite                  0.42.0          py310h1b8f574_1    conda-forge
     locket                    1.0.0              pyhd8ed1ab_0    conda-forge
     lz4                       4.3.3           py310h350c4a5_0    conda-forge
     lz4-c                     1.9.4                hcb278e6_0    conda-forge
     mako                      1.3.2              pyhd8ed1ab_0    conda-forge
     markdown                  3.6                pyhd8ed1ab_0    conda-forge
     markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
     markupsafe                2.1.5           py310h2372a71_0    conda-forge
     marshmallow               3.21.1                   pypi_0    pypi
     matplotlib-base           3.8.3           py310h62c0568_0    conda-forge
     mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
     merlin-core               23.08.00                   py_0    nvidia
     merlin-dataloader         23.08.00                   py_0    nvidia
     milvus                    2.3.5                    pypi_0    pypi
     minio                     7.2.5                    pypi_0    pypi
     mkl                       2022.1.0           hc2b9512_224
     ml-dtypes                 0.2.0                    pypi_0    pypi
     mlflow                    2.9.2           py310ha13cd29_0    conda-forge
     morpheus                  24.03.00a       cuda_12.1_py3.10_g7bb4ec23_64    file://localhost/opt/conda/conda-bld
     mpc                       1.3.1                hfe3b2da_0    conda-forge
     mpfr                      4.2.1                h9458935_0    conda-forge
     mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
     mrc                       24.03.00a       cuda_12.1_py310_h572eed8_20    nvidia/label/dev
     msgpack-python            1.0.7           py310hd41b1e2_0    conda-forge
     multidict                 6.0.5           py310h2372a71_0    conda-forge
     munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
     ncurses                   6.4.20240210         h59595ed_0    conda-forge
     networkx                  2.8.8              pyhd8ed1ab_0    conda-forge
     nlohmann_json             3.9.1                h9c3ff4c_1    conda-forge
     npy-append-array          0.9.16             pyhd8ed1ab_0    conda-forge
     numba                     0.59.0          py310h7dc5dd1_1    conda-forge
     numpy                     1.24.4          py310ha4c1d20_0    conda-forge
     numpydoc                  1.5.0              pyhd8ed1ab_0    conda-forge
     nvcomp                    3.0.6                h10b603f_0    conda-forge
     nvidia-cublas-cu12        12.2.5.6                 pypi_0    pypi
     nvidia-cuda-cupti-cu12    12.2.142                 pypi_0    pypi
     nvidia-cuda-nvcc-cu12     12.2.140                 pypi_0    pypi
     nvidia-cuda-nvrtc-cu12    12.2.140                 pypi_0    pypi
     nvidia-cuda-runtime-cu12  12.2.140                 pypi_0    pypi
     nvidia-cudnn-cu12         8.9.4.25                 pypi_0    pypi
     nvidia-cufft-cu12         11.0.8.103               pypi_0    pypi
     nvidia-curand-cu12        10.3.3.141               pypi_0    pypi
     nvidia-cusolver-cu12      11.5.2.141               pypi_0    pypi
     nvidia-cusparse-cu12      12.1.2.141               pypi_0    pypi
     nvidia-nccl-cu12          2.16.5                   pypi_0    pypi
     nvidia-nvjitlink-cu12     12.2.140                 pypi_0    pypi
     nvtabular                 23.08.00                py310_0    nvidia
     nvtx                      0.2.10          py310h2372a71_0    conda-forge
     oauthlib                  3.2.2              pyhd8ed1ab_0    conda-forge
     ocl-icd                   2.3.2                hd590300_1    conda-forge
     openjpeg                  2.5.2                h488ebb8_0    conda-forge
     openssl                   3.2.1                hd590300_1    conda-forge
     opt-einsum                3.3.0                    pypi_0    pypi
     orc                       1.9.2                h4b38347_0    conda-forge
     ordered-set               4.1.0              pyhd8ed1ab_0    conda-forge
     packaging                 23.2               pyhd8ed1ab_0    conda-forge
     pandas                    1.5.3           py310h9b08913_1    conda-forge
     pango                     1.52.1               ha41ecd1_0    conda-forge
     partd                     1.4.1              pyhd8ed1ab_0    conda-forge
     pcre2                     10.43                hcad00b1_0    conda-forge
     pillow                    10.2.0          py310h01dd4db_0    conda-forge
     pip                       24.0               pyhd8ed1ab_0    conda-forge
     pixman                    0.43.2               h59595ed_0    conda-forge
     platformdirs              4.2.0              pyhd8ed1ab_0    conda-forge
     pluggy                    1.3.0              pyhd8ed1ab_0    conda-forge
     prometheus_client         0.20.0             pyhd8ed1ab_0    conda-forge
     prometheus_flask_exporter 0.23.0             pyhd8ed1ab_0    conda-forge
     protobuf                  4.24.4          py310h620c231_0    conda-forge
     psutil                    5.9.8           py310h2372a71_0    conda-forge
     pthread-stubs             0.4               h36c2ea0_1001    conda-forge
     py4j                      0.10.9.7                 pypi_0    pypi
     pyarrow                   14.0.2          py310h9a2f4d7_3_cuda    conda-forge
     pyarrow-hotfix            0.6                pyhd8ed1ab_0    conda-forge
     pyasn1                    0.5.1                    pypi_0    pypi
     pyasn1-modules            0.3.0                    pypi_0    pypi
     pycparser                 2.21               pyhd8ed1ab_0    conda-forge
     pycryptodome              3.20.0                   pypi_0    pypi
     pydantic                  2.6.4              pyhd8ed1ab_0    conda-forge
     pydantic-core             2.16.3          py310hcb5633a_0    conda-forge
     pygments                  2.17.2             pyhd8ed1ab_0    conda-forge
     pyjwt                     2.8.0              pyhd8ed1ab_1    conda-forge
     pymilvus                  2.3.6                    pypi_0    pypi
     pynvjitlink               0.1.14          py310hdaa3023_0    rapidsai
     pynvml                    11.4.1             pyhd8ed1ab_0    conda-forge
     pyopenssl                 24.0.0             pyhd8ed1ab_0    conda-forge
     pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
     pysocks                   1.7.1              pyha2e5f31_6    conda-forge
     python                    3.10.14         hd12c33a_0_cpython    conda-forge
     python-confluent-kafka    1.9.2           py310h5764c6d_2    conda-forge
     python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
     python-dotenv             1.0.1                    pypi_0    pypi
     python-graphviz           0.20.2             pyh717bed2_0    conda-forge
     python-rapidjson          1.16            py310hc6cd4ac_0    conda-forge
     python_abi                3.10                    4_cp310    conda-forge
     pytorch                   2.2.1           py3.10_cuda12.1_cudnn8.9.2_0    pytorch
     pytorch-cuda              12.1                 ha16c6d3_5    pytorch
     pytorch-mutex             1.0                        cuda    pytorch
     pytz                      2023.4             pyhd8ed1ab_0    conda-forge
     pywin32-on-windows        0.1.0              pyh1179c8e_3    conda-forge
     pyyaml                    6.0.1           py310h2372a71_1    conda-forge
     qpd                       0.4.4              pyhd8ed1ab_1    conda-forge
     querystring_parser        1.2.4                      py_0    conda-forge
     rapids-dask-dependency    24.02.00a11                   0    rapidsai-nightly
     rdma-core                 50.0                 hd3aeb46_1    conda-forge
     re2                       2023.09.01           h7f4b329_1    conda-forge
     readline                  8.2                  h8228510_1    conda-forge
     requests                  2.31.0             pyhd8ed1ab_0    conda-forge
     requests-cache            1.1.1              pyhd8ed1ab_0    conda-forge
     requests-oauthlib         1.4.0                    pypi_0    pypi
     requests-toolbelt         1.0.0              pyhd8ed1ab_0    conda-forge
     rich                      13.7.1             pyhd8ed1ab_0    conda-forge
     rmm                       24.02.00        cuda12_py310_240212_g09b406c1_0    rapidsai
     rsa                       4.9                      pypi_0    pypi
     s2n                       1.4.1                h06160fa_0    conda-forge
     scikit-learn              1.3.2           py310h1fdf081_2    conda-forge
     scipy                     1.12.0          py310hb13e2d6_2    conda-forge
     setuptools                69.2.0             pyhd8ed1ab_0    conda-forge
     sgmllib3k                 1.0.0              pyh9f0ad1d_0    conda-forge
     six                       1.16.0             pyh6c4a22f_0    conda-forge
     sleef                     3.5.1                h9b69904_2    conda-forge
     smmap                     5.0.0              pyhd8ed1ab_0    conda-forge
     snappy                    1.1.10               h9fff704_0    conda-forge
     snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
     sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
     spdlog                    1.12.0               hd2e6256_2    conda-forge
     sphinx                    7.2.6              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-applehelp   1.0.8              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-devhelp     1.0.6              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-htmlhelp    2.0.5              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-jsmath      1.0.1              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-qthelp      1.0.7              pyhd8ed1ab_0    conda-forge
     sphinxcontrib-serializinghtml 1.1.10             pyhd8ed1ab_0    conda-forge
     sqlalchemy                1.4.49          py310h2372a71_1    conda-forge
     sqlglot                   23.0.5             pyhd8ed1ab_0    conda-forge
     sqlparse                  0.4.4              pyhd8ed1ab_0    conda-forge
     stringcase                1.2.0                      py_0    conda-forge
     sympy                     1.12            pypyh9d50eac_103    conda-forge
     tabulate                  0.9.0              pyhd8ed1ab_1    conda-forge
     tblib                     3.0.0              pyhd8ed1ab_0    conda-forge
     tensorboard               2.15.2                   pypi_0    pypi
     tensorboard-data-server   0.7.2                    pypi_0    pypi
     tensorflow                2.15.0.post1             pypi_0    pypi
     tensorflow-estimator      2.15.0                   pypi_0    pypi
     tensorflow-io-gcs-filesystem 0.36.0                   pypi_0    pypi
     tensorflow-metadata       1.13.1             pyhd8ed1ab_0    conda-forge
     termcolor                 2.4.0                    pypi_0    pypi
     threadpoolctl             3.4.0              pyhc1e730c_0    conda-forge
     tk                        8.6.13          noxft_h4845f30_101    conda-forge
     toolz                     0.12.1             pyhd8ed1ab_0    conda-forge
     torchtriton               2.2.0                     py310    pytorch
     tornado                   6.4             py310h2372a71_0    conda-forge
     tqdm                      4.66.2             pyhd8ed1ab_0    conda-forge
     triad                     0.9.6              pyhd8ed1ab_0    conda-forge
     tritonclient              2.34.0             pyhff2d567_0    conda-forge
     typing-extensions         4.10.0               hd8ed1ab_0    conda-forge
     typing_extensions         4.10.0             pyha770c72_0    conda-forge
     typing_utils              0.1.0              pyhd8ed1ab_0    conda-forge
     tzdata                    2024a                h0c530f3_0    conda-forge
     ucx                       1.15.0               hf604dca_7    conda-forge
     ujson                     5.9.0           py310hc6cd4ac_0    conda-forge
     unicodedata2              15.1.0          py310h2372a71_0    conda-forge
     url-normalize             1.4.3              pyhd8ed1ab_0    conda-forge
     urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
     watchdog                  3.0.0           py310hff52083_1    conda-forge
     websocket-client          1.7.0              pyhd8ed1ab_0    conda-forge
     websockets                12.0            py310h2372a71_0    conda-forge
     werkzeug                  3.0.1              pyhd8ed1ab_0    conda-forge
     wheel                     0.42.0             pyhd8ed1ab_0    conda-forge
     wrapt                     1.14.1                   pypi_0    pypi
     xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
     xorg-libice               1.1.1                hd590300_0    conda-forge
     xorg-libsm                1.2.4                h7391055_0    conda-forge
     xorg-libx11               1.8.7                h8ee46fc_0    conda-forge
     xorg-libxau               1.0.11               hd590300_0    conda-forge
     xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
     xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
     xorg-libxrender           0.9.11               hd590300_0    conda-forge
     xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
     xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
     xorg-xproto               7.0.31            h7f98852_1007    conda-forge
     xyzservices               2023.10.1          pyhd8ed1ab_0    conda-forge
     xz                        5.2.6                h166bdaf_0    conda-forge
     yaml                      0.2.5                h7f98852_2    conda-forge
     zict                      3.0.0              pyhd8ed1ab_0    conda-forge
     zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
     zlib                      1.2.13               hd590300_5    conda-forge
     zstd                      1.5.5                hfc55251_0    conda-forge

Other/Misc.

Screenshot 2024-04-02 at 06 56 42

jarmak-nv commented 3 months ago

Hi @nuxwin!

Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can! In the meantime, feel free to add any relevant information to this issue.

mdemoret-nv commented 3 months ago

@nuxwin Does this happen without the Monitor stage?

nuxwin commented 3 months ago

@mdemoret-nv Yes, of course.

nuxwin commented 3 months ago

@mdemoret-nv This doesn't seem directly related to the Kafka source stage anyway. I'm wondering if this is not due to the asyncio loop. I get the same problem with the pipeline below. For us, this looks like a big problem for production use.

#!/opt/conda/envs/morpheus/bin/python

import logging
import click
import pandas as pd
import time
from typing import Generator

from morpheus.config import Config, CppConfig, PipelineModes
from morpheus.messages.message_meta import MessageMeta
from morpheus.pipeline.linear_pipeline import LinearPipeline
from morpheus.pipeline.stage_decorator import source, stage
from morpheus.utils.logger import configure_logging

logger = logging.getLogger(f"morpheus.{__name__}")

@source
def source_generator() -> Generator[MessageMeta, None, None]:
    while True:
        time.sleep(5)
        yield MessageMeta(df=pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}))

@stage
def simple_stage(message: MessageMeta) -> MessageMeta:
    logger.debug(f"simple_stage:\n\n{message.df.to_string()}")
    return message

@click.command()
@click.option(
    "--num_threads",
    default=1,
    type=click.IntRange(min=1),
    help="Number of internal pipeline threads to use.",
)
@click.option(
    "--pipeline_batch_size",
    default=1,
    type=click.IntRange(min=1),
    help="Internal batch size for the pipeline. Can be much larger than the model batch size",
)
def run_pipeline(num_threads, pipeline_batch_size):
    configure_logging(log_level=logging.DEBUG)

    CppConfig.set_should_use_cpp(False)

    config = Config()
    config.mode = PipelineModes.OTHER
    config.num_threads = num_threads
    config.pipeline_batch_size = pipeline_batch_size

    pipeline = LinearPipeline(config)

    pipeline.set_source(source_generator(config))
    pipeline.add_stage(simple_stage(config))

    pipeline.run()

if __name__ == "__main__":
    run_pipeline()

nuxwin commented 3 months ago

Any news?

mdemoret-nv commented 3 months ago

> I'm wondering if this is not due to the asyncio loop

I suspect the same thing. It's possible the asyncio loop is just polling for changes, seeing nothing scheduled, and repeating the process until there is some work.

> For us, this looks like a big problem for production use.

Can you elaborate more on why this would be a big problem for production use for you? If the high CPU usage is due to the asyncio loop, it likely is not impacting performance of the pipeline. The loop is only spinning because there is no other work to do. Once messages are in the pipeline, they will occupy the CPU instead of the asyncio loop.

nuxwin commented 3 months ago

@mdemoret-nv

The problem is the high CPU (core) usage, >= 100% all the time. The machine's fans speed up because of this, and I wonder if it could reduce the lifespan of the CPU. I don't think that having a CPU core at 100% is normal, especially when there is no processing other than polling. There should be a sleep or something similar between polls, assuming that the problem comes from the loop. High CPU usage is often encountered in while(true) loops that have no sleep, especially when no work is being done.
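To make the point about loops without a sleep concrete, here is a minimal sketch in plain Python. It is not Morpheus code: `cpu_fraction` and `no_work` are hypothetical names made up for this illustration. It runs the same empty polling loop twice, once with no sleep and once with a 10 ms sleep, and compares the CPU time used against the wall-clock time elapsed.

```python
import time


def cpu_fraction(poll_once, sleep_s, duration_s=0.3):
    """Run a polling loop for duration_s and return CPU time / wall time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    while time.perf_counter() - wall_start < duration_s:
        poll_once()              # check for work (there is none here)
        if sleep_s > 0:
            time.sleep(sleep_s)  # give the core back between polls
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall


def no_work():
    """Stand-in for a poll that never finds anything to do."""
    return None


busy = cpu_fraction(no_work, sleep_s=0.0)
idle = cpu_fraction(no_work, sleep_s=0.01)
print(f"busy loop: {busy:.0%} of one core, sleeping loop: {idle:.0%}")
```

On an otherwise idle machine the first figure should be close to a full core and the second close to zero, which is the behaviour described above.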

I hope my English is understandable.

mdemoret-nv commented 3 months ago

Yes I understand what you are saying. I agree that the pipeline should not be utilizing 100% of the CPU if there is no work to be processed. We will need to look into why the asyncio loop is consistently spinning. A simple solution could be to schedule a small sleep in the loop when there is no more work.

I was wondering if there was anything specific to your deployment where 100% CPU utilization would cause problems beyond the added energy use and wear and tear. For example, some environments use CPU utilization to decide when to scale their system. If the CPU were always at 100%, the system would keep scaling up indefinitely, which would be a problem. And the solution I suggested above may not work in that environment.
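For illustration only, the "small sleep when there is no more work" idea could look like the sketch below. This is a generic asyncio pattern and not the actual Morpheus/MRC loop; `poll_with_idle_sleep`, `idle_sleep_s` and `max_idle_polls` are hypothetical names, and `max_idle_polls` exists only so the example terminates.

```python
import asyncio


async def poll_with_idle_sleep(queue, handle, idle_sleep_s=0.01, max_idle_polls=None):
    """Drain the queue; when it is empty, sleep instead of polling again at once."""
    idle_polls = 0
    while max_idle_polls is None or idle_polls < max_idle_polls:
        try:
            item = queue.get_nowait()
        except asyncio.QueueEmpty:
            idle_polls += 1
            await asyncio.sleep(idle_sleep_s)  # no work: yield instead of spinning
            continue
        idle_polls = 0
        handle(item)


async def main():
    queue = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(i)
    seen = []
    # Stop after 5 consecutive empty polls so the example finishes.
    await poll_with_idle_sleep(queue, seen.append, max_idle_polls=5)
    return seen


result = asyncio.run(main())
print(result)  # [0, 1, 2]
```

The trade-off is latency: a message that arrives just after an empty poll waits up to `idle_sleep_s` before it is picked up. Where the design allows it, awaiting `queue.get()` directly avoids both the spinning and the added delay.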

efajardo-nv commented 3 months ago

@nuxwin Also note that top by default shows the sum of utilization across all CPU cores so if you have 12 cores, the maximum utilization would be 1200%. You can check the number of cores using the command nproc --all. You can also do Shift+i while in top to see average utilization per core.
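As a rough sketch of the arithmetic behind those two views (the helper names `max_top_percent` and `machine_wide_percent` are made up for this example):

```python
import os


def max_top_percent(n_cores):
    """Upper bound of top's default reading, where one full core counts as 100%."""
    return 100.0 * n_cores


def machine_wide_percent(per_core_percent, n_cores):
    """Convert a 'one core = 100%' reading into an average over all cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return per_core_percent / n_cores


# os.cpu_count() reports logical CPUs, roughly what `nproc --all` prints.
n_cores = os.cpu_count() or 1
print(f"this machine: {n_cores} cores, top can show up to {max_top_percent(n_cores):.0f}%")

print(max_top_percent(12))              # 1200.0
print(machine_wide_percent(100.0, 10))  # 10.0
```

So one thread spinning at 100% of a core looks like 10% on a 10-core machine when averaged, but it is still one core that is fully busy.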

nuxwin commented 3 months ago

> I was wondering if there was anything specific to your deployment where 100% CPU utilization would cause problems beyond the added energy use and wear and tear. For example, some environments use CPU utilization to decide when to scale their system. If the CPU were always at 100%, the system would keep scaling up indefinitely, which would be a problem. And the solution I suggested above may not work in that environment.

We are developing for financial entities, among others. Our clients use ESXi VMs (with NVIDIA vGPUs). They won't accept such CPU usage on an "idle" pipeline.

Thank you for your time. That's much appreciated.

nuxwin commented 3 months ago

> @nuxwin Also note that top by default shows the sum of utilization across all CPU cores so if you have 12 cores, the maximum utilization would be 1200%. You can check the number of cores using the command nproc --all. You can also do Shift+i while in top to see average utilization per core.

I'm talking about a single CPU core that is 100% used, not about the average CPU usage ;) So yes, of course, on a machine with 10 cores, the average usage would be reduced to 10%. But the problem remains: there is a core that is 100% used, all the time.