neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

DeepSparse uses 100% of the CPU. #1193

Closed ik-ids closed 1 year ago

ik-ids commented 1 year ago

Describe the bug

DeepSparse uses 100% of the CPU. Is this expected behaviour?

[Screenshot taken 2023-08-16 at 16:22:29]

How safe is this from the infrastructure point of view? Can we cap it to, say, 90%?

We got the below results.

  1. What is the impact of batch size?
  2. Is items_per_sec the same as FPS?
  3. Where can I find what each of the terms below means? I couldn't find it in the docs.
"benchmark_result": {
    "scenario": "multistream",
    "items_per_sec": 7.4622653287999965,
    "seconds_ran": 60.03538875399954,
    "iterations": 448,
    "median": 269.41132550018665,
    "mean": 267.9830554084727,
    "std": 4.919969819928082,
    "25.0%": 265.3186567499688,
    "50.0%": 269.41132550018665,
    "75.0%": 271.01776549966416,
    "90.0%": 272.0865174998835,
    "95.0%": 273.0992654495367,
    "99.0%": 277.5078529293751,
    "99.9%": 287.5190300730301
  }

Expected behavior

Environment

Linode. The test machine has 2 CPU cores, 4 GB RAM, Ubuntu 20.04 LTS, and an AMD EPYC 7713 64-Core Processor.

To Reproduce

Exact steps to reproduce the behavior:

Running benchmark:

deepsparse.benchmark zoo:cv/detection/yolov5-s/pytorch/ultralytics/coco/pruned_quant-aggressive_94 --scenario async -nstreams 2 --time 60 -ncores 2 -x benchmark.json

Errors

Additional context

P.S.: A search function in the documentation would definitely save a lot of time.

mgoin commented 1 year ago

Hi @ik-ids, it is expected behavior for DeepSparse to use 100% of the CPU it is given. You can control this by setting the num_cores parameter to restrict how many CPU cores the engine will use, or by capping the entire process with the numactl command, e.g. numactl -C0-1 python script.py to run on just the first two cores.
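As a further illustration (not DeepSparse-specific, and Linux-only), a Python process can also pin itself to a subset of cores using the standard library before constructing any engine; this is a sketch of the same idea as numactl, done from inside the script:

```python
import os

# Linux-only: restrict this process (pid 0 = self) to at most its
# first two currently-allowed logical cores. Anything created
# afterwards in this process inherits the CPU affinity.
allowed = sorted(os.sched_getaffinity(0))
os.sched_setaffinity(0, set(allowed[:2]))

print(sorted(os.sched_getaffinity(0)))  # cores the process may now run on
```

Unlike num_cores (which tells the engine how many threads to create), CPU affinity caps which cores the whole process can be scheduled on.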

To answer your other questions:

  1. Batch size trades latency for throughput: larger batches increase throughput (items per second, or FPS) at the expense of per-request latency. The same is true of increasing the number of concurrent streams the engine serves.
  2. items_per_sec is FPS; it is just a measurement that generalizes beyond CV tasks.
  3. Generally users haven't been interested in the benchmark_result dictionary, so it isn't documented - I'll make a task to do this. median is the median latency, mean is the average latency, std is the standard deviation of the latencies, and the percentage keys are percentile latencies, so 50.0% equals the median and 99.9% is the latency under which 99.9% of all inferences completed.
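To make those definitions concrete, here is a small plain-Python sketch (hypothetical latency values, not DeepSparse code) computing the same kinds of statistics from a list of per-request latencies:

```python
import statistics

# Hypothetical per-request latencies in milliseconds (illustrative only).
latencies = [265.3, 269.4, 271.0, 272.1, 268.0, 266.5, 270.2, 269.9]

def percentile(data, pct):
    """Nearest-rank percentile: the latency below which pct% of requests finished."""
    ordered = sorted(data)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

result = {
    "mean": statistics.mean(latencies),      # average latency
    "median": statistics.median(latencies),  # 50th percentile (interpolated)
    "std": statistics.stdev(latencies),      # spread of the latencies
    "99.0%": percentile(latencies, 99.0),    # tail latency
}
print(result)
```

Note that with an even number of samples the interpolated median can differ slightly from the nearest-rank 50th percentile; large benchmark runs make the two effectively equal.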
ik-ids commented 1 year ago

Hi @mgoin, thanks for the detailed explanation. Much appreciated.