neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[V2 Pipeline] Fine-grained timer from inference state #1469

Closed horheynm closed 8 months ago

horheynm commented 8 months ago

Description

Adds ability to record run times in Pipeline and Ops, given inference state

Problem

Currently, we don't have a way to get run-time information for the pipeline or for its operators.

Solution

We add a timer manager as an attribute of the Pipeline, and add timer logic to the InferenceState as an attribute (see tests).

Design

We reuse the timer attribute in the inference state. All run times are saved there and are written to the timer manager once the Pipeline run ends.
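The flow described here can be sketched with the standard library alone. This is a minimal illustration, not the DeepSparse implementation: `SketchTimer`, `SketchTimerManager`, `record`, and `run_pipeline` are hypothetical names; only the `time(id=...)` context-manager shape is taken from the Usage section of this PR.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SketchTimer:
    """Hypothetical per-run timer, standing in for the timer held by the inference state."""

    def __init__(self):
        # Maps a measurement id to the list of durations recorded under it.
        self.measurements = defaultdict(list)

    @contextmanager
    def time(self, id):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the timed block raises.
            self.measurements[id].append(time.perf_counter() - start)


class SketchTimerManager:
    """Hypothetical pipeline-level collector that receives one timer per finished run."""

    def __init__(self):
        self.runs = []

    def record(self, timer):
        self.runs.append(dict(timer.measurements))


def run_pipeline(manager, ops):
    # A fresh timer per run: ops write into it while the run is in progress.
    timer = SketchTimer()
    with timer.time(id="pipeline"):
        for name, op in ops:
            with timer.time(id=name):
                op()
    # Measurements reach the manager only once, when the run ends.
    manager.record(timer)


manager = SketchTimerManager()
run_pipeline(manager, [("op_a", lambda: time.sleep(0.01)), ("op_b", lambda: None)])
print(manager.runs[0])
```

Keeping the measurements on a per-run object and handing them to the manager at the end means concurrent runs never write to shared state while ops are executing.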

Usage

Given access to inference_state:

    import time

    with inference_state.time(id="foo"):
        time.sleep(1)

Testing

  1. Checks that the run times from the ops and the Pipeline are saved into the timer manager.
  2. Multi-threaded check: the runtime without saving the measurements and the runtime with saving the measurements are checked to be the same.
  3. The expected time that is saved is correct.
    no timer, no wait: 2.95375657081604
    no timer, 0.2 wait: 146.53509402275085
    pipeline timer, no wait: 3.020995855331421, 2.8153765201568604
    pipeline timer, 0.2 wait: 150.66221618652344
    pipeline + ops timer, no wait: 2.9319729804992676, 3.0496866703033447, 3.1929404735565186
    pipeline + ops timer, 0.2 wait: 154.01117300987244
    
    import time

    from deepsparse import Pipeline

    model_path = "hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds"

    pipeline = Pipeline.create(
        task="text_generation",
        model_path=model_path,
        engine_type="deepsparse",  # or "onnxruntime"
        internal_kv_cache=True,  # or False
    )

    # Time one full pipeline call
    start = time.time()
    output = pipeline(
        # prompt=["The sun shined"],
        prompt=["The sun shined", "Oh hello!"],
        generation_kwargs={
            "num_return_sequences": 4,
            "do_sample": True,
            "max_length": 20,
        },
    )
    rt = time.time() - start
    print(rt)
    breakpoint()  # pause here to inspect the output
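The multi-threaded check listed under Testing (runtime with and without saved measurements) can be approximated without DeepSparse. The sketch below is an assumption-laden stand-in: `timed`, `work`, and `wall_clock` are hypothetical helpers, `time.sleep` replaces a real op, and the task count, wait, and worker count are arbitrary. It prints both wall-clock times rather than asserting they match, since the acceptable tolerance depends on the machine.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager, nullcontext

measurements = []
lock = threading.Lock()


@contextmanager
def timed():
    """Hypothetical timer: appends the elapsed time of the block to a shared list."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        with lock:  # guard the explicitly shared list across worker threads
            measurements.append(elapsed)


def work(wait, use_timer):
    # Stand-in for an op; time.sleep replaces real inference work.
    ctx = timed() if use_timer else nullcontext()
    with ctx:
        time.sleep(wait)


def wall_clock(use_timer, n_tasks=8, wait=0.01, workers=4):
    """Run n_tasks across a thread pool and return the total wall-clock time."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda _: work(wait, use_timer), range(n_tasks)))
    return time.perf_counter() - start


baseline = wall_clock(use_timer=False)
with_timer = wall_clock(use_timer=True)
print(f"no timer: {baseline:.4f}s, timer: {with_timer:.4f}s, saved: {len(measurements)}")
```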

bfineran commented 8 months ago

LGTM pending @rahul-tuli's comments