Closed horheynm closed 6 months ago
Tested the two entrypoints for deepsparse.benchmark. One used the internal KV cache and the other used the external KV cache. The goal is to always use the internal KV cache.
1. Python
```python
from deepsparse.benchmark.benchmark_model import benchmark_model

stub = "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
results = benchmark_model(stub)
print(results)
```

2. CLI
```bash
deepsparse.benchmark "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80"
```
{
  "engine": "deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
  "version": "1.7.0.20240104",
  "orig_model_path": "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
  "model_path": "/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
  "batch_size": 1,
  "input_shapes": "None",
  "num_cores": 32,
  "scenario": "singlestream",
  "scheduler": "Scheduler.default",
  "seconds_to_run": 10,
  "num_streams": 1,
  "benchmark_result": {
    "scenario": "singlestream",
    "items_per_sec": 0.7033625366578768,
    "seconds_ran": 15.6391610680148,
    "iterations": 11,
    "median": 576.397096272558,
    "mean": 1421.7223456044767,
    "std": 2497.9850669397615,
    "25.0%": 493.93453216180205,
    "50.0%": 576.397096272558,
    "75.0%": 676.7404270358384,
    "90.0%": 1781.627886928618,
    "95.0%": 5504.294607555494,
    "99.0%": 8482.427984056996,
    "99.9%": 9152.507993769847
  },
  "fraction_of_supported_ops": 1.0,
  "sequence_length": 2048,
  "input_ids_length": 1
}

{
  "engine": "deepsparse.engine.Engine:\n\tonnx_file_path: /home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx\n\tbatch_size: 1\n\tnum_cores: 32\n\tnum_streams: 1\n\tscheduler: Scheduler.default\n\tfraction_of_supported_ops: 1.0\n\tcpu_avx_type: avx2\n\tcpu_vnni: False",
  "version": "1.7.0.20240104",
  "orig_model_path": "zoo:mistral-7b-gsm8k_mistral_pretrain-pruned80",
  "model_path": "/home/george/.cache/sparsezoo/neuralmagic/mistral-7b-gsm8k_mistral_pretrain-pruned80/deployment/model.onnx",
  "batch_size": 1,
  "input_shapes": null,
  "num_cores": 32,
  "scenario": "singlestream",
  "scheduler": "Scheduler.default",
  "seconds_to_run": 10,
  "num_streams": 1,
  "benchmark_result": {
    "scenario": "singlestream",
    "items_per_sec": 1.1406279406751316,
    "seconds_ran": 19.287621506955475,
    "iterations": 22,
    "median": 353.46097755245864,
    "mean": 876.6876120670614,
    "std": 2260.886819025657,
    "25.0%": 286.9633190566674,
    "50.0%": 353.46097755245864,
    "75.0%": 506.17970793973655,
    "90.0%": 652.1910438779744,
    "95.0%": 800.4174952395259,
    "99.0%": 9027.320535853496,
    "99.9%": 10993.794929189637
  },
  "fraction_of_supported_ops": 1.0,
  "sequence_length": 2048,
  "input_ids_length": 1
}
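To make the difference between the two result dumps above easier to read, here is a minimal sketch that puts their headline metrics side by side. The numbers are copied from the output above. The names `first_run`, `second_run`, and `compare_runs` are hypothetical and exist only for this illustration. The output does not say which run came from which entrypoint, so the sketch does not label them. Latencies are assumed to be in milliseconds, which fits the data: `1000 / mean` roughly equals `items_per_sec` in both runs.

```python
# Headline metrics copied from the two result dumps above.
# Assumption: latency fields (median, mean, 99.0%) are in milliseconds.
first_run = {
    "items_per_sec": 0.7033625366578768,
    "iterations": 11,
    "median": 576.397096272558,
    "mean": 1421.7223456044767,
    "99.0%": 8482.427984056996,
}
second_run = {
    "items_per_sec": 1.1406279406751316,
    "iterations": 22,
    "median": 353.46097755245864,
    "mean": 876.6876120670614,
    "99.0%": 9027.320535853496,
}


def compare_runs(a, b):
    """Return the b/a ratio for every metric present in both runs."""
    return {key: b[key] / a[key] for key in a if key in b}


ratios = compare_runs(first_run, second_run)
for metric, ratio in ratios.items():
    print(
        f"{metric:>14}: {first_run[metric]:10.2f} -> "
        f"{second_run[metric]:10.2f}  (x{ratio:.2f})"
    )
```

With only 11 and 22 iterations and a standard deviation larger than the mean in both runs, the median is probably a more stable point of comparison than the mean or the tail percentiles.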