Open KingICCrab opened 8 months ago
Can you be more specific please?
SUT name : PySUT
Scenario : SingleStream
Mode : PerformanceOnly
90th percentile latency (ns) : 700060891
Result is : INVALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: NO
Recommendations:
Early Stopping Result:

QPS w/ loadgen overhead : 1.57
QPS w/o loadgen overhead : 1.57

Min latency (ns) : 597931992
Max latency (ns) : 700060891
Mean latency (ns) : 635649808
50.00 percentile latency (ns) : 632335750
90.00 percentile latency (ns) : 700060891
95.00 percentile latency (ns) : 700060891
97.00 percentile latency (ns) : 700060891
99.00 percentile latency (ns) : 700060891
99.90 percentile latency (ns) : 700060891

samples_per_query : 1
target_qps : 1
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 0
max_duration (ms): 0
min_query_count : 10
max_query_count : 10
qsl_rng_seed : 13281865557512327830
sample_index_rng_seed : 198141574272810017
schedule_rng_seed : 7575108116881280410
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 10833

No warnings encountered during test.
No errors encountered during test.
If the installed software stack (the CUDA, onnxruntime, and cuDNN versions) is not a supported combination for CUDA execution, the CUDA execution provider will not load and inference falls back to the CPU. It would be nice if CM could detect this and fail cleanly, but that check does not exist at the moment. To get the run working on the GPU, you can pin a different dependency version by adding --adr.onnxruntime.version=1.16.3 to the run command, or change the CUDA runtime version with, for example, --adr.cuda.version=11.8.
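Until such a check exists in CM, the silent CPU fallback described above can be caught by hand before launching a benchmark. The following is only a minimal sketch of that idea, not part of CM or of the MLPerf reference implementation: `select_providers`, `installed_providers`, and the `allow_cpu_fallback` parameter are hypothetical names introduced for illustration. The only onnxruntime call it relies on is `onnxruntime.get_available_providers()`.

```python
def select_providers(available, preferred="CUDAExecutionProvider",
                     allow_cpu_fallback=False):
    """Return the provider list to request, or fail loudly.

    `available` is the list reported by onnxruntime.get_available_providers().
    Hypothetical helper: raises instead of silently running on the CPU.
    """
    if preferred in available:
        # Request the preferred provider first, with CPU as a secondary choice.
        return [preferred, "CPUExecutionProvider"]
    if allow_cpu_fallback and "CPUExecutionProvider" in available:
        return ["CPUExecutionProvider"]
    raise RuntimeError(
        f"{preferred} is not available (found: {', '.join(available)}). "
        "Check that the installed onnxruntime build matches the CUDA and "
        "cuDNN versions on this machine."
    )


def installed_providers():
    """Ask the installed onnxruntime which execution providers it can load."""
    import onnxruntime as ort  # imported lazily so the helper above has no dependency
    return ort.get_available_providers()
```

Calling `select_providers(installed_providers())` on the machine from the report below would raise, since only `AzureExecutionProvider` and `CPUExecutionProvider` are listed there. The returned list can otherwise be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.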
When I run

cm run script --tags=generate-run-cmds,inference,_find-performance,_all-scenarios --model=bert-99 --implementation=reference --device=cuda --backend=onnxruntime --category=edge --division=open --quiet

the error is

/home/zhaohc/cm/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:69: UserWarning: Specified provider 'CUDAExecutionProvider' is not in available provider names.Available providers: 'AzureExecutionProvider, CPUExecutionProvider'
  warnings.warn(
2024-03-23 12:50:23.216456985 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer 'bert.pooler.dense.bias'. It is not used by any node and should be removed from the model.
2024-03-23 12:50:23.216514497 [W:onnxruntime:, graph.cc:3593 CleanUnusedInitializersAndNodeArgs] Removing initializer 'bert.pooler.dense.weight'. It is not used by any node and should be removed from the model.

The result is

zhaohc710-reference-gpu-onnxruntime-v1.17.1-default_config
+---------+--------------+----------+-------+-----------------+---------------------------------+
| Model   | Scenario     | Accuracy | QPS   | Latency (in ms) | Power Efficiency (in samples/J) |
+---------+--------------+----------+-------+-----------------+---------------------------------+
| bert-99 | SingleStream | -        | -     | X 0.0           |                                 |
| bert-99 | Offline      | -        | 2.657 | -               |                                 |
+---------+--------------+----------+-------+-----------------+---------------------------------+