mlcommons / inference_results_v1.1

This repository contains the results and code for the MLPerf™ Inference v1.1 benchmark.
https://mlcommons.org/en/inference-datacenter-11/

DLRM 99.9 performance and accuracy runs getting stuck on Xeon Icelake CPU #12

Open lvaidya2910 opened 1 year ago

lvaidya2910 commented 1 year ago
1157-0  : Complete load query samples !!
1158-3  : Complete load query samples !!
1158-2  : Complete load query samples !!
1158-1  : Complete load query samples !!
1158-4  : Complete load query samples !!
1158-0  : Complete load query samples !!
1157-2  : Complete load query samples !!
1157-1  : Complete load query samples !!
1157-4  : Complete load query samples !!
1157-3  : Complete load query samples !!

The DLRM 99.9 runs get stuck after this output. The run has been sitting at this point for more than 4-5 hours, and I am not sure what is causing the issue. The runs do not produce anything in the mlperf_log_summary.txt file.

I have also attached the mlperf_log_detail.txt file from the performance run in case it helps figure out what the issue is. mlperf_log_detail.txt
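
In case it helps with triage, here is a minimal diagnostic sketch, assuming the run goes through the Python reference DLRM harness (an assumption, not confirmed by the logs above): enabling faulthandler's periodic traceback dump near the start of the benchmark script shows which thread is blocked once the run stalls after the "Complete load query samples" messages.

```python
# Diagnostic sketch (assumption: the benchmark harness is Python-based, as in the
# reference DLRM implementation). A periodic traceback dump can reveal where the
# process is blocked after query sample loading completes.
import faulthandler
import sys

# Print the stack of every thread to stderr every 300 seconds until the process
# exits, so a hang after sample loading leaves a visible trace in the console log.
faulthandler.dump_traceback_later(300, repeat=True, file=sys.stderr)
```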