Vasanta1 opened 11 months ago:
I tried running the IntelAI DLRM model with int8 precision using the default int8_configure.json. Could someone clarify whether quantization happens each time the inference_performance.sh script is triggered, or whether the int8 weights are stored after the first run and reused on later runs? Currently, a run takes around 10 hours to complete on a 64-core machine. Please let me know if any additional info is required from my end.

Reply:

@Vasanta1 per this line, https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/inference_performance.sh#L67, the quantization happens each time the script is invoked. I want to make sure you are using our latest validated CentOS container for DLRM inference. Does it take the same amount of time with that container? https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/DEVCATALOG.md#pull-command
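As an illustration only (not part of the repo's scripts): if the quantization step produces a TorchScript module, one could in principle cache it to disk after the first run and reload it on later runs, so the expensive quantization happens only once. This is a minimal sketch under that assumption; `QUANTIZED_MODEL_PATH`, `get_int8_model`, and `build_and_quantize` are hypothetical names, not anything defined by inference_performance.sh.

```python
import os

import torch

# Hypothetical cache location for the quantized TorchScript model.
QUANTIZED_MODEL_PATH = "dlrm_int8.pt"


def get_int8_model(build_and_quantize):
    """Return an int8 model, quantizing at most once.

    build_and_quantize is a placeholder for whatever the script does
    today to produce a quantized torch.jit.ScriptModule.
    """
    if os.path.exists(QUANTIZED_MODEL_PATH):
        # Reuse the int8 weights produced by an earlier run.
        return torch.jit.load(QUANTIZED_MODEL_PATH)
    # First run: perform the (slow) quantization, then persist the result.
    model = build_and_quantize()
    torch.jit.save(model, QUANTIZED_MODEL_PATH)
    return model
```

Whether this is safe depends on the actual quantization flow in the script (e.g., whether calibration depends on per-run inputs), so treat it as a sketch of the caching idea rather than a drop-in change.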