Vasanta1 opened 11 months ago:
I tried running the IntelAI DLRM model with int8 precision using the default int8_configure.json. Could someone clarify whether quantization happens each time the inference_performance.sh script is triggered, or whether the int8 weights are stored after the first run and reused on later runs? Currently, a run takes around 10 hours to complete on a 64-core machine. Please let me know if any additional info is required from my end.

Reply:

@Vasanta1 per this line, https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/inference_performance.sh#L67, the quantization happens each time the script is invoked. I want to make sure you are using our latest validated CentOS container for DLRM inference. Does it take the same amount of time with that container? https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/DEVCATALOG.md#pull-command
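As an illustration only (not part of the repo's scripts): if the quantization step produces a TorchScript module, one could in principle cache it to disk after the first run and reload it on later runs, so the expensive quantization happens only once. This is a minimal sketch under that assumption; `QUANTIZED_MODEL_PATH`, `get_int8_model`, and `build_and_quantize` are hypothetical names, not anything defined by inference_performance.sh.

```python
import os

import torch

# Hypothetical cache location for the quantized TorchScript model.
QUANTIZED_MODEL_PATH = "dlrm_int8.pt"


def get_int8_model(build_and_quantize):
    """Return an int8 model, quantizing at most once.

    build_and_quantize is a placeholder for whatever the script does
    today to produce a quantized torch.jit.ScriptModule.
    """
    if os.path.exists(QUANTIZED_MODEL_PATH):
        # Reuse the int8 weights produced by an earlier run.
        return torch.jit.load(QUANTIZED_MODEL_PATH)
    # First run: perform the (slow) quantization, then persist the result.
    model = build_and_quantize()
    torch.jit.save(model, QUANTIZED_MODEL_PATH)
    return model
```

Whether this is safe depends on the actual quantization flow in the script (e.g., whether calibration depends on per-run inputs), so treat it as a sketch of the caching idea rather than a drop-in change.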