intel / models

Intel® AI Reference Models contains Intel optimizations for running deep learning workloads on Intel® Xeon® Scalable processors and Intel® Data Center GPUs.
Apache License 2.0

DLRM Quantization #158

Open · Vasanta1 opened this issue 7 months ago

Vasanta1 commented 7 months ago

I tried running the IntelAI DLRM model at int8 precision with the default int8_configure.json. Could someone clarify whether quantization happens each time the inference_performance.sh script is triggered, or whether the int8 weights are stored after the first run and reused for later runs? Currently, a run takes around 10 hours to complete on a 64-core machine. Please let me know if any additional info is required from my end.

[attached image: int8_dlrm]
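For context on why repeated quantization is slow: static INT8 quantization in PyTorch normally requires a calibration pass over representative inputs before the model is converted, and that pass is repeated if nothing is cached between runs. Below is a minimal eager-mode sketch of that flow with a toy model, not the repo's actual DLRM pipeline; `TinyNet` and the random calibration data are placeholders:

```python
import torch
import torch.ao.quantization as tq

# Toy stand-in for a real model; DLRM's structure is far more complex.
class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # quantizes float inputs
        self.fc1 = torch.nn.Linear(16, 64)
        self.relu = torch.nn.ReLU()
        self.fc2 = torch.nn.Linear(64, 1)
        self.dequant = tq.DeQuantStub()  # returns float outputs

    def forward(self, x):
        x = self.quant(x)
        x = self.fc2(self.relu(self.fc1(x)))
        return self.dequant(x)

model = TinyNet().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")
prepared = tq.prepare(model)

# Calibration: observers record activation ranges over sample inputs.
# Repeating this pass on every run is what makes re-quantization costly.
with torch.no_grad():
    for _ in range(100):
        prepared(torch.randn(32, 16))

int8_model = tq.convert(prepared)
```

The calibration loop scales with the amount of sample data, which is why caching the converted model (rather than re-running this flow) matters for long-running workloads.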

sramakintel commented 3 months ago

@Vasanta1 per https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/inference_performance.sh#L67, quantization happens each time the script is invoked. I want to make sure you are using our latest validated CentOS container for DLRM inference: https://github.com/IntelAI/models/blob/r3.1/quickstart/recommendation/pytorch/dlrm/inference/cpu/DEVCATALOG.md#pull-command. Does it take the same amount of time with that container?
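If re-quantizing on every invocation is the bottleneck, one possible workaround (not something the repo's scripts do today) is to serialize the converted INT8 model as TorchScript after the first run and reload it on later runs. A minimal sketch, where `build_and_quantize`, the example input, and the cache path are all hypothetical:

```python
import os
import torch

CACHE_PATH = "dlrm_int8_traced.pt"  # hypothetical cache location

def get_int8_model(build_and_quantize, example_input):
    """Quantize once, then reuse the serialized TorchScript module.

    `build_and_quantize` stands in for whatever calibration/conversion
    flow produces the INT8 model (e.g. the script's own logic).
    """
    if os.path.exists(CACHE_PATH):
        # Later runs: skip calibration entirely and load the cached module.
        return torch.jit.load(CACHE_PATH)

    # First run: pay the full quantization cost once.
    int8_model = build_and_quantize().eval()
    with torch.no_grad():
        traced = torch.jit.trace(int8_model, example_input)
        traced = torch.jit.freeze(traced)
    torch.jit.save(traced, CACHE_PATH)
    return traced
```

Whether the cached module stays valid depends on the model, precision settings, and input shapes staying fixed between runs; any change would require deleting the cache and re-quantizing.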