Closed by jpata 5 months ago
Adding @raj2022.
Also related: https://github.com/jpata/particleflow/issues/315
Basically, to summarize:
I'm closing this issue, and putting it on the roadmap to study ONNX post-training static quantization separately. Many thanks to @raj2022 for your contributions!
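For reference, the core idea behind post-training static quantization (as implemented in, e.g., ONNX Runtime) can be sketched in a few lines: a scale and zero-point are derived from calibration data rather than learned, and float values are mapped onto int8. This is a minimal illustrative sketch, not the ONNX Runtime API; the calibration values are stand-ins.

```python
# Minimal sketch of post-training static int8 (affine) quantization.
# Scale and zero-point come from calibration statistics, not training.

def calibrate(values):
    """Derive (scale, zero_point) mapping [min, max] onto int8 [-128, 127]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale for constant inputs
    zero_point = round(-128 - lo / scale)     # int8 value that represents float 0 offset
    return scale, zero_point

def quantize(values, scale, zero_point):
    """Map floats to int8, clamping to the representable range."""
    return [max(-128, min(127, round(v / scale + zero_point))) for v in values]

def dequantize(q, scale, zero_point):
    """Recover approximate floats from int8 values."""
    return [(x - zero_point) * scale for x in q]

# Stand-in calibration activations (hypothetical data).
calib = [-1.0, -0.5, 0.0, 0.5, 1.0]
scale, zp = calibrate(calib)
q = quantize(calib, scale, zp)
recon = dequantize(q, scale, zp)
# Round-trip error is bounded by the quantization step (the scale).
err = max(abs(a - b) for a, b in zip(calib, recon))
```

The round-trip error stays within one quantization step; the physics question for MLPF is whether that per-tensor error is negligible at the level of reconstructed quantities.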
Goal: reduce inference time of the model using quantization
CPU inference performance results for CMS were made public in 2021 (https://cds.cern.ch/record/2792320/files/DP2021_030.pdf, slide 16): “For context, on a single CPU thread (Intel i7-10700 @ 2.9GHz), the baseline PF requires approximately (9 ± 5) ms, the MLPF model approximately 320 ± 50 ms for Run 3 ttbar MC events”.
Now is a good time to make inference as fast as possible while minimizing any physics impact.
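To compare quantized and baseline models against the per-event numbers quoted above, a simple single-thread timing harness is enough. This is a sketch with a stand-in workload; `run_inference` is a hypothetical callable you would replace with the actual model's per-event inference call.

```python
import time

def benchmark(run_inference, events, warmup=3):
    """Return (mean, stddev) of per-event wall-clock latency in ms."""
    # Warm-up runs exclude one-time costs (allocation, lazy initialization).
    for ev in events[:warmup]:
        run_inference(ev)
    times = []
    for ev in events:
        t0 = time.perf_counter()
        run_inference(ev)
        times.append((time.perf_counter() - t0) * 1e3)
    mean = sum(times) / len(times)
    spread = (sum((t - mean) ** 2 for t in times) / len(times)) ** 0.5
    return mean, spread

# Usage with a trivial stand-in for the real model call:
mean_ms, std_ms = benchmark(lambda ev: sum(ev), [[1.0] * 1000] * 20)
```

Reporting mean ± spread over many events matches the form of the DP-2021/030 numbers and makes before/after quantization comparisons straightforward.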
Resources: