Closed crazydemo closed 1 month ago
This PR brings a slight performance gain.
How much perf gain comes from this patch? Does the gain also come from changing "desc" to "*desc"? There could be some bubbles in the new vector, since each individual thread might use fewer "kernel_idx" values than the global cache.
The perf gain comes partially from desc -> *desc (brings 1% perf gain), and more significantly from using std::vector (brings another 5% perf gain).
This PR uses std::vector for the thread-local cache.
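For readers following the discussion, here is a minimal sketch of what a vector-backed thread-local cache indexed by kernel_idx might look like. It is not the code from this PR: `KernelDesc`, `global_lookup`, and `local_lookup` are hypothetical names, and the mutex-guarded map is only an assumed stand-in for the global cache. It also shows where the "bubbles" mentioned above would come from: slots for indices a thread never touches stay as `nullptr`.

```cpp
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a cached kernel descriptor.
struct KernelDesc {
    std::size_t kernel_idx;
};

// Hypothetical global cache: shared by all threads, so every lookup takes a lock.
const KernelDesc* global_lookup(std::size_t kernel_idx) {
    static std::mutex mtx;
    static std::unordered_map<std::size_t, KernelDesc> table;
    std::lock_guard<std::mutex> lock(mtx);
    auto it = table.find(kernel_idx);
    if (it == table.end()) {
        it = table.emplace(kernel_idx, KernelDesc{kernel_idx}).first;
    }
    // Pointers to unordered_map elements stay valid across rehashing.
    return &it->second;
}

// Thread-local cache: a plain vector indexed directly by kernel_idx.
// Indices this thread never requests remain nullptr (the "bubbles").
const KernelDesc* local_lookup(std::size_t kernel_idx) {
    thread_local std::vector<const KernelDesc*> cache;
    if (kernel_idx >= cache.size()) {
        cache.resize(kernel_idx + 1, nullptr);
    }
    const KernelDesc*& slot = cache[kernel_idx];
    if (slot == nullptr) {
        slot = global_lookup(kernel_idx);  // slow path, taken once per thread per index
    }
    return slot;  // fast path: one bounds check and one indexed load
}
```

In this sketch a repeated `local_lookup(5)` on the same thread returns the same pointer without touching the lock. The trade-off is memory: a thread that only uses index 5 still holds six slots, five of them empty.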