RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing high performance applications.
Until now, RAFT has stored a map of NVTX colors (annotation -> color) so that different annotations get different colors and the same annotation always gets the same color. This map is shared state guarded by a mutex.
During extensive ANN_BENCH throughput testing, the mutex guarding the map turned out to become a bottleneck at times when the number of concurrent threads is very large (roughly 256 or more). This PR replaces the unordered map and its mutex with a deterministic hash of the annotation, which is stateless.
Pros:
No shared state, no mutexes.
Assigns the same colors to the same annotations across program runs.
Cons:
Sometimes different annotations can have the same color (hash collisions).
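The idea can be illustrated with a minimal sketch. This is not the code in this PR: the function names `fnv1a_32` and `annotation_color`, the choice of FNV-1a as the hash, and the ARGB layout are all assumptions made for illustration. The point is only that the color becomes a pure function of the annotation string, so no map or mutex is needed.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical helper (not RAFT's actual code): 32-bit FNV-1a hash.
// It is deterministic, so the same string hashes to the same value
// in every thread and in every program run.
constexpr std::uint32_t fnv1a_32(std::string_view s) noexcept
{
  std::uint32_t h = 2166136261u;  // FNV-1a 32-bit offset basis
  for (unsigned char c : s) {
    h ^= c;
    h *= 16777619u;  // FNV 32-bit prime
  }
  return h;
}

// Hypothetical helper: derive an opaque ARGB color from the annotation.
// The low 24 bits of the hash become RGB; alpha is forced to 0xFF.
// Two different annotations may map to the same color (hash collision).
constexpr std::uint32_t annotation_color(std::string_view name) noexcept
{
  return 0xFF000000u | (fnv1a_32(name) & 0x00FFFFFFu);
}
```

Because both functions read no global state, any number of threads can call `annotation_color` concurrently without synchronization. The trade-off is the one listed under Cons: uniqueness of colors is no longer guaranteed.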