Summary:
In the baseline, for each metric we iterate through all the tasks and call update once per task. With TorchRec we can run a single fused update across all tasks, as long as the inputs to the metrics are in the correct format. After this optimization, only a single update is called for each metric.
The CPU wall time for the metrics update drops from 95ms to 7ms, and the GPU wall time drops from 2.7ms to 0.6ms.
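As a rough illustration of the per-task versus fused call pattern, here is a minimal sketch in plain Python. `ToySumMetric`, `update_task`, and `update_fused` are hypothetical names made up for this example and are not TorchRec's API; the sketch only shows how the number of update calls changes, not the actual implementation in this diff.

```python
# Hypothetical toy metric (NOT TorchRec's API): keeps a running sum per task
# and counts how many times an update method was called.
class ToySumMetric:
    def __init__(self, n_tasks):
        self.totals = [0.0] * n_tasks
        self.update_calls = 0

    def update_task(self, task_idx, values):
        # Baseline path: one call per task.
        self.update_calls += 1
        self.totals[task_idx] += sum(values)

    def update_fused(self, values_per_task):
        # Fused path: one call covers all tasks. The input must already be
        # laid out as one row per task (assumed format for this sketch).
        self.update_calls += 1
        for task_idx, values in enumerate(values_per_task):
            self.totals[task_idx] += sum(values)


# One row of values per task.
batch = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Baseline: iterate through the tasks and update each one separately.
baseline = ToySumMetric(n_tasks=3)
for task_idx, values in enumerate(batch):
    baseline.update_task(task_idx, values)

# Fused: a single update call for the whole batch of tasks.
fused = ToySumMetric(n_tasks=3)
fused.update_fused(batch)
```

Both paths produce the same per-task totals, but the baseline issues one update call per task while the fused path issues one in total. In a tensor library the inner loop of the fused path would typically be a single batched operation over a stacked tensor, which is where the saving in call overhead would come from.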
See this doc for more details: https://docs.google.com/document/d/15ELwQ1mehjecYoJJxryWDXURBJMHiTWW8iWxK-I3Y-Q/edit
Reviewed By: iamzainhuda
Differential Revision: D64205895