pytorch / torchrec

PyTorch domain library for recommendation systems
https://pytorch.org/torchrec/
BSD 3-Clause "New" or "Revised" License
1.95k stars 441 forks

Add attempt QPS metric #2524

Closed. iamzainhuda closed this 4 weeks ago

iamzainhuda commented 1 month ago

Summary: Add an attempt QPS metric that measures the throughput (queries per second, QPS) of a single job attempt.

This matters when different attempts of the same job can be scheduled on different hardware types. In that case the lifetime QPS metric becomes an average over hardware with very different capabilities, so it is no longer useful for performance analysis. Computing QPS per attempt allows meaningful performance analysis even when the hardware type changes between attempts.
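
To illustrate the difference between the two metrics, here is a minimal sketch in plain Python. It is not the torchrec implementation from this PR: the `Attempt` record, the function names `lifetime_qps` and `attempt_qps`, the hardware labels, and the numbers are all hypothetical and chosen only to show why a lifetime average can hide per-hardware throughput.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    """One job attempt (hypothetical record, not a torchrec type)."""

    hardware: str    # label for the hardware type the attempt ran on
    examples: int    # examples processed during this attempt
    seconds: float   # wall-clock training time of this attempt


def lifetime_qps(attempts: List[Attempt]) -> float:
    """Throughput averaged over every attempt of the job."""
    total_examples = sum(a.examples for a in attempts)
    total_seconds = sum(a.seconds for a in attempts)
    return total_examples / total_seconds if total_seconds > 0 else 0.0


def attempt_qps(attempt: Attempt) -> float:
    """Throughput of a single attempt only."""
    return attempt.examples / attempt.seconds if attempt.seconds > 0 else 0.0


# Illustrative numbers: the second attempt lands on faster hardware.
attempts = [
    Attempt("slow_hw", examples=100_000, seconds=100.0),
    Attempt("fast_hw", examples=400_000, seconds=100.0),
]

print("lifetime QPS:", lifetime_qps(attempts))
for a in attempts:
    print(a.hardware, "attempt QPS:", attempt_qps(a))
```

With these made-up inputs the lifetime value (2500.0) matches neither attempt (1000.0 and 4000.0), which is the situation the summary describes.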

Differential Revision: D64878139

facebook-github-bot commented 1 month ago

This pull request was exported from Phabricator. Differential Revision: D64878139
