pytorch / torchrec

Pytorch domain library for recommendation systems
BSD 3-Clause "New" or "Revised" License
1.79k stars 381 forks source link

Add GPU trace for KT.regroup benchmark #2157

Closed TroyGarden closed 1 week ago

TroyGarden commented 1 week ago

Summary:

context

usage

performance

INFO:2024-06-21 22:22:51 1102779:1102779 CuptiCallbackApi.cpp:78] Callback: domain = 3, cbid = 1
INFO:2024-06-21 22:22:51 1102779:1102779 CuptiActivityProfiler.cpp:241] CUDA versions. CUPTI: 18; Runtime: 12000; Driver: 12000
INFO:2024-06-21 22:22:51 1102779:1102779 NcclProfiler.cpp:150] NCCL Profiler Instantiated
  _regroup_keyed_tenors               | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   2.8 ms | Memory (P90): 1011.0
  KeyedTensor.regroup                 | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   5.0 ms | Memory (P90): 1517.0
  KTRegroupAsDict                     | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   4.9 ms | Memory (P90): 1517.0
  _regroup_keyed_tenors_dup           | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   2.5 ms | Memory (P90): 1011.0
  KeyedTensor.regroup_dup             | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   2.5 ms | Memory (P90): 1011.0
  KTRegroupAsDict_dup                 | B: 1024     | F: 1020     | device: cuda     | Runtime (P90):   2.5 ms | Memory (P90): 1011.0

traces

Differential Revision: D58906521

facebook-github-bot commented 1 week ago

This pull request was exported from Phabricator. Differential Revision: D58906521

facebook-github-bot commented 1 week ago

This pull request was exported from Phabricator. Differential Revision: D58906521