pytorch / kineto

A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.

Profiler error with tensorboard_trace_handler: UnicodeDecodeError: 'utf-8' codec can't decode byte [...]: invalid start / continuation byte #988

Open WarmongeringBeaver opened 5 days ago

WarmongeringBeaver commented 5 days ago

🐛 Describe the bug

The following code:

import torch
from torch.profiler import ProfilerActivity, profile, record_function, tensorboard_trace_handler

DEVICE = "cuda:1"

def main():
    t = torch.rand(10, 10).to(DEVICE)
    for _ in range(100):
        t = t @ t

trace_handler = tensorboard_trace_handler("pytorch_traces", use_gzip=True)
profiler = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    with_stack=True,
    on_trace_ready=trace_handler,
)

# profile the main function
profiler.start()
main()
profiler.stop()

fails with:

Traceback (most recent call last):
  File "/import/bc_workspaces/biocomp/tboyer/sources/GaussianProxy/error_repro.py", line 25, in <module>
    profiler.stop()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 722, in stop
    self._transit_action(self.current_action, None)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 751, in _transit_action
    action()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 745, in _trace_ready
    self.on_trace_ready(self)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 444, in handler_fn
    prof.export_chrome_trace(os.path.join(dir_name, file_name))
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 220, in export_chrome_trace
    fout.writelines(fin)
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 5237: invalid start byte

The offending byte and position vary between runs (e.g. 0xf8 at position 5248), and the error reports either an invalid start byte or an invalid continuation byte.
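The last frames of the traceback (`export_chrome_trace`, then `fout.writelines(fin)`, then `codecs.decode`) suggest the failure happens while the trace is copied through a text-mode file during gzip compression, not while the trace itself is generated. A possible workaround, sketched below, is to write the trace uncompressed and gzip it afterwards as raw bytes so that nothing is decoded. The helper name `gzip_file_binary` is hypothetical, and it is an untested assumption that the export with `use_gzip=False` succeeds on the affected install.

```python
import gzip
import shutil


def gzip_file_binary(src_path, dst_path=None):
    """Compress src_path to dst_path byte-for-byte, without any text decoding.

    Hypothetical helper: it only illustrates a binary-mode copy and is not
    part of torch.profiler.
    """
    if dst_path is None:
        dst_path = src_path + ".gz"
    # Both files are opened in binary mode, so bytes that are not valid
    # UTF-8 pass through unchanged instead of raising UnicodeDecodeError.
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

With the repro above, this would mean passing `use_gzip=False` to `tensorboard_trace_handler` and calling the helper on each file written to `pytorch_traces` after `profiler.stop()`.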

Versions

Environment information

```
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L40S
GPU 1: NVIDIA L40S
GPU 2: NVIDIA L40S
GPU 3: NVIDIA L40S

Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 64
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6426Y
Stepping: 8
CPU MHz: 2500.000
BogoMIPS: 5000.00
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 2 MiB
L1i cache: 2 MiB
L2 cache: 256 MiB
L3 cache: 1 GiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid cldemote movdiri movdir64b fsrm md_clear arch_capabilities

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.4.0
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] Could not collect
```

cc @robieta @chaekit @aaronenyeshi @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise

WarmongeringBeaver commented 2 days ago

It seems that with_stack=True is the culprit. I also can't reproduce the error on a fresh Colab instance, but it is still reproducible on my install.
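Since the error depends on both with_stack=True and the environment, it may help to look at the raw bytes around the failing offset in an uncompressed trace. The sketch below is a hypothetical diagnostic helper (`find_invalid_utf8` is not part of PyTorch or kineto) that reports the first byte that does not decode as UTF-8, together with some surrounding context. It assumes an uncompressed trace file is available, for example one written with `use_gzip=False`.

```python
from pathlib import Path


def find_invalid_utf8(path, context=20):
    """Return (offset, byte value, surrounding bytes) for the first byte
    that is not valid UTF-8, or None if the whole file decodes cleanly.

    Hypothetical helper for inspecting a trace file; the name and the
    default context width are illustrative choices.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte.
        start = max(err.start - context, 0)
        return err.start, data[err.start], data[start:err.start + context]
    return None
```

The surrounding bytes should show which JSON field (for instance a stack frame or file name) carries the non-UTF-8 data, which can then be compared with a trace from an environment where the error does not occur.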

sraikund16 commented 2 days ago

All TensorBoard issues should be in kineto. Transferring.