import torch
from torch.profiler import ProfilerActivity, profile, record_function, tensorboard_trace_handler

DEVICE = "cuda:1"


def main():
    t = torch.rand(10, 10).to(DEVICE)
    for _ in range(100):
        t = t @ t


trace_handler = tensorboard_trace_handler("pytorch_traces", use_gzip=True)

profiler = profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,
    with_stack=True,
    on_trace_ready=trace_handler,
)

# profile the main function
profiler.start()
main()
profiler.stop()
fails with:
Traceback (most recent call last):
  File "/import/bc_workspaces/biocomp/tboyer/sources/GaussianProxy/error_repro.py", line 25, in <module>
    profiler.stop()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 722, in stop
    self._transit_action(self.current_action, None)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 751, in _transit_action
    action()
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 745, in _trace_ready
    self.on_trace_ready(self)
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 444, in handler_fn
    prof.export_chrome_trace(os.path.join(dir_name, file_name))
  File "/import/bc_workspaces/biocomp/tboyer/.micromamba/stat2dyn/lib/python3.12/site-packages/torch/profiler/profiler.py", line 220, in export_chrome_trace
    fout.writelines(fin)
  File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9d in position 5237: invalid start byte
The offending byte and its position vary between runs (for example, 0xf8 at position 5248), and the error is reported as either an invalid start byte or an invalid continuation byte.
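To narrow down where the undecodable bytes come from, it may help to look at the raw trace file rather than the exception alone. The sketch below is not part of the original report: `find_invalid_utf8` and `show_context` are hypothetical helper names, and it assumes an uncompressed copy of the trace is available on disk. It uses only the standard library and lists each invalid UTF-8 sequence with its offset, byte value, and the decoder's reason.

```python
def find_invalid_utf8(data: bytes, max_hits: int = 10) -> list[tuple[int, int, str]]:
    """Return (offset, byte value, reason) for each invalid UTF-8 sequence.

    Hypothetical diagnostic helper; stops after `max_hits` findings.
    """
    hits = []
    pos = 0
    while pos < len(data) and len(hits) < max_hits:
        try:
            data[pos:].decode("utf-8")
            break  # the remainder of the buffer is valid UTF-8
        except UnicodeDecodeError as err:
            bad = pos + err.start  # err.start is relative to the slice
            hits.append((bad, data[bad], err.reason))
            pos += err.end  # resume just past the offending sequence
    return hits


def show_context(data: bytes, offset: int, width: int = 40) -> str:
    """Render the bytes around `offset`, escaping anything undecodable."""
    chunk = data[max(0, offset - width): offset + width]
    return chunk.decode("utf-8", errors="backslashreplace")
```

One possible way to use it, assuming the uncompressed export succeeds: write the trace to a path that does not end in `.gz` (for instance with `prof.export_chrome_trace("trace.json")` from a custom `on_trace_ready` callback; the traceback suggests the failing `fout.writelines(fin)` re-read only happens for gzip output), read it with `pathlib.Path("trace.json").read_bytes()`, and pass the result to `find_invalid_utf8`. Printing `show_context(raw, offset)` for each hit should show which JSON field carries the bad bytes.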
Versions
Environment information
```
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.12.5 | packaged by conda-forge | (main, Aug 8 2024, 18:36:51) [GCC 12.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-63-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA L40S
GPU 1: NVIDIA L40S
GPU 2: NVIDIA L40S
GPU 3: NVIDIA L40S
Nvidia driver version: 550.54.14
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.7
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 52 bits physical, 57 bits virtual
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 64
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 143
Model name: Intel(R) Xeon(R) Gold 6426Y
Stepping: 8
CPU MHz: 2500.000
BogoMIPS: 5000.00
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 2 MiB
L1i cache: 2 MiB
L2 cache: 256 MiB
L3 cache: 1 GiB
NUMA node0 CPU(s): 0-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; TSX disabled
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx512_bf16 wbnoinvd arat avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid cldemote movdiri movdir64b fsrm md_clear arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.4.0
[pip3] torchinfo==1.8.0
[pip3] torchvision==0.19.0
[pip3] triton==3.0.0
[conda] Could not collect
```
cc @robieta @chaekit @aaronenyeshi @guotuofeng @guyang3532 @dzhulgakov @davidberard98 @briancoutinho @sraikund16 @sanrise