Closed SORT-1 closed 1 month ago
The errors you report are within the expected relative tolerance of floating-point arithmetic. Let's discuss in your other issue https://github.com/pytorch/pytorch/issues/133006 if you want to. If the concern is two runs of the same function returning different results with the same input, see the note on determinism here: https://pytorch.org/docs/stable/notes/randomness.html#reproducibility
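To illustrate the point about floating-point tolerance, here is a minimal sketch in plain Python (float64, no PyTorch involved). It assumes nothing about the reporter's tensors; the values and the helper names `sum_forward` and `sum_backward` are hypothetical and chosen only for illustration. It shows that adding the same numbers in a different order can change the last bits of the result, which is one reason CPU and GPU kernels that reduce in different orders need not agree bit for bit.

```python
import math


def sum_forward(values):
    """Accumulate from the first element to the last."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_backward(values):
    """Accumulate from the last element to the first."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


# Illustrative values only; they are not taken from the issue.
values = [0.1, 0.2, 0.3]

forward = sum_forward(values)    # ((0.1 + 0.2) + 0.3)
backward = sum_backward(values)  # ((0.3 + 0.2) + 0.1)

# Exact equality fails, but the two sums agree to a tight relative tolerance.
exactly_equal = forward == backward
relatively_close = math.isclose(forward, backward, rel_tol=1e-9)

print(forward, backward, exactly_equal, relatively_close)
```

The two sums differ in the last bit, so exact comparison fails while a relative-tolerance comparison passes. The same effect, at float32 precision and over far longer reductions, applies to comparisons between devices.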
Closing this as expected behavior. We can continue the discussion about precision/determinism expectations in https://github.com/pytorch/pytorch/issues/133006.
🐛 Describe the bug
Description:
I have encountered some precision differences when using __add__, layer_norm, and linear in the following way.
Code to Reproduce:
Precision Differences:
__add__:
layer_norm:
linear:
System Info:
Expected Behavior: The calculation results of every API are consistent across CPU and GPU, with precision differences less than 1e-3.
Actual Behavior: The CPU and GPU results of the third API, linear, show differences greater than the accepted threshold of 1e-3.
Additional Context: 5200 pt files
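As a rough illustration of why a fixed absolute threshold such as 1e-3 can be exceeded even when results agree to float32 precision, the sketch below computes the gap between adjacent float32 values at a given magnitude, using only the standard library. The magnitude 20000.0 is an assumption picked for illustration; the actual value ranges in the reporter's tensors are not known. The helpers `to_f32`, `next_f32`, and `f32_spacing` are hypothetical names, not PyTorch APIs.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def next_f32(x32: float) -> float:
    """Return the next float32 above a positive, finite float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x32))[0]
    return struct.unpack("<f", struct.pack("<I", bits + 1))[0]


def f32_spacing(x: float) -> float:
    """Gap between x (rounded to float32) and the next float32 above it."""
    x32 = to_f32(x)
    return next_f32(x32) - x32


# Illustrative magnitudes only; not taken from the issue.
small = f32_spacing(1.0)      # gap between float32 values near 1
large = f32_spacing(20000.0)  # gap between float32 values near 20000

# Relative size of a one-step difference at the larger magnitude.
large_relative = large / 20000.0

print(small, large, large_relative)
```

Near 1.0 adjacent float32 values are about 1.2e-7 apart, but near 20000.0 they are about 2e-3 apart, so two results that differ by a single representable step already break an absolute 1e-3 threshold while their relative difference stays below 1e-7. A relative tolerance scales with the magnitude of the values and avoids this.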
Thank you for your attention to this matter. Please let me know if any further information is required.
Versions
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31

Python version: 3.8.0 (default, Nov 6 2019, 21:49:08) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-5.15.0-100-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 10.1.243
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 2080 Ti
GPU 1: NVIDIA GeForce RTX 2080 Ti
GPU 2: NVIDIA GeForce RTX 2080 Ti

Nvidia driver version: 550.54.14
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 45 bits physical, 48 bits virtual
CPU(s): 80
On-line CPU(s) list: 40,58,63,66
Off-line CPU(s) list: 0-39,41-57,59-62,64,65,67-79
Thread(s) per core: 0
Core(s) per socket: 40
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Gold 5320 CPU @ 2.20GHz
Stepping: 6
CPU MHz: 2199.999
BogoMIPS: 4399.99
L1d cache: 3.8 MiB
L1i cache: 2.5 MiB
L2 cache: 100 MiB
L3 cache: 78 MiB
NUMA node0 CPU(s): 0-39
NUMA node1 CPU(s): 40-79
Vulnerability Gather data sampling: Vulnerable: No microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid fsrm md_clear flush_l1d arch_capabilities
Versions of relevant libraries:
[pip3] numpy==1.24.4
[pip3] torch==2.3.1
[pip3] torchaudio==2.3.1
[pip3] torchvision==0.18.1
[pip3] triton==2.3.1
[conda] numpy 1.24.4 pypi_0 pypi
[conda] torch 2.3.1 pypi_0 pypi
[conda] torchaudio 2.3.1 pypi_0 pypi
[conda] torchvision 0.18.1 pypi_0 pypi
[conda] triton 2.3.1 pypi_0 pypi