Open HanatoK opened 2 weeks ago
`AOTIModelContainerRunnerCuda` crashes (with tensors on the CUDA device) but `AOTIModelContainerRunnerCpu` does not.
@angelayi, can you help take a look? The backtrace points to `aoti_torch_proxy_executor_call_function`.
🐛 Describe the bug
`torch.linalg.eigh` crashes if the model is compiled into an AOTInductor model and used from the C++ side. The example Python code is attached as follows:
The C++ code to load the model is:
GDB backtrace:
Versions
Collecting environment information...
PyTorch version: 2.6.0a0+gitc6609ec
Is debug build: False
CUDA used to build PyTorch: 12.5
ROCM used to build PyTorch: N/A
OS: openSUSE Tumbleweed (x86_64)
GCC version: (SUSE Linux) 14.2.1 20241007 [revision 4af44f2cf7d281f3e4f3957efce10e8b2ccb2ad3]
Clang version: 18.1.8
CMake version: version 3.30.4
Libc version: glibc-2.40
Python version: 3.11.10 (main, Sep 09 2024, 17:03:08) [GCC] (64-bit runtime)
Python platform: Linux-6.11.3-1-default-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: 12.5.82
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
Nvidia driver version: 550.120
cuDNN version: Probably one of the following:
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_adv.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_cnn.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_engines_precompiled.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_graph.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_heuristic.so.9.3.0
/usr/local/cuda-12.5/targets/x86_64-linux/lib/libcudnn_ops.so.9.3.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 5800H with Radeon Graphics
CPU family: 25
Model: 80
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU(s) scaling MHz: 67%
CPU max MHz: 4463.0000
CPU min MHz: 400.0000
BogoMIPS: 6390.91
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization: AMD-V
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 4 MiB (8 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] flake8==7.1.1
[pip3] mypy_extensions==1.0.0
[pip3] numpy==2.1.1
[pip3] numpydoc==1.7.0
[pip3] torch==2.6.0a0+gitc6609ec
[pip3] triton==3.1.0
[conda] No relevant packages
cc @ezyang @chauhang @penguinwu @avikchaudhuri @gmagogsfm @zhxchen17 @tugsbayasgalan @angelayi @suo @ydwu4 @desertfire @chenyang78