mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
MIT License
2.55k stars 207 forks

RuntimeError: CUDA error: no kernel image is available for execution on the device #238

Open new-Sunset-shimmer opened 1 week ago

new-Sunset-shimmer commented 1 week ago
cos = cos[position_ids].unsqueeze(unsqueeze_dim)

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
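
"No kernel image is available" generally means that some compiled CUDA code in the process contains no binary for the GPU's compute capability. The report does not say which GPU or which PyTorch build is in use, so the following is only a diagnostic sketch: it sets CUDA_LAUNCH_BLOCKING=1 as the message suggests, then compares the device's compute capability with the architectures PyTorch itself was built for. The helper name `arch_is_compiled` is hypothetical, and the exact-match check is a simplification (a binary for an older architecture of the same major version, or embedded PTX, can still run). It also inspects only PyTorch, not the separately compiled AWQ kernels, which are built for whatever architectures were selected at install time (for example via TORCH_CUDA_ARCH_LIST).

```python
import os

# Make kernel launches synchronous so the traceback points at the failing call.
# This has to be set before CUDA is initialized to take effect.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def arch_is_compiled(capability, arch_list):
    """Return True if arch_list has an exact sm_XY entry for (major, minor).

    Hypothetical helper for illustration; an exact match is sufficient
    but not strictly necessary for kernels to run.
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def report():
    """Print the GPU's compute capability and PyTorch's compiled arch list."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible")
        return
    cap = torch.cuda.get_device_capability(0)   # e.g. (8, 6)
    archs = torch.cuda.get_arch_list()          # e.g. ['sm_80', 'sm_86', ...]
    print("device capability:", cap)
    print("PyTorch built for:", archs)
    print("exact match:", arch_is_compiled(cap, archs))


if __name__ == "__main__":
    report()
```

If PyTorch reports a match but the error persists, the mismatch is more likely in a compiled extension than in PyTorch itself, and rebuilding that extension for the device's architecture would be the next thing to try.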