Closed. wenhuach21 closed this issue 1 week ago.
The CUDA kernel only supports FP16, but the maximum values in some layers of Qwen are very large, exceeding the FP16 range (max ~65504) and overflowing to inf.
Workarounds:
- Try different configs and enable `use_clip` in the AutoRound kernel.
- Fall back the affected layers (a detection sketch follows below).
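A minimal sketch, assuming a PyTorch model, of how the overflowing layers might be identified before quantization; `find_fp16_overflow_layers` is a hypothetical helper for illustration, not part of AutoRound:

```python
import torch
import torch.nn as nn

FP16_MAX = torch.finfo(torch.float16).max  # 65504.0

def find_fp16_overflow_layers(model: nn.Module, threshold: float = FP16_MAX):
    """Return names of Linear layers whose max |weight| exceeds the FP16 range."""
    overflow = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Values above FP16_MAX become inf when cast to FP16,
            # which is why these layers need clipping or a fallback.
            if module.weight.abs().max().item() > threshold:
                overflow.append(name)
    return overflow
```

Layers flagged this way could then be excluded from the FP16 CUDA kernel, e.g. kept in a higher-precision path or quantized with a clipping-enabled config.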