microsoft / DeepSpeed-MII

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Apache License 2.0

error when using Qwen1.5-32B #468

Open puppet101 opened 5 months ago

puppet101 commented 5 months ago

Hi, I'm running into an error when serving Qwen1.5-32B with fp6 quantization. Could you please help? Thank you. My code is below:

```python
import mii

pipe = mii.pipeline('/mymodel/Qwen1.5-32B-fp16', quantization_mode='wf6af16', tensor_parallel=8)
response = pipe(["DeepSpeed is", "Seattle is"], max_new_tokens=128)
print(response)
```

And the error is:

```
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
python: /root/miniconda3/envs/pytorch22/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/core_ops/cuda_linear/include/weight_prepacking.h:151: void weight_matrix_prepacking(int*, size_t, size_t): Assertion `K % 64 == 0' failed.
```
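The assertion is raised by the FP6 kernel's weight prepacking, which requires the K dimension of every quantized weight matrix to be a multiple of 64. Under tensor parallelism the sharded dimension is divided by the TP degree, so a dimension that is divisible by 64 globally can stop being divisible after sharding across 8 GPUs. A minimal sanity-check sketch (not part of MII; the Qwen1.5-32B dims below are assumptions, verify them against your model's `config.json`):

```python
def fp6_shard_ok(k_dim: int, tensor_parallel: int, multiple: int = 64) -> bool:
    """Return True if the per-shard K dimension still satisfies K % 64 == 0."""
    shard_k, rem = divmod(k_dim, tensor_parallel)
    # The dim must split evenly across ranks AND each shard must be a multiple of 64.
    return rem == 0 and shard_k % multiple == 0

# Hypothetical Qwen1.5-32B dimensions (check config.json for the real values):
hidden_size = 5120         # 5120 / 8 = 640, 640 % 64 == 0  -> OK
intermediate_size = 27392  # 27392 / 8 = 3424, 3424 % 64 == 32 -> fails the assertion

print(fp6_shard_ok(hidden_size, tensor_parallel=8))
print(fp6_shard_ok(intermediate_size, tensor_parallel=8))
```

If this is indeed the cause, a smaller `tensor_parallel` value (or a model whose sharded dims stay 64-aligned) may avoid the assertion, though that is a workaround rather than a fix in the kernel.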

celiolarcher commented 2 days ago

Any news on this? I'm facing the same error.