intel / xFasterTransformer

Update AWQ GPTQ quantization guide #306

Closed: miaojinc closed this 2 months ago

miaojinc commented 6 months ago

Update the content related to the Convert Python API and add quantization options.
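
For context, conversion in xFasterTransformer goes through its Python API; a minimal sketch of the flow this guide covers is below, assuming the `LlamaConvert` class shown in the project README. The quantization keyword arguments are hypothetical placeholders for the AWQ/GPTQ options this PR documents, not confirmed parameter names:

```python
# Minimal sketch of converting a Hugging Face checkpoint with the
# xFasterTransformer Convert Python API. LlamaConvert().convert(input, output)
# follows the project README; the quantization kwargs below are hypothetical
# placeholders for the AWQ/GPTQ options this PR documents.
import xfastertransformer as xft

xft.LlamaConvert().convert(
    "/path/to/llama-hf-model",   # Hugging Face checkpoint directory
    "/path/to/xft-model",        # output directory for converted weights
    # Hypothetical quantization options (see the updated guide for the
    # actual parameter names):
    # quant_method="awq",        # or "gptq"
    # group_size=128,
)
```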

pujiang2018 commented 5 months ago

My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

miaojinc commented 5 months ago

> My concern is that the packages in requirements.txt may trigger some issues during security checking; let's target the next version.

Yes, it has some potential issues. The reason for pinning those package versions is that we lack group quantization operators, so we have to quantize the models on CPU to match our kernel. If we can align with autoawq and autogptq, we can load the quantized weights directly without any modification.
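
To make the constraint concrete, here is an illustrative NumPy sketch of what "group quantization" means in this context: each group of consecutive weights along the input dimension shares one scale, which is the layout autoawq/autogptq checkpoints use. This is a generic sketch of the technique, not xFasterTransformer's kernel or its actual quantization code:

```python
# Illustrative sketch of group-wise INT4 weight quantization (the scheme
# AutoAWQ/AutoGPTQ checkpoints use): every `group_size` consecutive weights
# along the input dimension share one scale. Generic example only; not
# xFasterTransformer's kernel.
import numpy as np

def quantize_groupwise(w: np.ndarray, group_size: int = 128):
    """Symmetric per-group INT4 quantization of a 2-D weight matrix."""
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    groups = w.reshape(out_features, in_features // group_size, group_size)
    # One scale per group, mapping the max magnitude to the INT4 limit (7).
    scales = np.abs(groups).max(axis=-1, keepdims=True) / 7.0
    scales = np.maximum(scales, 1e-8)  # guard against all-zero groups
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q.reshape(w.shape), scales.squeeze(-1)

def dequantize_groupwise(q: np.ndarray, scales: np.ndarray, group_size: int = 128):
    """Reconstruct float weights from INT4 codes and per-group scales."""
    groups = q.reshape(q.shape[0], -1, group_size).astype(np.float32)
    return (groups * scales[..., None]).reshape(q.shape)

# Round-trip a random weight matrix to show the quantization error.
w = np.random.randn(16, 256).astype(np.float32)
q, s = quantize_groupwise(w)
w_hat = dequantize_groupwise(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```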

Duyi-Wang commented 2 months ago

Closed; this has been merged in https://github.com/intel/xFasterTransformer/commit/59a9430d4ee2ca99de4ca4ea78b9f3eba868e900.