intel / auto-round

Advanced quantization algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs".
https://arxiv.org/abs/2309.05516
Apache License 2.0

support tensor parallelism #227

Closed wenhuach21 closed 1 month ago

wenhuach21 commented 2 months ago

Support tensor parallelism for calibration with lm-head quantization, or during tuning.
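
As a rough illustration of what tensor parallelism means in this context, here is a minimal sketch of a column-parallel linear layer in plain PyTorch, splitting the output dimension of a large weight (such as lm_head) across devices so calibration activations can flow through shards that live on different GPUs. The class name `ColumnParallelLinear` and the shard layout are assumptions for illustration only, not part of auto-round's API.

```python
# Illustrative sketch only: a column-parallel linear layer in plain PyTorch.
# Not auto-round's implementation; names and layout are hypothetical.
import torch
import torch.nn as nn


class ColumnParallelLinear(nn.Module):
    """Split the output dimension of a linear layer across devices."""

    def __init__(self, in_features: int, out_features: int, devices: list):
        super().__init__()
        assert out_features % len(devices) == 0, "out_features must divide evenly"
        shard = out_features // len(devices)
        self.devices = devices
        # One weight shard per device.
        self.shards = nn.ModuleList(
            nn.Linear(in_features, shard, bias=False).to(dev) for dev in devices
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Run each shard on its own device, then gather outputs on the first one.
        outs = [m(x.to(dev)) for m, dev in zip(self.shards, self.devices)]
        return torch.cat([o.to(self.devices[0]) for o in outs], dim=-1)


if __name__ == "__main__":
    # Smoke test on CPU; in practice the devices would be e.g. "cuda:0", "cuda:1".
    layer = ColumnParallelLinear(16, 32, devices=["cpu", "cpu"])
    y = layer(torch.randn(2, 16))
    print(y.shape)  # torch.Size([2, 32])
```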

wenhuach21 commented 1 month ago

The issue has been alleviated, so marking it as not planned.