Add GPTQ support.
Users must add "disable_exllama": true to the model's quantization config file. I tried to set this in code instead, but BigDL keeps ignoring the value.
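For reference, a minimal sketch of what the quantization section of the model's config file might look like with that flag set (the surrounding field names follow the common Hugging Face GPTQ layout and are an assumption; only "disable_exllama": true is from this note):

```json
{
  "quantization_config": {
    "quant_method": "gptq",
    "bits": 4,
    "disable_exllama": true
  }
}
```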
Add a whole bunch of new quantization data types.
NF3 / 3-bit is the most interesting one.
Update IPEX libs.
Attention optimizations.
Fixed lock-ups that occurred with very large attention queries.