qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

fused mlp is sometimes not working with safetensors, add an argument for it #244

Closed DalasNoin closed 1 year ago

DalasNoin commented 1 year ago

Fused MLP sometimes does not work with safetensors, so this adds an argument for it: passing `no_fused_mlp` sets `fused_mlp` to `False`; the default remains `True`.

I have had the same issue as some others: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/243
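The flag described above could be wired up roughly as follows. This is a minimal sketch assuming an argparse-based CLI; the flag name follows the issue text, but the actual option spelling and parser in the repository may differ.

```python
import argparse

# Hypothetical sketch of the proposed option: --no_fused_mlp flips
# fused_mlp to False, while the default stays True, matching the
# behavior described in the issue.
parser = argparse.ArgumentParser(description="GPTQ-for-LLaMa inference (sketch)")
parser.add_argument(
    "--no_fused_mlp",
    dest="fused_mlp",
    action="store_false",
    help="Disable the fused MLP path (workaround for safetensors issues).",
)
parser.set_defaults(fused_mlp=True)

args = parser.parse_args(["--no_fused_mlp"])
print(args.fused_mlp)  # False: fused MLP disabled by the flag
```

Using `action="store_false"` with `set_defaults` keeps the common path (fused MLP on) as the default, so only users hitting the safetensors problem need to pass the flag.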