[Closed] Eutenacity closed this issue 3 months ago
It seems that every expert tensor loaded from the GGUF file is dequantized into float32, which takes a large amount of CPU memory. Is that right? Is it possible to save CPU memory here?

See `ktransformers/ktransformers/util/custom_gguf.py`, line 274, `def load_gguf_tensor ...`