Open unrahul opened 4 months ago
Do you have an example huggingface hub link that we can test?
Here you go @jason-dai: https://huggingface.co/unrahul/phi-2-fp4 . I have many models in all quantization formats at https://huggingface.co/unrahul . All done using ipex-llm.
I use ipex-llm to quantize models and push them to the Hub. But it seems `load_low_bit` expects the model to be available locally and can't take it from the Hugging Face Hub. It would be awesome to allow the model to be loaded from the Hub as well, so the end user doesn't have to quantize it themselves, making shipping the right model for the right platform much easier.
Path: https://github.com/intel-analytics/ipex-llm/blob/70b17c87be259e2a42481a100b06062efff24bf6/python/llm/src/ipex_llm/optimize.py#L137
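In the meantime, a workaround is to fetch the repo from the Hub first and then point the loader at the local copy. A minimal sketch, assuming `huggingface_hub` is installed; the `load_low_bit_from_hub` helper name is hypothetical, and the actual loader call (commented out) would depend on the ipex-llm API linked above:

```python
from huggingface_hub import snapshot_download


def load_low_bit_from_hub(repo_id: str) -> str:
    """Download a pre-quantized repo from the Hugging Face Hub so a
    local-path-only loader like load_low_bit can consume it.

    Returns the local directory of the downloaded snapshot.
    """
    # snapshot_download caches the full repo locally and returns its path
    local_dir = snapshot_download(repo_id)
    # Hypothetical follow-up step (depends on the ipex-llm loader API):
    # from ipex_llm.transformers import AutoModelForCausalLM
    # model = AutoModelForCausalLM.load_low_bit(local_dir)
    return local_dir
```

Usage would then be `load_low_bit_from_hub("unrahul/phi-2-fp4")`, keeping the quantization step on the publisher's side.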