[Closed] leiwen83 closed this issue 6 months ago
In this case, we need to build a weight mapper that maps the quantized weights into the format MLC recognizes, by adding a quantization conversion scheme here: https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/quantization
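To give a feel for what such a weight mapper involves, here is a rough sketch only; the class, field, and tensor names below are assumptions modeled on an AWQ-style 4-bit checkpoint (qweight/qzeros/scales packed into int32), not mlc_llm's actual quantization API. The real schemes live in the directory linked above.

```python
# Hypothetical sketch of a weight mapper for an externally quantized checkpoint.
# None of these names come from mlc_llm; they only illustrate the idea of
# renaming and repacking quantized tensors into a target layout.
from dataclasses import dataclass, field
from typing import Dict

import numpy as np


@dataclass
class QuantWeightMapper:
    group_size: int = 128  # assumed AWQ group size
    bits: int = 4          # assumed 4-bit packing: 8 values per int32
    # Hypothetical name mapping: source (HF AWQ-style) prefix -> target prefix.
    name_map: Dict[str, str] = field(default_factory=dict)

    def unpack_int4(self, packed: np.ndarray) -> np.ndarray:
        """Unpack int32-packed 4-bit values into one int8 value per element."""
        shifts = np.arange(0, 32, self.bits, dtype=np.uint32)
        unpacked = (packed.view(np.uint32)[..., None] >> shifts) & 0xF
        return unpacked.reshape(*packed.shape[:-1], -1).astype(np.int8)

    def map_param(
        self, src_prefix: str, tensors: Dict[str, np.ndarray]
    ) -> Dict[str, np.ndarray]:
        """Produce target-named tensors for one quantized linear layer."""
        dst_prefix = self.name_map[src_prefix]
        return {
            f"{dst_prefix}.q_weight": self.unpack_int4(tensors[f"{src_prefix}.qweight"]),
            f"{dst_prefix}.q_zero": self.unpack_int4(tensors[f"{src_prefix}.qzeros"]),
            f"{dst_prefix}.q_scale": tensors[f"{src_prefix}.scales"],
        }
```

A real conversion scheme would additionally have to describe how the compiled model expects the quantized weights to be laid out and be registered alongside the existing schemes; the sketch above only shows the name-mapping and repacking step.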
Can an already-quantized model, like https://huggingface.co/01-ai/Yi-34B-Chat-4bits, be compiled directly with mlc_llm?
I tried passing a quantization option like q0f16 or q4f16 directly, but it reports that some layers are missing, like:
Meanwhile, I think mlc_llm could provide a simple serving command that automatically handles the convert-weight step. Other projects like vLLM can launch serving directly from already-downloaded Hugging Face models, which is very convenient for beginners.