mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Question] how to convert an already quanted model? #1937

Closed leiwen83 closed 6 months ago

leiwen83 commented 8 months ago

Can an already-quantized model, such as https://huggingface.co/01-ai/Yi-34B-Chat-4bits, be compiled directly in mlc_llm?

I tried passing the --quantization option directly with values like q0f16 or q4f16, but it reports that some layers are missing, for example:

ValueError: The following extern parameters do not exist in the weight files:
  model.layers.0.mlp.down_proj.weight
  model.layers.0.mlp.gate_proj.weight
  model.layers.0.mlp.up_proj.weight
  model.layers.0.self_attn.k_proj.weight
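
For reference, the flow that produces this error is the usual two-step conversion; the sketch below uses placeholder paths, a placeholder conversation template, and an assumed quantization code, not the exact commands from this report:

# Sketch of the standard mlc_llm conversion flow (placeholders, not verbatim commands)
mlc_llm convert_weight ./Yi-34B-Chat-4bits --quantization q4f16_1 -o ./dist/Yi-34B-Chat-q4f16_1
mlc_llm gen_config ./Yi-34B-Chat-4bits --quantization q4f16_1 --conv-template chatml -o ./dist/Yi-34B-Chat-q4f16_1

The convert_weight step expects unquantized float weights (e.g., model.layers.0.mlp.down_proj.weight), which is why it cannot find those tensors in a checkpoint that only ships pre-packed quantized parameters.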

Meanwhile, I think mlc_llm could provide a simple serving command that automatically handles the convert-weight step. Other projects such as vLLM can launch serving directly from already-downloaded Hugging Face models, which is very convenient for beginners.

tqchen commented 8 months ago

In this case, we need to build a weight mapper that maps the quantized weights into the MLC-recognized format, by adding a quantization conversion scheme here: https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/quantization
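
As a rough illustration only (this is not the actual mlc_llm loader API; the target parameter names, the AWQ-style suffixes, and the shard file name are assumptions), such a mapper would essentially rename the pre-quantized tensors in the checkpoint into the layout the compiled model expects:

# Illustrative sketch: rename AWQ/GPTQ-style pre-quantized tensor names to
# hypothetical MLC-style names. A real conversion scheme would live in
# python/mlc_llm/quantization and plug into mlc_llm's own loader interfaces.
from safetensors.torch import load_file

# Hypothetical mapping from Hugging Face checkpoint suffixes to the names
# the compiled MLC model expects (actual names depend on the scheme).
SUFFIX_MAP = {
    "qweight": "q_weight",
    "qzeros": "q_zeros",
    "scales": "q_scale",
}

def map_prequantized_shard(shard_path: str) -> dict:
    """Load one safetensors shard and rename its quantized parameters."""
    mapped = {}
    for name, tensor in load_file(shard_path).items():
        prefix, _, suffix = name.rpartition(".")
        # Rename recognized quantization suffixes; pass other params through unchanged.
        mapped[f"{prefix}.{SUFFIX_MAP.get(suffix, suffix)}"] = tensor
    return mapped

# Example (hypothetical shard name):
# mapped = map_prequantized_shard("model-00001-of-00008.safetensors")

Beyond renaming, a real scheme also has to ensure the packing layout of the quantized tensors (group size, zero-point encoding, transposition) matches what the compiled kernels expect, which is the harder part of the mapping.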