[Closed] leiwen83 closed this issue 6 months ago
In this case, we need to build a weight mapper that maps the quantized weights into the format MLC recognizes, by adding a quantization conversion scheme here: https://github.com/mlc-ai/mlc-llm/tree/main/python/mlc_llm/quantization
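To give a feel for what such a weight mapper involves, here is a rough sketch only; the class, field, and tensor names below are assumptions modeled on an AWQ-style 4-bit checkpoint (qweight/qzeros/scales packed into int32), not mlc_llm's actual quantization API. The real schemes live in the directory linked above.

```python
# Hypothetical sketch of a weight mapper for an externally quantized checkpoint.
# None of these names come from mlc_llm; they only illustrate the idea of
# renaming and repacking quantized tensors into a target layout.
from dataclasses import dataclass, field
from typing import Dict

import numpy as np


@dataclass
class QuantWeightMapper:
    group_size: int = 128  # assumed AWQ group size
    bits: int = 4          # assumed 4-bit packing: 8 values per int32
    # Hypothetical name mapping: source (HF AWQ-style) prefix -> target prefix.
    name_map: Dict[str, str] = field(default_factory=dict)

    def unpack_int4(self, packed: np.ndarray) -> np.ndarray:
        """Unpack int32-packed 4-bit values into one int8 value per element."""
        shifts = np.arange(0, 32, self.bits, dtype=np.uint32)
        unpacked = (packed.view(np.uint32)[..., None] >> shifts) & 0xF
        return unpacked.reshape(*packed.shape[:-1], -1).astype(np.int8)

    def map_param(
        self, src_prefix: str, tensors: Dict[str, np.ndarray]
    ) -> Dict[str, np.ndarray]:
        """Produce target-named tensors for one quantized linear layer."""
        dst_prefix = self.name_map[src_prefix]
        return {
            f"{dst_prefix}.q_weight": self.unpack_int4(tensors[f"{src_prefix}.qweight"]),
            f"{dst_prefix}.q_zero": self.unpack_int4(tensors[f"{src_prefix}.qzeros"]),
            f"{dst_prefix}.q_scale": tensors[f"{src_prefix}.scales"],
        }
```

A real conversion scheme would additionally have to describe how the compiled model expects the quantized weights to be laid out and be registered alongside the existing schemes; the sketch above only shows the name-mapping and repacking step.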
Can an already-quantized model, like https://huggingface.co/01-ai/Yi-34B-Chat-4bits, be compiled directly with mlc_llm?
I tried passing a quantization option like q0f16 or q4f16 directly, but it reports that some layers are missing, like:
Meanwhile, I think mlc_llm could provide a simple serving command that automatically handles the convert-weight step. Other projects like vLLM can launch serving directly from already-downloaded Hugging Face models, which is very convenient for beginners.