imoneoi opened this issue 9 months ago
Actually, it works well even now with the current new flow; it just needs a few adjustments. OpenChat is just a fine-tuned Mistral architecture, which MLC already supports.
Here is my shell script for converting the model on an Intel Mac:
#!/bin/bash
name="openchat-3.5-0106"
model_dir="./models/${name}/"  # your path to the model; this is just an example
quantization="q4f16_1"
conv_template="gpt2"
output_dir="models/converted/${name}-${quantization}-MLC/"
device_metal="metal"
device_metal_x86_64="metal:x86-64"
output_lib_metal="${output_dir}${name}-${quantization}-metal.so"
output_lib_metal_x86_64="${output_dir}${name}-${quantization}-metal_x86_64.dylib"

# Convert the weights
mlc_chat convert_weight "${model_dir}" \
    --quantization "${quantization}" \
    -o "${output_dir}"

# Generate mlc-chat-config.json
mlc_chat gen_config "${model_dir}" \
    --quantization "${quantization}" \
    --conv-template "${conv_template}" \
    -o "${output_dir}"

# Compile for Apple Silicon (or any other device)
mlc_chat compile "${output_dir}mlc-chat-config.json" \
    --device "${device_metal}" \
    -o "${output_lib_metal}"

# Compile for Intel Mac
mlc_chat compile "${output_dir}mlc-chat-config.json" \
    --device "${device_metal_x86_64}" \
    -o "${output_lib_metal_x86_64}"
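Once the script finishes, a quick sanity check can confirm the expected artifacts landed in the output directory (a sketch assuming the usual convert_weight / gen_config outputs; exact file names may vary by version):

from pathlib import Path

out = Path("models/converted/openchat-3.5-0106-q4f16_1-MLC")
# ndarray-cache.json comes from convert_weight, mlc-chat-config.json from gen_config
for f in ("ndarray-cache.json", "mlc-chat-config.json"):
    print(f, "found" if (out / f).exists() else "MISSING")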
Then you need to adjust the conversation config at runtime:

## OpenChat config (the imports assume the mlc_chat Python package of that era)
from mlc_chat import ChatModule, ChatConfig, ConvConfig

conv_config = ConvConfig(
    stop_str="<|end_of_turn|>",
    stop_tokens=[32000],  # token id of <|end_of_turn|>
    separator_style=0,
    seps=["<|end_of_turn|>"],
    system="Your system prompt",
)
chat_config = ChatConfig(conv_config=conv_config)
cm = ChatModule(chat_config=chat_config, ...)
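Filling in the remaining ChatModule arguments with the artifacts produced by the script above gives a minimal end-to-end sketch (the model / model_lib_path values are examples from this setup, and the old mlc_chat API is assumed; adjust to your own paths and version):

# Continues the snippet above (same imports and chat_config)
cm = ChatModule(
    model="models/converted/openchat-3.5-0106-q4f16_1-MLC",
    model_lib_path="models/converted/openchat-3.5-0106-q4f16_1-MLC/"
                   "openchat-3.5-0106-q4f16_1-metal.so",
    chat_config=chat_config,
)
print(cm.generate("What is the capital of France?"))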
P.S.
So I believe we just need to update https://github.com/mlc-ai/mlc-llm/blob/main/cpp/conv_templates.cc to add one more conversation template.
⚙️ Request New Models
Additional context
OpenChat 3.5 models are the best-performing 7B chat models on Chatbot Arena and HumanEval+. They share the same architecture as Mistral, but use a different conversation template as follows (no system message needed):
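GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant: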
Starling-LM-7B-alpha and OpenChat-3.5-0106 share the same conversation template and architecture.
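To make the layout concrete, here is a tiny hypothetical helper (not part of mlc_chat, purely illustrative) that builds a prompt in this format:

def openchat_prompt(turns):
    """Build an OpenChat-3.5-style prompt from (role, text) pairs."""
    role_map = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    prompt = "".join(f"{role_map[role]}: {text}<|end_of_turn|>" for role, text in turns)
    # The trailing assistant tag cues the model to produce the next reply.
    return prompt + "GPT4 Correct Assistant:"

print(openchat_prompt([("user", "Hello"), ("assistant", "Hi"), ("user", "How are you today?")]))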