imoneoi opened this issue 9 months ago
Actually, it works well even now with the current new flow; it just needs a few adjustments. OpenChat is just a fine-tuned Mistral architecture, which MLC already supports.
Here is my shell script for converting the model on an Intel Mac:
#!/bin/bash
name="openchat-3.5-0106"
model_dir="./models/${name}/"  # your path to the model; this is just an example
quantization="q4f16_1"
conv_template="gpt2"
output_dir="models/converted/${name}-${quantization}-MLC/"
device_metal="metal"
device_metal_x86_64="metal:x86-64"
output_lib_metal="${output_dir}${name}-${quantization}-metal.so"
output_lib_metal_x86_64="${output_dir}${name}-${quantization}-metal_x86_64.dylib"

# Convert the weights
mlc_chat convert_weight "${model_dir}" \
    --quantization "${quantization}" \
    -o "${output_dir}"

# Generate mlc-chat-config.json
mlc_chat gen_config "${model_dir}" \
    --quantization "${quantization}" \
    --conv-template "${conv_template}" \
    -o "${output_dir}"

# Compile for Apple Silicon (or any other device)
mlc_chat compile "${output_dir}mlc-chat-config.json" \
    --device "${device_metal}" \
    -o "${output_lib_metal}"

# Compile for Intel Mac
mlc_chat compile "${output_dir}mlc-chat-config.json" \
    --device "${device_metal_x86_64}" \
    -o "${output_lib_metal_x86_64}"
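Once the script finishes, a quick sanity check can confirm the expected artifacts landed in the output directory (a sketch assuming the usual convert_weight / gen_config outputs; exact file names may vary by version):

from pathlib import Path

out = Path("models/converted/openchat-3.5-0106-q4f16_1-MLC")
# ndarray-cache.json comes from convert_weight, mlc-chat-config.json from gen_config
for f in ("ndarray-cache.json", "mlc-chat-config.json"):
    print(f, "found" if (out / f).exists() else "MISSING")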
Then you need to adjust the conversation config at runtime:

## OpenChat config (the imports assume the mlc_chat Python package of that era)
from mlc_chat import ChatModule, ChatConfig, ConvConfig

conv_config = ConvConfig(
    stop_str="<|end_of_turn|>",
    stop_tokens=[32000],  # token id of <|end_of_turn|>
    separator_style=0,
    seps=["<|end_of_turn|>"],
    system="Your system prompt",
)
chat_config = ChatConfig(conv_config=conv_config)
cm = ChatModule(chat_config=chat_config, ...)
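Filling in the remaining ChatModule arguments with the artifacts produced by the script above gives a minimal end-to-end sketch (the model / model_lib_path values are examples from this setup, and the old mlc_chat API is assumed; adjust to your own paths and version):

# Continues the snippet above (same imports and chat_config)
cm = ChatModule(
    model="models/converted/openchat-3.5-0106-q4f16_1-MLC",
    model_lib_path="models/converted/openchat-3.5-0106-q4f16_1-MLC/"
                   "openchat-3.5-0106-q4f16_1-metal.so",
    chat_config=chat_config,
)
print(cm.generate("What is the capital of France?"))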
P.S.
So I believe we just need to update https://github.com/mlc-ai/mlc-llm/blob/main/cpp/conv_templates.cc to add one more conversation template.
⚙️ Request New Models
Additional context
OpenChat 3.5 models are the best-performing 7B chat models on Chatbot Arena and HumanEval+. They share the same architecture as Mistral, but use a different conversation template as follows (no system message needed):
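GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant: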
Starling-LM-7B-alpha and OpenChat-3.5-0106 share the same conversation template and architecture.
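To make the layout concrete, here is a tiny hypothetical helper (not part of mlc_chat, purely illustrative) that builds a prompt in this format:

def openchat_prompt(turns):
    """Build an OpenChat-3.5-style prompt from (role, text) pairs."""
    role_map = {"user": "GPT4 Correct User", "assistant": "GPT4 Correct Assistant"}
    prompt = "".join(f"{role_map[role]}: {text}<|end_of_turn|>" for role, text in turns)
    # The trailing assistant tag cues the model to produce the next reply.
    return prompt + "GPT4 Correct Assistant:"

print(openchat_prompt([("user", "Hello"), ("assistant", "Hi"), ("user", "How are you today?")]))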