It seems that if I disable verbose=True, it no longer errors, but then I have the issue of nothing being generated.
The last working script I have is:
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True}
)
chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
    {"role": "assistant", "content": "The "},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
# We need to remove the last <|im_end|> token so that the AI continues generation
prompt = prompt[::-1].replace("<|im_end|>"[::-1], "", 1)[::-1]
response = generate(model, tokenizer, prompt=prompt, verbose=True, temp=0.6, max_tokens=1500)
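As an aside, the string-reversal line above just strips the last occurrence of <|im_end|> from the prompt; an equivalent, arguably more readable one-liner (my suggestion, not part of the original script) is:

# Split on the last "<|im_end|>" only and rejoin, dropping that final occurrence
prompt = "".join(prompt.rsplit("<|im_end|>", 1))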
Maybe not related, but you can pass --use-default-chat-template to mlx_lm.generate to enable the default chat template for the model, e.g.
python -m mlx_lm.generate --model dbrx-instruct-4bit --prompt "$(cat my_prompt)" --trust-remote-code --use-default-chat-template --max-tokens 1000
I've learned --use-default-chat-template the hard way 😅
It seems that if I use --use-default-chat-template, it indeed works.
So the only issue is via the Python script; the following code errors:
from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True}
)
chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, verbose=True)
==========
Traceback (most recent call last):
  File "/Users/eek/work/dbrx/test2.py", line 14, in <module>
    response = generate(model, tokenizer, prompt=prompt, verbose=True)
  File "/Users/eek/.pyenv/versions/3.10.12/lib/python3.10/site-packages/mlx_lm-0.4.0-py3.10.egg/mlx_lm/utils.py", line 273, in generate
    prompt_tps = prompt_tokens.size / prompt_time
UnboundLocalError: local variable 'prompt_time' referenced before assignment
Seems like you aren't getting any output using that prompt 🤔 (which triggers an edge case and causes the crash).
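For context, here is a minimal, hypothetical sketch of how that kind of UnboundLocalError can arise when the very first sampled token is already the end-of-turn token: the variable that times the prompt only gets assigned inside the generation loop. This is an illustration of the failure mode, not the actual mlx_lm code:

import time

def generate_sketch(token_stream, eos_token_id, prompt_token_count):
    # token_stream stands in for the stream of tokens sampled from the model.
    tic = time.perf_counter()
    for n, token in enumerate(token_stream):
        if token == eos_token_id:
            # If this break happens on the very first token,
            # prompt_time below is never assigned.
            break
        if n == 0:
            prompt_time = time.perf_counter() - tic
    # Raises UnboundLocalError when nothing was generated before EOS.
    return prompt_token_count / prompt_time

generate_sketch([42], eos_token_id=42, prompt_token_count=10)  # crashes like the traceback above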
It works if you do the following when you apply the template:
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
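For completeness, the full script with that fix applied should look something like this (same model and settings as the scripts above); add_generation_prompt=True ends the templated prompt with an open assistant turn, so the model starts answering instead of immediately emitting <|im_end|>:

from mlx_lm import load, generate

model, tokenizer = load(
    "mlx-community/dbrx-instruct-4bit",
    tokenizer_config={"trust_remote_code": True}
)
chat = [
    {"role": "user", "content": "What's the difference between PCA vs UMAP vs t-SNE?"},
]
# add_generation_prompt=True appends the assistant header to the prompt,
# so no manual <|im_end|> stripping or pre-filled assistant text is needed.
prompt = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=1500)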
Hi there!
Wanted to say congrats to @awni for the work on the DBRX support.
I've also converted and uploaded the dbrx-instruct version to HF: https://huggingface.co/mlx-community/dbrx-instruct-4bit
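For anyone who wants to reproduce the conversion, a command along these lines should work with mlx_lm's convert script (the exact flags may differ between versions; databricks/dbrx-instruct is the upstream HF repo I'm assuming here):

python -m mlx_lm.convert --hf-path databricks/dbrx-instruct -q --upload-repo mlx-community/dbrx-instruct-4bit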
It works OK with no prompt templating, but for Instruct it works way better with prompt templating, for which I have a small issue:
If I just pass the user message and do not add the assistant part, it errors with the UnboundLocalError shown above.
If I do add the assistant part, it works, but then I get an instant <|im_end|> and the generation ends.
The best result I've had so far is adding a couple of words after the assistant start (as in the working script above); this works well.
This is my bash command (the mlx_lm.generate invocation shown earlier in this thread), where the above prompt is saved in the my_prompt file; the equivalent Python script is also shown earlier.
The mlx-lm that I have locally is at the latest commit, b80adbc.