ml-explore / mlx-examples

Examples in the MLX framework

[BUG] Different Default-Values for Temperature #1017

Closed hschaeufler closed 1 month ago

hschaeufler commented 1 month ago

Describe the bug: When I call mlx_lm.generate via the console, the default value for temp is 0.6. If I call the generate method via Python and don't pass temp, the value 0.0 is used.

To Reproduce

Include code snippet

 mlx_lm.generate --model "results/llama3_1_8B_instruct_lora/tuning_03/lora_fused_model/" \
    --max-tokens 4000 \
    --prompt "Say hello to me"
> {'temp': 0.6, 'top_p': 1.0, 'max_kv_size': None, 'cache_history': None}
from mlx_lm import load, generate

model_path = "results/llama3_1_8B_instruct_lora/tuning_03/lora_fused_model"
model, tokenizer = load(model_path)

prompt = """Say hello to me"""
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=prompt, verbose=False, max_tokens=128000)

> temp 0.0
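
As a workaround, the sampling parameters can be passed explicitly so the Python call matches the CLI output above. This is only a sketch against the mlx_lm version this issue was filed with, where generate() still accepts temp and top_p as keyword arguments (newer releases moved these into a separate sampler argument):

from mlx_lm import load, generate

model_path = "results/llama3_1_8B_instruct_lora/tuning_03/lora_fused_model"
model, tokenizer = load(model_path)

messages = [{"role": "user", "content": "Say hello to me"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Pass the sampling parameters explicitly so they match the CLI defaults shown
# above (temp 0.6, top_p 1.0) instead of falling back to the Python default of 0.0.
response = generate(
    model, tokenizer, prompt=prompt, max_tokens=4000, temp=0.6, top_p=1.0
)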

Expected behavior: The same default values are used for the CLI and the Python API when no values are provided.


Additional context: There is a generation_config.json for models on Hugging Face. It would be nice if the values for temp and top_p were taken from the model's generation_config.json when no values are specified and such a file is available. When a model is fused, the corresponding generation_config.json is also located in lora_fused_model (see the sketch after the link below).

https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/generation_config.json
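
A rough sketch of reading those defaults on the caller's side, assuming the Hugging Face keys temperature and top_p inside generation_config.json; load_generation_defaults is a hypothetical helper, not part of mlx_lm, and the returned values would then be forwarded to generate() as in the workaround above:

import json
from pathlib import Path


def load_generation_defaults(model_path, default_temp=0.0, default_top_p=1.0):
    # Hypothetical helper: read temp/top_p from the model's generation_config.json,
    # falling back to the given defaults when the file or the keys are missing.
    config_file = Path(model_path) / "generation_config.json"
    if not config_file.is_file():
        return default_temp, default_top_p
    config = json.loads(config_file.read_text())
    return config.get("temperature", default_temp), config.get("top_p", default_top_p)


temp, top_p = load_generation_defaults(
    "results/llama3_1_8B_instruct_lora/tuning_03/lora_fused_model"
)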

awni commented 1 month ago

Yeah, the inconsistency is a bit odd. I changed the default to 0.0 for both cases in the latest MLX LM.