Describe the bug
When I call mlx_lm.generate via the console, the default value for temp is 0.6. If I call the generate method via Python and don't pass temp, the value 0.0 is used.
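To Reproduce
A minimal sketch of both call paths (the model name is only an example; any model shows the same discrepancy):

```shell
# Console: the CLI default for temp is 0.6
python -m mlx_lm.generate --model mlx-community/Meta-Llama-3.1-8B-Instruct-4bit --prompt "hello"
```

```python
# Python API: no temp passed, so generation runs with temp 0.0 (greedy)
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Meta-Llama-3.1-8B-Instruct-4bit")
print(generate(model, tokenizer, prompt="hello"))
```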
Expected behavior
The same default values are used for the CLI and the Python API when no values are provided.
Desktop (please complete the following information):
OS version: macOS 14.16.1
mlx-lm version: 0.19.0
Additional context
There is a generation_config.json for many models on Hugging Face. It would be nice if the values for temp and top_p were taken from the respective generation_config.json when no values are specified and such a file is available. When a model is fused, the corresponding generation_config.json is located in lora_fused_model. For example:
https://huggingface.co/meta-llama/Llama-3.1-8B/blob/main/generation_config.json
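A hypothetical sketch of how such defaults could be picked up; read_generation_defaults is not part of mlx_lm, and the key names temperature and top_p follow the Hugging Face generation_config.json convention:

```python
# Hypothetical helper, not part of mlx_lm: read sampling defaults from a
# model directory's generation_config.json, falling back when it is absent.
import json
from pathlib import Path


def read_generation_defaults(model_path, fallback_temp=0.0, fallback_top_p=1.0):
    """Return (temp, top_p), preferring generation_config.json if present."""
    config_file = Path(model_path) / "generation_config.json"
    if config_file.is_file():
        config = json.loads(config_file.read_text())
        # Hugging Face generation configs use "temperature" and "top_p".
        return (
            config.get("temperature", fallback_temp),
            config.get("top_p", fallback_top_p),
        )
    return fallback_temp, fallback_top_p


# e.g. for a fused adapter, the file lives in lora_fused_model/
temp, top_p = read_generation_defaults("lora_fused_model")
```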