jamesm131 closed this 2 months ago
Okay, I've made these updates. The linked manage.py file uses the "mlx" pattern by default to search for models, so I've done the same here. I tried something like the snippet below to check the model_type in each config, but it took a long time to run, and if a model was only partially downloaded it would trigger a full download.
```python
from pathlib import Path
from huggingface_hub import scan_cache_dir
from mlx_lm.utils import get_model_path, load_config
import mlx_lm.models

# (repo_id, model_type) pairs; note get_model_path will fetch missing files
model_configs = [
    (repo.repo_id, load_config(get_model_path(repo.repo_id))["model_type"])
    for repo in scan_cache_dir().repos
]
model_list = [f.stem for f in Path(mlx_lm.models.__path__[0]).glob("*.py")]
downloaded_models = [m for m, t in model_configs if t in model_list]
```
I couldn't see another way to easily check the model_type, so I just went with the 'mlx' check (roughly the filter sketched below). Happy to modify if there's a better way to do this.
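For concreteness, the 'mlx' check amounts to something like this (a sketch; the exact pattern matching in the PR may differ):

```python
from huggingface_hub import scan_cache_dir

# Keep only cached repos with "mlx" in the repo id (case-insensitive)
downloaded_models = [
    repo.repo_id
    for repo in scan_cache_dir().repos
    if "mlx" in repo.repo_id.lower()
]
```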
Thinking about this a bit more, it might be alright to detect and skip models that aren't fully downloaded, and to cache the model_type configs keyed on the commit_hash returned by HF's scan_cache_dir, in a ~/.mlx or ~/.cache/.mlx directory (which could also be set up as the default location for converted/local models); a rough sketch follows below. This would solve the limitations around the model_type scanning mentioned above, as well as the fact that locally converted models won't show up (unless they are uploaded to HF with 'mlx' in the title). I'm not sure about others, but I would be much more likely to convert models if I had easy access to them through mlx_lm.server.
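As a rough sketch of that caching idea (the cache location and helper name are hypothetical, and the config.json existence check is only a cheap proxy for a complete download):

```python
import json
from pathlib import Path

from huggingface_hub import scan_cache_dir

CACHE_FILE = Path.home() / ".cache" / ".mlx" / "model_types.json"  # hypothetical

def scan_model_types() -> dict:
    """Map repo_id -> model_type for cached models, keyed on commit_hash."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    types = {}
    for repo in scan_cache_dir().repos:
        for rev in repo.revisions:
            key = f"{repo.repo_id}@{rev.commit_hash}"
            if key not in cache:
                config_path = rev.snapshot_path / "config.json"
                if not config_path.exists():
                    continue  # partially downloaded: skip, don't trigger a fetch
                cache[key] = json.loads(config_path.read_text()).get("model_type")
            types[repo.repo_id] = cache[key]
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(cache))
    return types
```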
This is a much broader change than the scope of the initial PR though, so it would probably need further discussion. @awni do you have any thoughts?
I think we can do this in stages: starting with this PR, and then possibly adding caching / support for model_type as a second step.
I will review this as a starting point.
Regarding the next step, to me the main question is whether it should support not-yet-downloaded models. If the answer is yes, then maybe we want to use the HF Hub API to query for model_type; if the answer is no, then we should find a way to get model types for already-downloaded models only. As for caching, that's an optimization we may use in either case, depending on the speed.
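For the "yes" case, one lightweight option is fetching just the config file through huggingface_hub rather than the whole repo (a sketch, not a committed design):

```python
import json

from huggingface_hub import hf_hub_download

def remote_model_type(repo_id: str):
    # Downloads (and caches) only config.json, not the model weights
    config_path = hf_hub_download(repo_id, "config.json")
    with open(config_path) as f:
        return json.load(f).get("model_type")
```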
This pull request introduces a new /v1/models endpoint to mlx_lm.server. The purpose of this addition is to improve compatibility with clients such as Open WebUI, which expect to retrieve a list of available models before use.
Changes:

- Add a /v1/models endpoint to mlx_lm.server that returns the list of available models.

I've tested Open WebUI compatibility, and this now works.
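As a quick sanity check, hitting the new endpoint on a locally running server should return an OpenAI-style model list (assuming the server's default host/port; adjust as needed):

```python
import json
from urllib.request import urlopen

# Assumes mlx_lm.server is running locally on its default port
with urlopen("http://localhost:8080/v1/models") as response:
    print(json.dumps(json.load(response), indent=2))
# Expected OpenAI-compatible shape:
# {"object": "list", "data": [{"id": "<repo_id>", "object": "model", ...}]}
```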