jamesm131 closed this 2 months ago
Okay, I've made these updates. The linked manage.py file uses the "mlx" pattern by default to search for models, so I've done the same here. I tried something like the snippet below to check the model_type in each config, but it took a long time to run, and if a model was only partially downloaded it would trigger a full download.
```python
from pathlib import Path
from huggingface_hub import scan_cache_dir
from mlx_lm.utils import get_model_path, load_config
import mlx_lm.models

# (repo_id, model_type) pairs; note get_model_path will fetch missing files
model_configs = [
    (repo.repo_id, load_config(get_model_path(repo.repo_id))["model_type"])
    for repo in scan_cache_dir().repos
]
model_list = [f.stem for f in Path(mlx_lm.models.__path__[0]).glob("*.py")]
downloaded_models = [m for m, t in model_configs if t in model_list]
```
I couldn't see another way to easily check the model_type, so I just went with the 'mlx' check (roughly the filter sketched below). Happy to modify if there's a better way to do this.
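For concreteness, the 'mlx' check amounts to something like this (a sketch; the exact pattern matching in the PR may differ):

```python
from huggingface_hub import scan_cache_dir

# Keep only cached repos with "mlx" in the repo id (case-insensitive)
downloaded_models = [
    repo.repo_id
    for repo in scan_cache_dir().repos
    if "mlx" in repo.repo_id.lower()
]
```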
Thinking about this a bit more, it might be alright to detect and skip models that aren't fully downloaded, and to cache the model_type configs keyed on the commit_hash returned by HF's scan_cache_dir, in a ~/.mlx or ~/.cache/.mlx directory (which could also be set up as the default location for converted/local models); a rough sketch follows below. This would solve the limitations around the model_type scanning mentioned above, as well as the fact that locally converted models won't show up (unless they are uploaded to HF with 'mlx' in the title). I'm not sure about others, but I would be much more likely to convert models if I had easy access to them through mlx_lm.server.
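As a rough sketch of that caching idea (the cache location and helper name are hypothetical, and the config.json existence check is only a cheap proxy for a complete download):

```python
import json
from pathlib import Path

from huggingface_hub import scan_cache_dir

CACHE_FILE = Path.home() / ".cache" / ".mlx" / "model_types.json"  # hypothetical

def scan_model_types() -> dict:
    """Map repo_id -> model_type for cached models, keyed on commit_hash."""
    cache = json.loads(CACHE_FILE.read_text()) if CACHE_FILE.exists() else {}
    types = {}
    for repo in scan_cache_dir().repos:
        for rev in repo.revisions:
            key = f"{repo.repo_id}@{rev.commit_hash}"
            if key not in cache:
                config_path = rev.snapshot_path / "config.json"
                if not config_path.exists():
                    continue  # partially downloaded: skip, don't trigger a fetch
                cache[key] = json.loads(config_path.read_text()).get("model_type")
            types[repo.repo_id] = cache[key]
    CACHE_FILE.parent.mkdir(parents=True, exist_ok=True)
    CACHE_FILE.write_text(json.dumps(cache))
    return types
```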
This is a much broader change than the scope of the initial PR though, so it would probably need further discussion. @awni do you have any thoughts?
I think we can do this in stages: starting with this PR, and then possibly adding caching / support for model_type as a second step.
I will review this as a starting point.
Regarding the next step, to me the main question is whether it should support not-yet-downloaded models. If the answer is yes, then maybe we want to use the HF Hub API to query for model_type; if the answer is no, then we should find a way to get model types for already-downloaded models only. As for caching, that's an optimization we may use in either case, depending on the speed.
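For the "yes" case, one lightweight option is fetching just the config file through huggingface_hub rather than the whole repo (a sketch, not a committed design):

```python
import json

from huggingface_hub import hf_hub_download

def remote_model_type(repo_id: str):
    # Downloads (and caches) only config.json, not the model weights
    config_path = hf_hub_download(repo_id, "config.json")
    with open(config_path) as f:
        return json.load(f).get("model_type")
```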
This pull request introduces a new /v1/models endpoint to mlx_lm.server. The purpose of this addition is to improve compatibility with clients such as Open WebUI, which expect to retrieve a list of available models before use.
Changes:

- Add a /v1/models endpoint to mlx_lm.server that returns the list of available models.

I've tested Open WebUI compatibility, and this now works.
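As a quick sanity check, hitting the new endpoint on a locally running server should return an OpenAI-style model list (assuming the server's default host/port; adjust as needed):

```python
import json
from urllib.request import urlopen

# Assumes mlx_lm.server is running locally on its default port
with urlopen("http://localhost:8080/v1/models") as response:
    print(json.dumps(json.load(response), indent=2))
# Expected OpenAI-compatible shape:
# {"object": "list", "data": [{"id": "<repo_id>", "object": "model", ...}]}
```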