Currently, when models are served via the "Launch" tab of the MLX UI, the container port is fixed and assumed to be 5000. This works for the containerized MAX models, all of which serve inference requests on port 5000. However, any other model to be registered in MLX that does not listen on port 5000 would have to be rebuilt before it can be registered and deployed in MLX.
This PR introduces a new optional field, `container_port`, to the model YAML so that users can specify the port on which the containerized model listens for inference requests.
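For illustration, a model YAML entry using the new field might look like the sketch below; apart from `container_port`, the field names are hypothetical and may not match the actual MLX model schema:

```yaml
# Hypothetical model YAML sketch -- only container_port is the new
# optional field introduced by this PR; the other fields are illustrative.
name: my-custom-model
container_image: myorg/my-custom-model:latest
# Port on which the containerized model listens for inference requests;
# when omitted, MLX keeps assuming the previous default of 5000.
container_port: 8080
```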
There is PR #371 on the `mlx` repo to make use of the `container_port` parameter from the model YAML in the `model-config` component that is used in the model deployment pipeline in MLX.

/cc @Tomcli @rafvasq
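For context, a hedged sketch of how the deployment created by the pipeline could pick up this value; the actual manifests generated by the `model-config` component may differ:

```yaml
# Hypothetical sketch only: the real templates used by the MLX
# deployment pipeline may look different.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-custom-model
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-custom-model
  template:
    metadata:
      labels:
        app: my-custom-model
    spec:
      containers:
        - name: my-custom-model
          image: myorg/my-custom-model:latest
          ports:
            # Previously hard-coded to 5000; now taken from the
            # container_port field of the model YAML when present.
            - containerPort: 8080
```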