Open zane-neo opened 1 month ago
@b4sjoo Has built update connector API which doesn't need redeploy model for internal connector. Sicheng, can you help take this to support standalone connector too?
With auto deploy for remote models, this can be easily done as follows:
- Update the connector and save the new connector meta into the ml-connector index. (already in the current API)
- Undeploy the models that are associated with the connector. (single line change)
After UpdateConnector is done, any model that uses the connector will be auto-deployed with the updated connector the next time it is invoked. People may ask: why is my model undeployed once the connector is updated? Because an important piece of its metadata was updated, which means the model has changed, so we undeploy it. However, this doesn't introduce any downtime or disturb how you use your model. From the user's point of view, availability and usability remain the same.
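The two steps above can be sketched with a toy simulation. Every name here (`update_connector`, `predict`, the dicts standing in for the ml-connector index and the in-memory model cache) is illustrative only and does not correspond to actual ml-commons internals:

```python
# Hypothetical, simplified simulation of the proposed flow.
connector_index = {"c1": {"timeout": 10}}                 # stands in for the ml-connector index
deployed_models = {"m1": {"connector": {"timeout": 10}}}  # in-memory model cache
model_to_connector = {"m1": "c1"}

def update_connector(connector_id, new_meta):
    # Step 1: save the new connector meta into the index (already in the current API).
    connector_index[connector_id] = new_meta
    # Step 2: undeploy every model associated with the connector (the one-line change).
    for model_id, cid in model_to_connector.items():
        if cid == connector_id:
            deployed_models.pop(model_id, None)

def predict(model_id):
    # Auto-deploy: calling predict on an undeployed remote model redeploys it,
    # picking up the updated connector from the index.
    if model_id not in deployed_models:
        cid = model_to_connector[model_id]
        deployed_models[model_id] = {"connector": dict(connector_index[cid])}
    return deployed_models[model_id]["connector"]

update_connector("c1", {"timeout": 30})
print(predict("m1"))  # the redeployed model sees the updated connector meta
```

The point of the sketch is only that no deploy call is ever issued explicitly: the undeploy plus the existing auto-deploy path is enough to propagate the new connector.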
Related: #1148, #2376
Is there any possibility that the model's undeploy and auto-deploy happen at the same time, causing an unexpected status?
"Unexpected status" is too general, so it's hard to enumerate all the edge cases or race conditions. We need to state clearly that it's not recommended to call predict on a model while you are in the middle of updating its connector. Before the undeploy finishes, auto-deploy will not happen, because the old model is still in memory; so predictions made while the connector is being updated still go through the old model.
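That ordering can be illustrated with a toy sequence: a predict that lands after the index write but before the undeploy still reads the old in-memory copy, and only a predict after the undeploy picks up the new meta. All names are illustrative, not ml-commons code:

```python
# Toy state illustrating the window between connector update and undeploy.
connector_index = {"c1": {"timeout": 10}}
cache = {"m1": dict(connector_index["c1"])}   # in-memory copy held by the deployed model

def predict(model_id="m1"):
    if model_id not in cache:                 # auto-deploy on cache miss
        cache[model_id] = dict(connector_index["c1"])
    return cache[model_id]["timeout"]

# 1. Connector meta is updated in the index first.
connector_index["c1"] = {"timeout": 30}

# 2. A predict that lands before the undeploy still reads the old in-memory copy.
during_update = predict()                     # old timeout

# 3. Undeploy evicts the model; the next predict auto-deploys with the new meta.
cache.pop("m1", None)
after_update = predict()                      # new timeout

print(during_update, after_update)            # prints: 10 30
```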
That doesn't seem like a good user experience. If a user is updating an HTTP-client-related parameter, e.g. the connection timeout, the good user experience would be: predictions keep flowing; on instances that have received the update, subsequent predictions honor the updated connection timeout; on instances that haven't received it yet, predictions honor the old connection timeout. The real issue is that today, updating a connector can cause data loss in production because it requires undeploying the model, so our goal should be to avoid that and give users a seamless experience with no data loss.
It will not cause data loss. Auto-deploy refreshes the connector with the updated params for you. It may only introduce data inconsistency for a short time window.
Is your feature request related to a problem? Currently the update connector API checks all usages of the connector, and the update can only go through when no model is using it. This doesn't seem reasonable, especially for remote models.
What solution would you like? Adding a new URL parameter like
redeploy_model=true
can reduce the manual effort of undeploying and redeploying the model.
What alternatives have you considered? Change the default behavior to automatically redeploy the model after the connector is updated.