b4sjoo opened this issue 1 year ago
@b4sjoo This is a great idea, love it!
In the case where the alias namespace gets crowded, it would be very helpful to have some kind of search-aliases API to see which names are taken, and perhaps where they point, so I don't end up duplicating aliases unnecessarily.
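A search-aliases API could be as simple as a substring match over the alias table that returns each matching alias together with the model ID it points to. This is a minimal sketch assuming aliases are stored as a plain alias → model ID mapping; `search_aliases` and the sample IDs (other than `xzy76xswsd`, which appears later in this thread) are hypothetical and not part of ml-commons.

```python
def search_aliases(aliases: dict[str, str], pattern: str) -> list[tuple[str, str]]:
    """Return (alias, model_id) pairs whose alias contains `pattern`, sorted by alias.

    `aliases` is a hypothetical alias -> model ID mapping; ml-commons does not
    expose such an API today.
    """
    return sorted(
        (alias, model_id)
        for alias, model_id in aliases.items()
        if pattern in alias
    )


# Example alias table with made-up IDs.
aliases = {
    "text_embedding": "xzy76xswsd",
    "text_embedding_v2": "ab12cd34ef",
    "sentiment_analysis": "qq98rr76ss",
}
print(search_aliases(aliases, "text_embedding"))
```

Returning the target model ID alongside each alias answers both questions at once: whether a name is taken, and whether an existing alias already points at the model you were about to alias.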
Another option to consider could be to have alias names be unique on some kind of access-control basis: it could be very frustrating to try to create an alias and then be unable to because someone has already taken that name at an access level above yours. I'm not sure that's a real scenario, but it's something to think about.
Can you explain the update-alias part of the LLD in more detail? I'm having a little trouble understanding it. Can a model have multiple aliases associated with it?
Thanks
Will connectors get aliases, too?
+1, this will be a great user experience and will help with model versioning and MLClient interactions.
A couple of suggestions and questions:
Regarding "Global uniqueness check: we check both on model ID and model aliases." and "Option 1: The model alias is globally unique":
Did you consider other options, such as model aliases that are unique per model group? Though I like the approach of globally unique aliases, it would be nice to compare the alternatives.
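To make the comparison concrete, here is a minimal sketch of the two uniqueness scopes, assuming an in-memory registry. Under Option 1 an alias must not collide with any existing alias or model ID; under the per-model-group alternative the same alias may exist in different groups, so the key becomes (group, alias). All class and method names are hypothetical.

```python
class AliasConflict(Exception):
    """Raised when an alias is already taken within its uniqueness scope."""


class GlobalAliasRegistry:
    """Option 1: aliases are unique cluster-wide and may not shadow a model ID."""

    def __init__(self, model_ids: set[str]):
        self.model_ids = model_ids
        self.aliases: dict[str, str] = {}  # alias -> model ID

    def create(self, alias: str, model_id: str) -> None:
        # Check against both existing model IDs and existing aliases.
        if alias in self.model_ids or alias in self.aliases:
            raise AliasConflict(alias)
        self.aliases[alias] = model_id


class PerGroupAliasRegistry:
    """Alternative: aliases are unique only within one model group."""

    def __init__(self):
        self.aliases: dict[tuple[str, str], str] = {}  # (group ID, alias) -> model ID

    def create(self, group_id: str, alias: str, model_id: str) -> None:
        key = (group_id, alias)
        if key in self.aliases:
            raise AliasConflict(alias)
        self.aliases[key] = model_id


registry = GlobalAliasRegistry(model_ids={"xzy76xswsd"})
registry.create("text_embedding", "xzy76xswsd")

grouped = PerGroupAliasRegistry()
grouped.create("group_a", "text_embedding", "xzy76xswsd")
grouped.create("group_b", "text_embedding", "ab12cd34ef")  # allowed: different group
```

In the per-group variant, a request that carries only an alias is ambiguous unless it also identifies the group, which is part of the trade-off worth comparing.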
With the introduction of model access control [1], access is controlled at the model-group level. How will that work once aliases are introduced?
model_id is mandatory today; how do we handle requests where both model_id and model_alias are present? It would be good to clarify that, for example:
{
  "query": {
    "neural": {
      "passage_vector": {
        "query_text": "Hello world",
        "model_id": "xzy76xswsd",
        "model_alias": "neural_search_text_embedding",
        "k": 100
      }
    }
  }
}
Where model id is being used now
[1] https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_access_control.md
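One possible answer to the model_id/model_alias question, sketched under an assumed policy: accept either field, and if both are present, require that they point to the same model, otherwise reject the request. `resolve_model_id` and the alias table are hypothetical; the thread has not settled on a policy.

```python
def resolve_model_id(params: dict, aliases: dict[str, str]) -> str:
    """Resolve the target model for a neural query clause.

    Hypothetical policy: accept model_id or model_alias; if both are given,
    they must refer to the same model, otherwise the request is rejected.
    """
    model_id = params.get("model_id")
    alias = params.get("model_alias")
    if alias is not None:
        if alias not in aliases:
            raise ValueError(f"unknown model_alias: {alias}")
        aliased_id = aliases[alias]
        if model_id is not None and model_id != aliased_id:
            raise ValueError(
                f"model_id {model_id} conflicts with model_alias {alias} -> {aliased_id}"
            )
        return aliased_id
    if model_id is None:
        raise ValueError("either model_id or model_alias is required")
    return model_id


aliases = {"neural_search_text_embedding": "xzy76xswsd"}
clause = {"query_text": "Hello world", "model_alias": "neural_search_text_embedding", "k": 100}
print(resolve_model_id(clause, aliases))  # xzy76xswsd
```

A stricter alternative is to reject any request that sets both fields; either way, documenting the rule avoids silent precedence surprises.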
Will connectors get aliases, too?
I think we can add aliases for connectors too. @austintlee, would that be useful? Any concerns?
What if the model_id could just be manually specified instead of randomly generated?
Instead of creating alias functionality around the ID, you could just specify the ID of your choosing when you create the various objects. If the ID attribute is left off, then it's auto-generated as it is today, but when specified the ID is used. Users could then decide if they want to use a "text-friendly" ID (e.g. "neural_search_text_embedding_model"), a UID of their own or the one OpenSearch creates.
Would that simplify things?
There are probably some technical reasons why this might not be feasible (and why it wasn't done from the start), but I'm wondering whether this would simplify things if there is no technical reason the ID can't be specified at creation time.
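The "specify your own ID" idea amounts to an optional parameter with today's auto-generation as the fallback. A minimal in-memory sketch follows; `ModelStore` and `register` are hypothetical, and real feasibility in ml-commons depends on constraints not discussed here.

```python
import uuid
from typing import Optional


class ModelStore:
    """Hypothetical store where callers may choose a model ID at registration time."""

    def __init__(self):
        self.models: dict[str, dict] = {}  # model ID -> model definition

    def register(self, definition: dict, model_id: Optional[str] = None) -> str:
        # Fall back to a generated ID when the caller does not supply one,
        # mirroring today's auto-generated IDs.
        if model_id is None:
            model_id = uuid.uuid4().hex
        # A caller-chosen ID must not overwrite an existing model.
        if model_id in self.models:
            raise ValueError(f"model ID already exists: {model_id}")
        self.models[model_id] = definition
        return model_id


store = ModelStore()
chosen = store.register({"name": "text embedding"}, model_id="neural_search_text_embedding_model")
generated = store.register({"name": "sentiment"})
```

With caller-chosen IDs, a provisioning script can use the same ID in every environment; the duplicate check is what keeps that from silently replacing an existing model.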
+1 for this feature.
I'm trying to work on scripts so we can provision our servers. We deploy to various environments, so having IDs that are consistent across deployments makes the chain much easier to manage, especially since our indices are created on the fly.
Right now, there's no simple way to reference the model in the workflow, since the IDs will be different on every instance. The application stack needs to search the registered models by name, find their IDs, and store them for use in the application, and we need to look them up again any time we run one-off queries.
Having the ability to use an "alias" (or define our own ID) would definitely simplify our workflow.
If you need more details on our provisioning workflow issue, there's a thread here: https://forum.opensearch.org/t/provisioning-models-for-deployment-in-application/17036
+1 for the problem, -1 for the solution
I just posted something like this on the slack channel: https://opensearch.slack.com/archives/C05BGJ1N264/p1718823006253339
Why not make model name unique in the same way that an index is unique?
Adding yet another layer of indirection (aliases) seems to be going in the wrong direction, away from ease of use.
If you're going to add aliases, why not just use @ notation for them, like "model": "@mymodelalias", or mustache, like "{{{mymodelalias}}}"? Then you can use them anywhere, like for connections and indexes.
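Because the marker lives in the value itself, a single resolver could handle any string field. Here is a minimal sketch of resolving both suggested syntaxes; `resolve_reference`, the regex, and the allowed alias characters are assumptions, and plain values are treated as literal model IDs.

```python
import re

# Hypothetical reference syntaxes from the suggestion: "@alias" or "{{{alias}}}".
_MUSTACHE = re.compile(r"^\{\{\{\s*([A-Za-z0-9_.-]+)\s*\}\}\}$")


def resolve_reference(value: str, aliases: dict[str, str]) -> str:
    """Return the model ID a field value refers to.

    Values using neither syntax are returned unchanged as literal model IDs.
    """
    if value.startswith("@"):
        name = value[1:]
    else:
        match = _MUSTACHE.match(value)
        if match is None:
            return value  # literal model ID
        name = match.group(1)
    if name not in aliases:
        raise KeyError(f"unknown alias: {name}")
    return aliases[name]


aliases = {"mymodelalias": "xzy76xswsd"}
print(resolve_reference("@mymodelalias", aliases))       # xzy76xswsd
print(resolve_reference("{{{mymodelalias}}}", aliases))  # xzy76xswsd
print(resolve_reference("xzy76xswsd", aliases))          # xzy76xswsd
```

One caveat with the @ form: any real model ID or name that legitimately starts with "@" would become ambiguous, which the mustache form avoids.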
Thanks @rbpasker,
why not make model name unique in the same way that an index is unique?
This is a historical issue. The model name was not originally designed as an identifier (there is no uniqueness check). Many users already rely on it, and some may have multiple models with the same name. If we changed the model name to be an identifier, backward compatibility would be challenging: for example, if a user has two models with the same name and refers to that name, we don't know which model should be used. Adding an alias makes backward compatibility easier to support.
I agree with @rbpasker: the currently proposed solution will add more friction and complexity.
If we changed the model name to be an identifier, backward compatibility would be challenging: for example, if a user has two models with the same name and refers to that name, we don't know which model should be used. Adding an alias makes backward compatibility easier to support.
easier != the best solution
Why not something like this: if models.get(name).size > 1 then fail("there are 2 models with the same name, please update accordingly") else do whatever
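The pseudocode above translates into a short lookup that succeeds only when a name is unambiguous. In this sketch, `model_id_for_name` is hypothetical, and `models` stands in for the result of searching registered models, each entry assumed to carry "name" and "model_id" keys.

```python
def model_id_for_name(models: list[dict], name: str) -> str:
    """Look up a model ID by name, failing when the name is missing or ambiguous."""
    matches = [m["model_id"] for m in models if m["name"] == name]
    if not matches:
        raise LookupError(f"no model named {name!r}")
    if len(matches) > 1:
        raise LookupError(
            f"there are {len(matches)} models named {name!r}, please update accordingly"
        )
    return matches[0]


registered = [
    {"name": "text_embedding", "model_id": "xzy76xswsd"},
    {"name": "sentiment", "model_id": "qq98rr76ss"},
    {"name": "sentiment", "model_id": "ab12cd34ef"},
]
print(model_id_for_name(registered, "text_embedding"))  # xzy76xswsd
```

Existing clusters with duplicate names keep working for ID-based calls; only name-based lookups on a duplicated name fail, with a message telling the user what to fix.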
What purpose does a model name even serve, if it is not unique, is never used by the system anywhere else, and there is a separate model description?
Alternatively, make the model name optionally unique. With no change to the model definition, everything works exactly as it does now. By adding "model-unique": true to the model definition, the system would perform a uniqueness check and return HTTP 409 Conflict if a model with that name already exists (https://www.rfc-editor.org/rfc/rfc7231#section-6.5.8).
This has the benefits of backward compatibility and the smallest surface area for implementation and testing.
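The opt-in flag can be sketched as a registration call that only runs the name check when the incoming definition sets "model-unique". `ModelRegistry` and `register` are hypothetical; the status codes come from Python's standard `http.HTTPStatus`.

```python
from http import HTTPStatus


class ModelRegistry:
    """Hypothetical registry illustrating an opt-in "model-unique" flag."""

    def __init__(self):
        self.models: list[dict] = []

    def register(self, definition: dict) -> HTTPStatus:
        """Register a model and return the HTTP status the API would send."""
        name = definition["name"]
        # Only definitions that opt in get the uniqueness check, so existing
        # models and requests behave exactly as before.
        if definition.get("model-unique", False) and any(
            m["name"] == name for m in self.models
        ):
            return HTTPStatus.CONFLICT  # 409, nothing is stored
        self.models.append(definition)
        return HTTPStatus.OK


registry = ModelRegistry()
print(int(registry.register({"name": "my-model"})))                         # 200
print(int(registry.register({"name": "my-model", "model-unique": True})))   # 409
```

This sketch checks only when the incoming definition sets the flag; whether an existing flagged model should also block later unflagged registrations with the same name is a design choice the thread leaves open.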
@rbpasker Thanks for your suggestion. That sounds like a viable option.
I want to clarify some details:
By adding "model-unique": true, the system would do a uniqueness check and return HTTP 409 Conflict if it already exists
Do you mean adding another field, model-unique, to the model, or adding a cluster setting that checks uniqueness for all models?
Adding a field to the model would be the "smallest surface area for implementation and testing".
@ylwu-amzn I updated the comment to refer specifically to the model definition.
Introduction:
Right now, our model versions are identified by random IDs. Meanwhile, most of our ML-related operations currently require users to refer to the model ID directly, and these IDs are hard to remember and use because they are opaque. This can lead to confusion and errors when users want to refer to a specific model or compare different models. To address this issue, we propose to design and implement a model alias feature in ml-commons. This feature will allow users to assign a custom alias to each model they create or use, and then use that alias as a reference instead of the model ID. For example, a user can name their model "text_embedding" or "sentiment_analysis" and then use these aliases in their queries or commands.
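Using an alias "instead of the model ID" could mean that any field expecting a model ID also accepts an alias. A minimal sketch, assuming the globally unique option (aliases and model IDs share one namespace, so an identifier can never match both); `resolve_identifier` is hypothetical.

```python
def resolve_identifier(identifier: str, model_ids: set[str], aliases: dict[str, str]) -> str:
    """Accept either a model ID or an alias wherever a model ID is expected.

    Assumes aliases are globally unique and never equal to a model ID,
    so checking model IDs first cannot hide an alias.
    """
    if identifier in model_ids:
        return identifier
    if identifier in aliases:
        return aliases[identifier]
    raise KeyError(f"no model or alias named {identifier!r}")


model_ids = {"xzy76xswsd"}
aliases = {"text_embedding": "xzy76xswsd"}
print(resolve_identifier("text_embedding", model_ids, aliases))  # xzy76xswsd
print(resolve_identifier("xzy76xswsd", model_ids, aliases))      # xzy76xswsd
```

Resolving at request time keeps existing ID-based requests unchanged while letting new requests use readable names.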
Benefits of this feature:
Where model id is being used now:
Neural search queries also need a model ID, for example
Pain point:
Suppose a user has a new model version and wants it to replace the old version. They need to update the pipeline by changing the model_id, and they also need to change the model ID in their neural search queries. Another example: a user builds an application with an OpenAI remote model and later prefers to move to another model, such as a Claude model. The user then has to change the model ID in the application code and redeploy, which causes a window of service unavailability. With model aliases, the user can simply move the alias from one model to another; all requests are then routed to the new model, with no redeploy and no service-unavailability window.
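The "move the alias" step can be illustrated with a table that clients resolve on every request: re-pointing the alias is a single server-side update, and the application keeps sending the same alias. `AliasTable` and the model IDs below are made-up placeholders; a real cluster would also need to propagate the change to every node, which this in-memory sketch ignores.

```python
class AliasTable:
    """Hypothetical alias table; callers resolve the alias on every request."""

    def __init__(self):
        self._aliases: dict[str, str] = {}  # alias -> model ID

    def put(self, alias: str, model_id: str) -> None:
        # Create or re-point the alias; later resolutions see the new target.
        self._aliases[alias] = model_id

    def resolve(self, alias: str) -> str:
        return self._aliases[alias]


table = AliasTable()
table.put("chat_model", "openai-model-id")   # placeholder ID
before = table.resolve("chat_model")

# Switch the application to another model without touching application code.
table.put("chat_model", "claude-model-id")   # placeholder ID
after = table.resolve("chat_model")
print(before, after)  # openai-model-id claude-model-id
```

Because the application only ever sends "chat_model", the switch needs no redeploy; in-flight requests resolved before the update still go to the old model.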
Solution HLD:
Option 1: The model alias is globally unique
Plan:
Pros:
Cons:
Solution LLD:
Plan:
Security
Backward Compatibility
Testability