opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0

[Feature] Does OpenSearch support complex deep-learning model? #716

Open YuanBoXie opened 1 year ago

YuanBoXie commented 1 year ago

The ML Commons documentation shows how to upload a model and host it in OpenSearch, but it only gives a use case of an SBERT model written in TorchScript. For prediction with this model, it gives the following example:

POST /_plugins/_ml/_predict/text_embedding/WWQI44MBbzI2oUKAvNUt
{
"text_docs":["today is sunny"],
"return_number": true,
"target_response": ["sentence_embedding"]
}
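For reference, the same request could be issued from a client. A minimal Python sketch, assuming an OpenSearch cluster on localhost:9200 and the model ID from the example above (the actual HTTP call is left commented out):

```python
import json

# Model ID taken from the example request above.
MODEL_ID = "WWQI44MBbzI2oUKAvNUt"
# Assumption: an OpenSearch cluster reachable on localhost:9200.
URL = f"http://localhost:9200/_plugins/_ml/_predict/text_embedding/{MODEL_ID}"

payload = {
    "text_docs": ["today is sunny"],
    "return_number": True,
    "target_response": ["sentence_embedding"],
}

# With the `requests` library the call would look like:
#   import requests
#   resp = requests.post(URL, json=payload)
#   print(resp.json())
print(URL)
print(json.dumps(payload))
```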

In this case, it seems OpenSearch only supports text embedding, and the model input has to use the text_docs key. The text_docs key appears to accept only a list of strings and can't handle complex data structures such as JSON objects. But modern AI models often need complex data structures, e.g. custom JSON objects for GNN model input, or 2-dimensional/3-dimensional arrays for CV model input.

Can OpenSearch handle the cases described above?

YuanBoXie commented 1 year ago

If OpenSearch can handle these cases, could you please give some examples? I'm quite confused about this feature.

ylwu-amzn commented 1 year ago

@hexbo Thanks for your interest in ml-commons. Currently we only support text embedding models (refer to this doc). Can you explain your use case in more detail? For example: what is your model type, what are its expected input/output, and how do you expect to use your model in OpenSearch?

YuanBoXie commented 1 year ago

Thanks for your reply @ylwu-amzn. Here are some details about my use case:

Model: Graph Neural Network (GNN)

Input: a JSON object describing an attributed graph, including the vertex set and edge set of the graph. e.g.

{
    "vertex": [
        [12.1,31.3,123.2,1.3],
        [2.4,5.3,2.1,10.0]
    ],
    "edge": [
        [1],
        []
    ]
}

This is a directed graph with one edge and two vertices. Each vertex has a feature vector, such as [12.1,31.3,123.2,1.3]. The "edge" key is an adjacency list describing the links between vertices: entry i lists the targets of the directed edges whose source is vertex i, so here there is one directed edge from vertex 0 to vertex 1. The GNN then calculates the graph embedding for this graph.

Actually, the graph described in the JSON above needs to be converted to adjacency-matrix form before being input into the model. In total there are two input matrices: one containing the graph attributes (the vertex feature vectors), the other the adjacency matrix.

// graph attributes
[
    [12.1,31.3,123.2,1.3],
    [2.4,5.3,2.1,10.0]
]
// adjacency matrix: the mask
[
    [0,1],
    [0,0]
]
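The adjacency-list-to-matrix preprocessing described above can be sketched in plain Python (a hypothetical helper for illustration, not an existing ml-commons API):

```python
def to_matrices(graph):
    """Convert the adjacency-list JSON form into the two model inputs:
    the graph-attribute matrix and the adjacency matrix (the mask)."""
    features = graph["vertex"]                # graph attributes, one row per vertex
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]   # n x n zero matrix
    for src, targets in enumerate(graph["edge"]):
        for dst in targets:                   # entry i lists targets of edges from vertex i
            adjacency[src][dst] = 1
    return features, adjacency

graph = {
    "vertex": [[12.1, 31.3, 123.2, 1.3],
               [2.4, 5.3, 2.1, 10.0]],
    "edge": [[1], []],                        # one directed edge: 0 -> 1
}

features, adjacency = to_matrices(graph)
print(adjacency)  # [[0, 1], [0, 0]]
```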

Expected output: the graph embedding (a 64-dimensional float vector, or higher)

How to use:

  1. upload JSON data to OpenSearch
  2. OpenSearch preprocesses the data (transforms the JSON into matrices)
  3. calculate the graph embedding: feed the matrices into the loaded model to compute the output
  4. return the graph embedding, which can be used for neural search later
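The four steps above could be sketched as a pipeline like this; everything here is hypothetical, and the GNN itself is stubbed out, since ml-commons does not run such a model today:

```python
def preprocess(doc):
    """Step 2: transform the uploaded JSON graph into the two input matrices."""
    features = doc["vertex"]
    n = len(features)
    adjacency = [[0] * n for _ in range(n)]
    for src, targets in enumerate(doc["edge"]):
        for dst in targets:
            adjacency[src][dst] = 1
    return features, adjacency

def gnn_embed(features, adjacency, dim=64):
    """Step 3 (stub): a real deployment would run the loaded GNN model here."""
    return [0.0] * dim  # placeholder for the 64-dimensional graph embedding

# Step 1: the uploaded JSON document.
doc = {"vertex": [[12.1, 31.3, 123.2, 1.3], [2.4, 5.3, 2.1, 10.0]],
       "edge": [[1], []]}

# Steps 2-4: preprocess, embed, and return the vector for later neural search.
embedding = gnn_embed(*preprocess(doc))
print(len(embedding))  # 64
```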
ylwu-amzn commented 1 year ago

@hexbo Thanks, this is a good suggestion. Do you have other use cases/models besides GNN? We are improving this feature to support more types of models. We'd appreciate it if you could share more use cases.