opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0

Add AI connector blueprints for Aleph Alpha luminous-base embedding model #1925

Closed ulan-yisaev closed 6 months ago

ulan-yisaev commented 9 months ago

Is your feature request related to a problem? I'm proposing to add an AI connector blueprint for the Aleph Alpha Luminous-Base Embedding Model to the current collection of remote inference blueprints in OpenSearch ML Commons.

This model is particularly effective for German language applications, providing nuanced and contextually relevant embeddings. Given the increasing demand for robust language model solutions in different languages, integrating this model could significantly enhance your offerings for German-language processing tasks.

What solution would you like? A markdown file of AI connector blueprint for Aleph Alpha luminous-base embedding model.

What alternatives have you considered? I was able to write one using existing blueprints here:

{
  "name": "Aleph Alpha Connector: luminous-base, representation: document",
  "description": "The connector to the Aleph Alpha luminous-base embedding model with representation: document",
  "version": "0.1",
  "protocol": "http",
  "parameters": {
    "endpoint": "api.aleph-alpha.com",
    "representation": "document",
    "normalize": true
  },
  "credential": {
    "AlephAlpha_API_Token": "XXXXXXXXXXXXXXXXXX"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://${parameters.endpoint}/semantic_embed",
      "headers": {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Bearer ${credential.AlephAlpha_API_Token}"
      },
      "request_body": "{ \"model\": \"luminous-base\", \"prompt\": \"${parameters.input}\", \"representation\": \"${parameters.representation}\", \"normalize\": ${parameters.normalize}}",
      "pre_process_function": "\n    StringBuilder builder = new StringBuilder();\n    builder.append(\"\\\"\");\n    String first = params.text_docs[0];\n    builder.append(first);\n    builder.append(\"\\\"\");\n    def parameters = \"{\" +\"\\\"input\\\":\" + builder + \"}\";\n    return  \"{\" +\"\\\"parameters\\\":\" + parameters + \"}\";",
      "post_process_function": "\n      def name = \"embedding\";\n      def dataType = \"FLOAT32\";\n      if (params.embedding == null || params.embedding.length == 0) {\n        return params.message;\n      }\n      def shape = [params.embedding.length];\n      def json = \"{\" +\n                 \"\\\"name\\\":\\\"\" + name + \"\\\",\" +\n                 \"\\\"data_type\\\":\\\"\" + dataType + \"\\\",\" +\n                 \"\\\"shape\\\":\" + shape + \",\" +\n                 \"\\\"data\\\":\" + params.embedding +\n                 \"}\";\n      return json;\n    "
    }
  ]
}
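The pre_process_function and post_process_function in this blueprint translate between OpenSearch's text_docs input and Aleph Alpha's semantic_embed response. As a readability aid, here is the same pair of transformations sketched in Python (illustrative only; the function names `pre_process`/`post_process` are mine, and the real logic runs as Painless inside ML Commons):

```python
def pre_process(text_docs):
    # Mirror of the Painless pre_process_function: take the first
    # document and expose it as the "input" parameter consumed by
    # the request_body template.
    return {"parameters": {"input": text_docs[0]}}

def post_process(response):
    # Mirror of the Painless post_process_function: reshape the
    # "embedding" array from the API response into ML Commons'
    # model tensor format, or fall back to the error message.
    embedding = response.get("embedding")
    if not embedding:
        return response.get("message")
    return {
        "name": "embedding",
        "data_type": "FLOAT32",
        "shape": [len(embedding)],
        "data": embedding,
    }
```

The tensor shape is one-dimensional because luminous-base returns a single embedding vector per prompt.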
saratvemulapalli commented 9 months ago

Thanks @ulan-yisaev. Do you want to contribute the change in a PR?

ulan-yisaev commented 9 months ago

Hi @saratvemulapalli, sure thing, I'll be happy to contribute.

ramda1234786 commented 8 months ago

Hi @ulan-yisaev, I see you have used embedding models outside of Cohere, Bedrock, and OpenAI. I have been trying something similar for a Hugging Face text generation model but have not been able to get the post_process_function working. Any idea how to achieve this post_process_function?

I have this

[
    {
        "generated_text": "Your Generated text"
    }
]

and I want to convert it to the following using a post_process_function:

    {
        "completion": "Your Generated text"
    }

I have tried this so far:

"post_process_function": "\n def json = \"{\" +\n \"\\\"completion\\\":\\\"\" + params['response'][0].generated_text + \"\\\" }\";\n return json;\n "

Also @saratvemulapalli if you have any idea on this

ulan-yisaev commented 8 months ago

Hi @ramda1234786, please note that I haven't tested generation models, as my work primarily focuses on embedding models, but I suppose you could try the following function:

"post_process_function": "\n def generatedText = params.response[0].generated_text;\n def json = \"{\\\"completion\\\":\\\"\" + generatedText + \"\\\"}\";\n return json;\n"
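For reference, the transformation this Painless snippet attempts is simple; here is an equivalent sketch in Python (illustrative only, with a guard for a missing or empty response field, since I haven't verified what `params` contains for a generation model):

```python
def post_process(params):
    # Guard against a missing or empty "response" field before
    # indexing into it, which would otherwise fail at runtime.
    response = params.get("response")
    if not response:
        return params.get("message")
    # Build the desired {"completion": ...} shape from the first
    # generated_text entry.
    return {"completion": response[0]["generated_text"]}
```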

ramda1234786 commented 8 months ago

Thanks for your response @ulan-yisaev. I tried it, but no luck.

I get this from the predict API without a post_process_function:

{
    "inference_results": [
        {
            "output": [
                {
                    "name": "response",
                    "dataAsMap": {
                        "response": [
                            {
                                "generated_text": "The title is Rush, year is 2013, budget is 500000, earning is 300000, genere is action.What is the budget of Rush?\n\nThe budget of Rush is 500000...................."
                            }
                        ]
                    }
                }
            ],
            "status_code": 200
        }
    ]
}

but when I add the post_process_function as you suggested, I get the error below:

{
    "error": {
        "root_cause": [
            {
                "type": "script_exception",
                "reason": "runtime error",
                "script_stack": [
                    "generatedText = params.response[0].generated_text;\n def ",
                    "                      ^---- HERE"
                ],
                "script": " ...",
                "lang": "painless",
                "position": {
                    "offset": 28,
                    "start": 6,
                    "end": 62
                }
            }
        ],
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
            "generatedText = params.response[0].generated_text;\n def ",
            "                      ^---- HERE"
        ],
        "script": " ...",
        "lang": "painless",
        "position": {
            "offset": 28,
            "start": 6,
            "end": 62
        },
        "caused_by": {
            "type": "null_pointer_exception",
            "reason": "Cannot invoke \"Object.getClass()\" because \"callArgs[0]\" is null"
        }
    },
    "status": 400
}

Not sure how to fix this.
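For what it's worth, in the raw predict response above the generated text sits under `output[0].dataAsMap.response`, not at the top level. A small Python sketch of navigating that nesting (illustrative only; whether the Painless script sees the same nesting via `params` is exactly what the null_pointer_exception calls into question):

```python
def extract_generated_text(predict_response):
    # Walk the nested predict API response shown above:
    # inference_results -> output -> dataAsMap -> response.
    output = predict_response["inference_results"][0]["output"][0]
    return output["dataAsMap"]["response"][0]["generated_text"]
```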

mashah commented 8 months ago

Version 2.12 is still under development. If you need RAG with OpenSearch my recommendation is to try Sycamore. We know that path works, though it's using 2.11. Once 2.12 is ready, we will have that working with Sycamore as well.

HenryL27 commented 6 months ago

Closing, as the connector blueprint was added.