opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[BUG] host must not be null #2633

Closed: sonic182 closed this issue 1 week ago

sonic182 commented 2 weeks ago

What is the bug?

I'm trying to use an external model for text embeddings, and I'm getting the error "host must not be null".

How can one reproduce the bug?

Please see this jupyter notebook: https://github.com/sonic182/sample_os_error/blob/main/opensearch_remote_model.ipynb

Reference OpenSearch server (docker-compose): https://github.com/sonic182/sample_os_error/blob/main/docker-compose.yml

What is the expected behavior?

The host should be detected from the connector's URL.

What is your host/environment?

Do you have any screenshots?

A stack trace:

[2024-07-10T10:27:56,853][INFO ][o.o.m.t.MLPredictTaskRunner] [d377c6376e33] Auto deployment action triggered for the model Model ID: PMMvnJABV_DL0Puiawh4
[2024-07-10T10:29:13,876][ERROR][o.o.m.e.a.r.HttpJsonConnectorExecutor] [d377c6376e33] Fail to execute http connector
java.lang.NullPointerException: host must not be null.
        at software.amazon.awssdk.utils.Validate.paramNotNull(Validate.java:156) ~[utils-2.25.40.jar:?]
        at software.amazon.awssdk.http.DefaultSdkHttpFullRequest.<init>(DefaultSdkHttpFullRequest.java:56) ~[http-client-spi-2.25.40.jar:?]
        at software.amazon.awssdk.http.DefaultSdkHttpFullRequest.<init>(DefaultSdkHttpFullRequest.java:44) ~[http-client-spi-2.25.40.jar:?]
        at software.amazon.awssdk.http.DefaultSdkHttpFullRequest$Builder.build(DefaultSdkHttpFullRequest.java:482) ~[http-client-spi-2.25.40.jar:?]
        at software.amazon.awssdk.http.DefaultSdkHttpFullRequest$Builder.build(DefaultSdkHttpFullRequest.java:250) ~[http-client-spi-2.25.40.jar:?]
        at org.opensearch.ml.engine.algorithms.remote.ConnectorUtils.buildSdkRequest(ConnectorUtils.java:309) ~[opensearch-ml-algorithms-2.15.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.remote.HttpJsonConnectorExecutor.invokeRemoteService(HttpJsonConnectorExecutor.java:101) [opensearch-ml-algorithms-2.15.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.preparePayloadAndInvoke(RemoteConnectorExecutor.java:215) [opensearch-ml-algorithms-2.15.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.remote.RemoteConnectorExecutor.executeAction(RemoteConnectorExecutor.java:88) [opensearch-ml-algorithms-2.15.0.0.jar:?]
        at org.opensearch.ml.engine.algorithms.remote.RemoteModel.asyncPredict(RemoteModel.java:73) [opensearch-ml-algorithms-2.15.0.0.jar:?]
        at org.opensearch.ml.task.MLPredictTaskRunner.runPredict(MLPredictTaskRunner.java:344) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.predict(MLPredictTaskRunner.java:316) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
        at org.opensearch.ml.task.MLPredictTaskRunner.lambda$executeTask$8(MLPredictTaskRunner.java:260) [opensearch-ml-2.15.0.0.jar:2.15.0.0]
        at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:882) [opensearch-2.15.0.jar:2.15.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) [?:?]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) [?:?]
        at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
[2024-07-10T10:29:13,877][WARN ][r.suppressed             ] [d377c6376e33] path: /_plugins/_ml/models/PMMvnJABV_DL0Puiawh4/_predict, params: {model_id=PMMvnJABV_DL0Puiawh4}

Do you have any additional context?

The "ai_inference" host is another Docker container on the same network, used for local testing.

ylwu-amzn commented 2 weeks ago

From your notebook

  connector_res = do_request("POST", "/_plugins/_ml/connectors/_create", 
  {
    "name": "Local app connector",
    "description": "The connector",
    "version": 1,
    "protocol": "http",
    "actions": [
      {
        "action_type": "predict",
        "method": "POST",
        "url": "http://ai_inference:8080/invocations",
        "headers": {
          "content-type": "application/json"
        },
        "post_process_function": "connector.post_process.default.embedding",
        "request_body": "{ \"text\": ${parameters.input} }",
      }
    ]
  })

The connector is using http://ai_inference:8080/invocations. @sonic182 Can you confirm this URL is correct and can be invoked from inside your OpenSearch Docker container?

sonic182 commented 2 weeks ago

Hi @ylwu-amzn

Yes, the service runs a Python HTTP server with the model on port 8080, on the same Docker network.

Snippet of the service in docker-compose:

services:
  ai_inference:
    build: .
    volumes:
      - ./inference_app/:/opt/app/
      - models:/models

The service doesn't receive any requests.

ylwu-amzn commented 2 weeks ago

Can you verify that you can call the URL http://ai_inference:8080/invocations directly from inside your OpenSearch Docker container with curl?

zane-neo commented 2 weeks ago

@sonic182 This is a JDK bug: underscores are not supported in host names (https://bugs.openjdk.org/browse/JDK-8221675). If possible, change your host name to one without an underscore. That should solve this.
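The following is a minimal Java sketch of the behavior described above. It assumes that the connector URL reaches the AWS SDK request builder as a `java.net.URI`; the stack trace shows `ConnectorUtils.buildSdkRequest` failing inside `DefaultSdkHttpFullRequest$Builder.build`, which is consistent with that assumption. `java.net.URI` cannot parse `ai_inference` as a server-based host, so it treats the authority as registry-based and `getHost()` returns `null`. The class name `UnderscoreHostDemo` is hypothetical.

```java
import java.net.URI;

/** Hypothetical demo: how java.net.URI reports the host for the two URLs in this thread. */
public class UnderscoreHostDemo {

    /** Returns the host that java.net.URI parses from the URL, or null if it cannot parse one. */
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        // Underscore: the authority is treated as registry-based, so getHost() is null.
        System.out.println(hostOf("http://ai_inference:8080/invocations"));
        // Dash: a valid host name, so the host is available.
        System.out.println(hostOf("http://ai-inference:8080/invocations"));
    }
}
```

`URI.create` does not throw for the underscore URL. The failure surfaces later, when a consumer such as the SDK builder requires a non-null host.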

sonic182 commented 2 weeks ago

Thanks @zane-neo, it works now. I just changed the underscore to a dash: "ai-inference". :+1:

Could this be fixed inside OpenSearch, maybe by using a custom DNS resolver?
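Renaming the host is the fix that worked here. A pre-flight check could catch such names before a connector is created. The following is a minimal sketch. `HostnameCheck` and `isValidHostname` are hypothetical and not part of ml-commons. The check follows the RFC 1123 label rules: letters, digits and hyphens only, no leading or trailing hyphen, and at most 63 characters per label. It only approximates the host-name grammar that `java.net.URI` enforces.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-flight check: does a host name follow RFC 1123 label rules? */
public class HostnameCheck {

    // One label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");

    static boolean isValidHostname(String host) {
        if (host == null || host.isEmpty() || host.length() > 253) {
            return false;
        }
        // Limit -1 keeps empty labels, so "a..b" and a trailing dot ("a.") are rejected.
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidHostname("ai_inference")); // rejected: underscore
        System.out.println(isValidHostname("ai-inference")); // accepted
    }
}
```

Applied to the two names from this thread, the check rejects `ai_inference` and accepts `ai-inference`.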

sonic182 commented 2 weeks ago

Well, the error is not in DNS; it's in the URL class.

Maybe the workaround mentioned in the JDK bug could be applied here: https://github.com/opensearch-project/ml-commons/blob/0b9708b831f87ac6eb35b5c3039c61afe9425d26/ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/remote/ConnectorUtils.java#L295
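A fallback along the lines of that suggestion might look like the minimal sketch below. It assumes the fix would recover the host and port from the raw authority whenever `URI.getHost()` returns `null`. This is hypothetical: it is neither the workaround text from the JDK ticket nor code in ml-commons. `AuthorityFallback`, `HostPort` and `hostAndPort` are illustrative names, and the record requires Java 16+.

```java
import java.net.URI;

/** Hypothetical fallback: recover host and port when java.net.URI cannot parse the host. */
public class AuthorityFallback {

    /** Parsed host and port; port is -1 when the URL has none. */
    record HostPort(String host, int port) {}

    static HostPort hostAndPort(String url) {
        URI uri = URI.create(url);
        if (uri.getHost() != null) {
            // Normal case: server-based authority, the JDK parsed host and port.
            return new HostPort(uri.getHost(), uri.getPort());
        }
        String authority = uri.getRawAuthority();
        if (authority == null) {
            throw new IllegalArgumentException("URL has no authority: " + url);
        }
        // Registry-based authority such as "ai_inference:8080".
        // Sketch only: user-info ("user@host") and IPv6 literals are not handled.
        int colon = authority.lastIndexOf(':');
        if (colon < 0) {
            return new HostPort(authority, -1);
        }
        return new HostPort(authority.substring(0, colon),
                Integer.parseInt(authority.substring(colon + 1)));
    }

    public static void main(String[] args) {
        System.out.println(hostAndPort("http://ai_inference:8080/invocations"));
    }
}
```

For `http://ai_inference:8080/invocations`, this yields host `ai_inference` and port `8080`. A real version would also have to set the host and port on the SDK request explicitly rather than passing the URI alone.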

zane-neo commented 1 week ago

@sonic182 In OpenSearch we're using the AWS SDK client, which accepts a URI as its parameter, so we're not going to implement the workaround.

zane-neo commented 1 week ago

Closing this issue since this has been resolved.