opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means or linear regression, to help developers build ML-related features within OpenSearch.
Apache License 2.0

[FEATURE] Improve user experience when running in environments with limited internet access #1902

Open ArranDengate-Netapp opened 7 months ago

ArranDengate-Netapp commented 7 months ago

For clusters in a corporate setting, internet access is often restricted with an egress firewall.

However, the ML commons plugin needs internet access to download dependencies, even when using a local model.

It would be good to improve the user experience in this situation. Some ideas:

I see this behaviour when using the all-MiniLM-L12-v2 model locally on OpenSearch 2.11.1, with the TorchScript model file and config from the list of pre-trained models, deploying from a local zip file by following the steps in opensearch-py-ml's demo notebook. The suggestions below are based on that experience. I'm not sure whether the ONNX model has different dependencies from the TorchScript model, or whether other models differ (e.g. whether all-mpnet-base-v2 has different dependencies from all-MiniLM-L12-v2).

Packaging

When using a local Torch model on a server with restricted internet access, deploying the model fails if the server cannot access publish.djl.ai. In ml-commons code, this URL is mentioned by the pytorch-engine library.

It might be possible to package a fat jar with dependencies to avoid this issue? This was previously discussed in the OpenSearch forums.

Documentation

It would be useful to document:

Currently, the plugin appears to need network access to the following URLs when deploying, even when using a local model:

Logging

Another way to improve this experience would be to log more information when there is a failure downloading dependencies.

When deploying a local model, if an egress firewall is configured to drop packets to destinations that are not explicitly permitted, we get an error that doesn't tell us which destination we were trying to reach - from this, it is not obvious what address needs to be whitelisted. Here are the OpenSearch logs when deploying a local model under these circumstances:

[2024-01-23T00:10:53,793][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 1
[2024-01-23T00:10:54,582][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 2
[2024-01-23T00:10:55,342][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 3
[2024-01-23T00:10:55,922][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 4
[2024-01-23T00:10:56,444][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 5
[2024-01-23T00:10:56,997][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 6
[2024-01-23T00:10:57,481][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 7
[2024-01-23T00:10:57,840][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 8
[2024-01-23T00:10:58,215][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 9
[2024-01-23T00:10:58,612][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 10
[2024-01-23T00:10:58,988][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 11
[2024-01-23T00:10:59,399][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 12
[2024-01-23T00:10:59,786][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 13
[2024-01-23T00:10:59,977][INFO ][o.o.m.a.u.MLModelChunkUploader] [ip-172-31-62-254.ec2.internal] Index model successful for 786nM40BUDoVia3UznyW for chunk number 14
[2024-01-23T00:11:00,014][INFO ][o.o.m.a.d.TransportDeployModelAction] [ip-172-31-62-254.ec2.internal] Will deploy model on these nodes: Q6DHrMfSTRyIEHJNDCnCsw
[2024-01-23T00:11:04,963][WARN ][a.d.u.c.CudaUtils        ] [ip-172-31-62-254.ec2.internal] Access denied during loading cudart library.
[2024-01-23T00:11:29,623][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:11:39,584][INFO ][o.o.i.i.ManagedIndexCoordinator] [ip-172-31-62-254.ec2.internal] Cancel background move metadata process.
[2024-01-23T00:11:39,585][INFO ][o.o.i.i.ManagedIndexCoordinator] [ip-172-31-62-254.ec2.internal] Performing move cluster state metadata.
[2024-01-23T00:11:39,585][INFO ][o.o.i.i.MetadataService  ] [ip-172-31-62-254.ec2.internal] Move metadata has finished.
[2024-01-23T00:11:39,618][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:11:59,623][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:12:09,622][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:12:29,621][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:12:39,625][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOYING}
[2024-01-23T00:13:09,624][INFO ][o.o.m.c.MLSyncUpCron     ] [ip-172-31-62-254.ec2.internal] Refresh model state: {786nM40BUDoVia3UznyW=DEPLOY_FAILED}
[2024-01-23T00:13:14,922][ERROR][o.o.m.e.a.DLModel        ] [ip-172-31-62-254.ec2.internal] Failed to deploy model 786nM40BUDoVia3UznyW
ai.djl.engine.EngineException: Failed to save pytorch index file
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at java.security.AccessController.doPrivileged(AccessController.java:569) [?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.ConnectException: Connection timed out
  at sun.nio.ch.Net.connect0(Native Method) ~[?:?]
  at sun.nio.ch.Net.connect(Net.java:579) ~[?:?]
  at sun.nio.ch.Net.connect(Net.java:568) ~[?:?]
  at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:593) ~[?:?]
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
  at java.net.Socket.connect(Socket.java:633) ~[?:?]
  at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
  at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
  at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
  at java.net.URL.openStream(URL.java:1161) ~[?:?]
  at ai.djl.util.Utils.openUrl(Utils.java:461) ~[api-0.21.0.jar:?]
  at ai.djl.util.Utils.openUrl(Utils.java:445) ~[api-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[pytorch-engine-0.21.0.jar:?]
  ... 22 more
[2024-01-23T00:13:14,969][ERROR][o.o.m.m.MLModelManager   ] [ip-172-31-62-254.ec2.internal] Failed to retrieve model 786nM40BUDoVia3UznyW
org.opensearch.ml.common.exception.MLException: Failed to deploy model 786nM40BUDoVia3UznyW
  at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:289) ~[?:?]
  at java.security.AccessController.doPrivileged(AccessController.java:569) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) ~[?:?]
  at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) ~[?:?]
  at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) ~[?:?]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: ai.djl.engine.EngineException: Failed to save pytorch index file
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[?:?]
  at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[?:?]
  at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[?:?]
  at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) ~[?:?]
  ... 14 more
Caused by: java.net.ConnectException: Connection timed out
  at sun.nio.ch.Net.connect0(Native Method) ~[?:?]
  at sun.nio.ch.Net.connect(Net.java:579) ~[?:?]
  at sun.nio.ch.Net.connect(Net.java:568) ~[?:?]
  at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:593) ~[?:?]
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
  at java.net.Socket.connect(Socket.java:633) ~[?:?]
  at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
  at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
  at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
  at java.net.URL.openStream(URL.java:1161) ~[?:?]
  at ai.djl.util.Utils.openUrl(Utils.java:461) ~[?:?]
  at ai.djl.util.Utils.openUrl(Utils.java:445) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[?:?]
  at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[?:?]
  at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[?:?]
  at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[?:?]
  at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) ~[?:?]
  ... 14 more
[2024-01-23T00:13:14,981][ERROR][o.o.m.a.f.TransportForwardAction] [ip-172-31-62-254.ec2.internal] deploy model failed on all nodes, model id: 786nM40BUDoVia3UznyW
[2024-01-23T00:13:14,981][INFO ][o.o.m.a.f.TransportForwardAction] [ip-172-31-62-254.ec2.internal] deploy model done with state: DEPLOY_FAILED, model id: 786nM40BUDoVia3UznyW
[2024-01-23T00:13:14,983][INFO ][o.o.m.a.d.TransportDeployModelOnNodeAction] [ip-172-31-62-254.ec2.internal] deploy model task done 8M6nM40BUDoVia3U7nw0

Under this circumstance, GET /_plugins/_ml/models/<model-id> tells us the deploy failed, but does not provide a reason. (Not sure if the task API would provide more info - I couldn't see how to get opensearch-py-ml to give me the task ID.)

{
  "name": "sentence-transformers/all-MiniLM-L12-v2",
  "model_group_id": "pWZUEo0BgFhXOXZgeEi_",
  "algorithm": "TEXT_EMBEDDING",
  "model_version": "11",
  "model_format": "TORCH_SCRIPT",
  "model_state": "DEPLOY_FAILED",
  "model_content_size_in_bytes": 134568911,
  "model_content_hash_value": "f8012a4e6b5da1f556221a12160d080157039f077ab85a5f6b467a47247aad49",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "SENTENCE_TRANSFORMERS",
    "all_config": "{\"_name_or_path\":\"microsoft/MiniLM-L12-H384-uncased\",\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":12,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
  },
  "created_time": 1705968651923,
  "last_updated_time": 1705968794982,
  "last_deployed_time": 1705968794981,
  "total_chunks": 14,
  "planning_worker_node_count": 1,
  "current_worker_node_count": 0,
  "planning_worker_nodes": [
    "Q6DHrMfSTRyIEHJNDCnCsw"
  ],
  "deploy_to_all_nodes": true
}

Please note, the above is assuming that DNS is permitted. If the egress firewall is also preventing DNS, the error is more useful and does contain the domain that needs to be whitelisted:

[2024-01-18T05:41:30,534][ERROR][o.o.m.e.a.DLModel        ] [ip-172-31-58-14.ec2.internal] Failed to deploy model W9EWG40Blv3ldtU8hMVo
ai.djl.engine.EngineException: Failed to save pytorch index file
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:403) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:286) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.getLibTorch(LibUtils.java:89) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:77) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:53) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40) ~[pytorch-engine-0.21.0.jar:?]
  at ai.djl.engine.Engine.getEngine(Engine.java:187) ~[api-0.21.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.doLoadModel(DLModel.java:185) ~[opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.lambda$loadModel$1(DLModel.java:275) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at java.security.AccessController.doPrivileged(AccessController.java:569) [?:?]
  at org.opensearch.ml.engine.algorithms.DLModel.loadModel(DLModel.java:242) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.algorithms.DLModel.initModel(DLModel.java:138) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.engine.MLEngine.deploy(MLEngine.java:125) [opensearch-ml-algorithms-2.11.1.0.jar:?]
  at org.opensearch.ml.model.MLModelManager.lambda$deployModel$52(MLModelManager.java:1003) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$58(MLModelManager.java:1123) [opensearch-ml-2.11.1.0.jar:2.11.1.0]
  at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.11.1.jar:2.11.1]
  at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.11.1.jar:2.11.1]
  at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.11.1.jar:2.11.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
  at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: java.net.UnknownHostException: publish.djl.ai
  at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:572) ~[?:?]
  at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
  at java.net.Socket.connect(Socket.java:633) ~[?:?]
  at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:304) ~[?:?]
  at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:174) ~[?:?]
  at sun.net.NetworkClient.doConnect(NetworkClient.java:183) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:533) ~[?:?]
  at sun.net.www.http.HttpClient.openServer(HttpClient.java:638) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:266) ~[?:?]
  at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:380) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:193) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1242) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1128) ~[?:?]
  at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:179) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1665) ~[?:?]
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589) ~[?:?]
  at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224) ~[?:?]
  at java.net.URL.openStream(URL.java:1161) ~[?:?]
  at ai.djl.util.Utils.openUrl(Utils.java:461) ~[api-0.21.0.jar:?]
  at ai.djl.util.Utils.openUrl(Utils.java:445) ~[api-0.21.0.jar:?]
  at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398) ~[pytorch-engine-0.21.0.jar:?]
  ... 22 more
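
The two failure modes in the logs above (a bare timeout when only TCP is blocked, a named host when DNS is blocked) can be told apart before attempting a deployment. The following is a minimal sketch of a preflight check, not part of ml-commons: `check_egress` and its `connect` parameter are hypothetical names introduced here, and `publish.djl.ai` on port 443 is the only destination taken from the logs.

```python
import socket


def check_egress(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Try one TCP connection and report which destination failed, and how.

    `connect` is injectable so the function can be exercised without a network.
    """
    try:
        conn = connect((host, port), timeout)
    except socket.gaierror as exc:
        # Name resolution failed, e.g. DNS is blocked by the firewall.
        return f"{host}:{port} DNS lookup failed: {exc}"
    except socket.timeout:
        # Typical when a firewall silently drops the packets.
        return f"{host}:{port} connection timed out after {timeout}s"
    except OSError as exc:
        # Anything else, e.g. connection refused or network unreachable.
        return f"{host}:{port} connection failed: {exc}"
    conn.close()
    return f"{host}:{port} reachable"
```

Calling `check_egress("publish.djl.ai")` on the node that will host the model names the destination in every outcome, which is the information missing from the timeout stack trace.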
saratvemulapalli commented 7 months ago

Thanks for opening the issue, @ArranDengate-Netapp. I was poking around with the demo [1] and I see it uses a pre-trained model downloaded from artifacts.opensearch.org. That download contains only the model; it does not include an engine (e.g. pytorch-engine).

I see 3 feature enhancements:

  1. Can we make ML Commons run without network access, i.e. have a way to package and ship the dependency libraries (a.k.a. a fat jar)? It makes sense, but it involves work for packaging and signing these artifacts. We could take this as a feature request.
  2. Documentation: document all network access domains for pre-trained models in artifacts.opensearch.org. I think it's fair to document this; we could do it in the repo [2] or in the OpenSearch documentation [3].
  3. Logging: improve logging when a connection timeout occurs so that it correctly displays the domain and DNS details. We use standard Java libraries, so we should be able to print an exception stack trace that hopefully has the information we are looking for.
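
To make the logging point concrete, here is a small sketch of the general pattern: catch the low-level network error and re-raise it with the destination attached. It is written in Python purely for illustration. ml-commons is Java and the failing download happens inside the DJL library, so this is not the plugin's code; `open_with_destination` and its `opener` parameter are hypothetical names.

```python
import urllib.parse
import urllib.request


def open_with_destination(url, timeout=10.0, opener=urllib.request.urlopen):
    """Open `url`, re-raising network errors with the destination host named.

    `opener` is injectable so the behaviour can be checked without a network.
    """
    try:
        return opener(url, timeout=timeout)
    except OSError as exc:
        # A bare "Connection timed out" does not say where we were going,
        # so put the host and full URL into the message and keep the cause.
        host = urllib.parse.urlsplit(url).hostname
        raise ConnectionError(
            f"could not reach {host} while fetching {url}: {exc}"
        ) from exc
```

With this shape, a timeout surfaces as a message that names the host to allow through the firewall, while the original exception stays available as the cause.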

Under this circumstance, GET /_plugins/_ml/models/<model-id> tells us the deploy failed, but does not provide a reason. (Not sure if the task API would provide more info - I couldn't see how to get opensearch-py-ml to give me the task ID.)

It should be fairly straightforward to get the task ID [4] from deploy model: the deploy model API returns a task ID, which you can query through the Task API [5].
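
As a rough sketch of that flow, the helpers below build the two REST paths and summarise a task document. The paths follow the ML Commons deploy and task APIs as I understand them. The `state` and `error` field names in the task response are assumptions to verify against the Task API documentation [5], and all function names are hypothetical.

```python
def deploy_path(model_id):
    """Path to POST to in order to deploy a model; the response carries a task ID."""
    return f"/_plugins/_ml/models/{model_id}/_deploy"


def task_path(task_id):
    """Path to GET in order to inspect the task created by the deploy call."""
    return f"/_plugins/_ml/tasks/{task_id}"


def describe_task(task):
    """Summarise a task document as 'STATE' or 'STATE: error text'.

    Assumes the task JSON exposes `state` and, on failure, `error`.
    """
    state = task.get("state", "UNKNOWN")
    error = task.get("error")
    return f"{state}: {error}" if error else state
```

If the assumption about the `error` field holds, `describe_task` would surface the failure reason that the model document omits.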

That said, I am fairly new to this repo. I'd like to hear thoughts from other maintainers who are pretty active @ylwu-amzn @austintlee @HenryL27 .

[1] https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html
[2] https://github.com/opensearch-project/ml-commons/tree/main/docs
[3] https://opensearch.org/docs/latest/ml-commons-plugin/
[4] https://opensearch-project.github.io/opensearch-py-ml/examples/demo_ml_commons_integration.html#Step-2:-Load-Model
[5] https://opensearch.org/docs/latest/ml-commons-plugin/api/tasks-apis/index/

ylwu-amzn commented 6 months ago

@ArranDengate-Netapp, thanks for cutting this issue.

We thought about this use case (a cluster with no network access) when we built the feature. One option we considered is bundling the dependencies into the OpenSearch release. The challenge is that we would need to cover different hardware and different versions, and it would also make the OpenSearch distribution much bigger. We didn't find other good options, so we did not prioritize this use case. We can pick this topic up again and discuss it further; any comments or suggestions are welcome.

One workaround:

  1. Set up a single-node test cluster with the same hardware settings as your production cluster, and make sure this cluster has network access.
  2. Upload the model to this test cluster; the dependencies will be downloaded to it.
  3. Copy the dependencies from this test cluster to your production cluster.

ArranDengate-Netapp commented 6 months ago

@ylwu-amzn I see, that's a difficult tradeoff.

That workaround sounds good! I would like to check:

ArranDengate-Netapp commented 6 months ago

(Oops, didn't mean to close...)

brunowcs commented 6 months ago

@ArranDengate-Netapp Try adding a proxy and see if it works

Step 1: Edit /etc/sysconfig/opensearch
Step 2: Add line OPENSEARCH_JAVA_OPTS="-Dhttp.proxyHost=YOURPROXY -Dhttp.proxyPort=YOURPORT -Dhttps.proxyHost=YOURPROXY -Dhttps.proxyPort=YOURPORT -Dhttp.nonProxyHosts=localhost|127.0.0.1|10...|.local"
Step 3: Restart cluster
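
For anyone templating this setting, here is a small sketch that assembles the same line from its parts. The `-D` flags are the standard Java networking system properties used in the step above. The function names and the default bypass list are hypothetical, and the host and port are placeholders.

```python
def proxy_java_opts(host, port, non_proxy_hosts=("localhost", "127.0.0.1")):
    """Build JVM proxy flags for both HTTP and HTTPS traffic."""
    opts = [
        f"-Dhttp.proxyHost={host}",
        f"-Dhttp.proxyPort={port}",
        f"-Dhttps.proxyHost={host}",
        f"-Dhttps.proxyPort={port}",
    ]
    if non_proxy_hosts:
        # Java separates bypass patterns with '|'; the HTTPS handler
        # reuses http.nonProxyHosts, so one flag covers both protocols.
        opts.append("-Dhttp.nonProxyHosts=" + "|".join(non_proxy_hosts))
    return " ".join(opts)


def sysconfig_line(opts):
    """Format the flags as a line for /etc/sysconfig/opensearch."""
    return f'OPENSEARCH_JAVA_OPTS="{opts}"'
```

For example, `sysconfig_line(proxy_java_opts("YOURPROXY", "YOURPORT"))` yields a line of the same form as in Step 2, with only the two default bypass hosts.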

Let me know if it worked for you.

ArranDengate-Netapp commented 6 months ago

Hi @brunowcs ,

Wow, I didn't realise Java had built-in proxy support!

I don't think this approach will work for us, but this could be a useful workaround for other people affected by this issue. I am involved with two use-cases:

ylwu-amzn commented 2 months ago

@ylwu-amzn I see, that's a difficult tradeoff.

That workaround sounds good! I would like to check:

  • when you say copy dependency from the test cluster to the production cluster - would that just be the contents of the ml_cache directory? (eg, for the RPM install of OpenSearch: /var/lib/opensearch/ml_cache ?)
  • once the model has been uploaded and deployed, nothing else will need to be downloaded later, right? (That would make sense, I just want to confirm)
  • does the ML cache ever get cleared, in such a way that we would need to re-download the model?

For question 1: yes, just copy the whole ml_cache directory.
For question 2: correct, nothing else.
For question 3: no, unless you manually delete the local cached file.
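
The copy step could be sketched as below: archive the whole ml_cache directory on the test node, move the archive across by whatever means the environment allows, and unpack it under the production node's data directory. This is a minimal sketch with hypothetical function names. It assumes the /var/lib/opensearch/ml_cache location mentioned for the RPM install, and that file ownership is fixed up afterwards so the OpenSearch process can read the files.

```python
import shutil
from pathlib import Path


def pack_ml_cache(cache_dir, archive_base):
    """Archive `cache_dir` (e.g. /var/lib/opensearch/ml_cache) as a .tar.gz.

    The archive keeps the directory itself as its top-level entry, so
    unpacking recreates `ml_cache/` rather than spilling its contents.
    Returns the path of the archive that was written.
    """
    cache_dir = Path(cache_dir)
    return shutil.make_archive(
        str(archive_base),
        "gztar",
        root_dir=str(cache_dir.parent),
        base_dir=cache_dir.name,
    )


def unpack_ml_cache(archive, data_dir):
    """Unpack the archive under `data_dir` (e.g. /var/lib/opensearch)."""
    shutil.unpack_archive(str(archive), extract_dir=str(data_dir))
    return Path(data_dir)
```

Archiving the directory as a unit keeps to the answer above (copy the whole ml_cache directory) and avoids guessing which files inside it matter.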

manzke commented 2 months ago

Hey, following up here and merging #2165 into this issue. Running a node completely without internet access is possible.

We are running our server in AWS in a public/private VPC setup. OpenSearch is in the private one and has no internet access. We register a model group and upload our model into it.

This way OpenSearch needs no internet access at all.

What we can see is that even if we set offline flags for the underlying libraries, they still try to download certain models (every time).

Only after all of these downloads have failed does the whole task fail, and a second later the model switches to deployed.

On our AWS instance the model quite often gets dropped. We don't know why yet. Sometimes when I trigger a redeploy (it is built into our app now: if the model is gone, kick off a redeploy), the model gets back to deployed after about 10 minutes of failed download attempts.
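
A redeploy guard of the kind described could look roughly like the sketch below. It is not the poster's actual code. `get_state` and `redeploy` are hypothetical callables the application would supply (for example, wrappers around the model GET and deploy calls), and the state names are the ones that appear in the logs in this thread.

```python
import time


def ensure_deployed(get_state, redeploy, attempts=3, wait_seconds=30.0,
                    sleep=time.sleep):
    """Poll the model state and trigger a redeploy when it is not deployed.

    Returns True once the model reports DEPLOYED, or False if it still
    does not after `attempts` polling rounds.
    """
    for _ in range(attempts):
        state = get_state()
        if state == "DEPLOYED":
            return True
        if state != "DEPLOYING":
            # DEPLOY_FAILED or any other state: ask for a new deployment.
            redeploy()
        # Give the cluster time to work before polling again.
        sleep(wait_seconds)
    return get_state() == "DEPLOYED"
```

Skipping the redeploy while the state is DEPLOYING avoids stacking new deploy tasks on top of one that is still working through its download attempts.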

Sometimes I have to clean the model cache and upload it again.

We are on 2.12 right now. Trying to get to 2.14.

So the use case in general:

The last approach is what we do with vllm. We mount the model into the container, so it won't download it.

manzke commented 2 months ago

Btw, I just registered a local model from a zip uploaded to the cluster's filesystem, registering the model afterwards with the local file URL. It worked nicely, but like before I'm seeing things like:

[2024-06-11T11:26:53,365][WARN ][a.d.h.z.HfModelZoo ] [ip-10-0-150-245.eu-central-1.compute.internal] Failed to download Huggingface model zoo index: NLP.FILL_MASK
[2024-06-11T11:29:04,437][WARN ][a.d.h.z.HfModelZoo ] [ip-10-0-150-245.eu-central-1.compute.internal] Failed to download Huggingface model zoo index: NLP.QUESTION_ANSWER

Only after these downloads fail is the model deployed.
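
A sketch of preparing such a registration follows: hash the zip on the node's filesystem and build a request body that points at it with a file URL. The function names are hypothetical. Most field names mirror the model document shown earlier in this thread, while `version` and `url` are assumptions based on the register API that should be checked against its documentation. The cluster may also need a setting enabled before it accepts local file URLs.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def local_register_body(zip_path, name, version, model_group_id,
                        model_config, model_format="TORCH_SCRIPT"):
    """Build a register-model body for a zip that already sits on the node."""
    zip_path = Path(zip_path).resolve()
    return {
        "name": name,
        "version": version,  # assumed field name
        "model_group_id": model_group_id,
        "model_format": model_format,
        "model_content_hash_value": sha256_of(zip_path),
        "model_config": model_config,
        "url": zip_path.as_uri(),  # assumed field name; file:// URL
    }
```

Computing the hash from the same file the URL points to keeps the two from drifting apart when the zip is replaced.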

manzke commented 2 months ago
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:50,299][INFO ][o.o.m.c.MLSyncUpCron ] [***.compute.internal] Refresh model state: {ZBQEB5ABZYX4tphUswwV=DEPLOYED} a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:42,098][ERROR][o.o.m.a.d.TransportDeployModelOnNodeAction] [***.compute.internal] Deploy model task failed: ZRQKB5ABZYX4tphUpwzG a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) org.opensearch.transport.RemoteTransportException: [.compute.internal][][cluster:admin/opensearch/mlinternal/forward] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) Caused by: java.lang.NullPointerException: Cannot invoke "org.opensearch.ml.task.MLTaskCache.getMlTask()" because "mlTaskCache" is null a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.action.forward.TransportForwardAction.doExecute(TransportForwardAction.java:121) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.controlcenter.notification.filter.IndexOperationActionFilter.apply(IndexOperationActionFilter.kt:39) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:77) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:102) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:98) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:1053) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService$3.sendRequest(TransportService.java:161) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:989) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequestAsync(TransportService.java:1746) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequest(TransportService.java:885) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequest(TransportService.java:844) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.action.deploy.TransportDeployModelOnNodeAction.lambda$createDeployModelNodeResponse$2(TransportDeployModelOnNodeAction.java:167) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1030) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.lang.Thread.run(Thread.java:840) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendLocalRequest(TransportService.java:1053) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService$3.sendRequest(TransportService.java:161) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequestInternal(TransportService.java:989) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequestAsync(TransportService.java:1746) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequest(TransportService.java:885) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.transport.TransportService.sendRequest(TransportService.java:844) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.action.deploy.TransportDeployModelOnNodeAction.lambda$createDeployModelNodeResponse$2(TransportDeployModelOnNodeAction.java:167) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$6.onResponse(ActionListener.java:301) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.model.MLModelManager.lambda$deployModel$51(MLModelManager.java:1030) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) ~[opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.model.MLModelManager.lambda$retrieveModelChunks$72(MLModelManager.java:1553) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.core.action.ActionListener$1.onResponse(ActionListener.java:82) [opensearch-core-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.ThreadedActionListener$1.doRun(ThreadedActionListener.java:78) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:913) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at java.base/java.lang.Thread.run(Thread.java:840) [?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:42,096][ERROR][o.o.m.a.f.TransportForwardAction] [***.compute.internal] Failed to execute forward action DEPLOY_MODEL_DONE a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) java.lang.NullPointerException: Cannot invoke "org.opensearch.ml.task.MLTaskCache.getMlTask()" because "mlTaskCache" is null a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.ml.action.forward.TransportForwardAction.doExecute(TransportForwardAction.java:121) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:218) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.rollup.actionfilter.FieldCapsFilter.apply(FieldCapsFilter.kt:118) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.controlcenter.notification.filter.IndexOperationActionFilter.apply(IndexOperationActionFilter.kt:39) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.performanceanalyzer.action.PerformanceAnalyzerActionFilter.apply(PerformanceAnalyzerActionFilter.java:77) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction$RequestFilterChain.proceed(TransportAction.java:216) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.TransportAction.execute(TransportAction.java:188) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:102) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.action.support.HandledTransportAction$TransportHandler.messageReceived(HandledTransportAction.java:98) ~[opensearch-2.12.0.jar:2.12.0] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?] a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:42,036][INFO ][o.o.m.e.a.DLModel ] [***.compute.internal] Model ZBQEB5ABZYX4tphUswwV is successfully deployed on 1 devices a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:39,057][INFO ][a.d.p.e.PtEngine ] [***.compute.internal] Number of inter-op threads is 1 a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:39,058][INFO ][a.d.p.e.PtEngine ] [***.compute.internal] Number of intra-op threads is 1 a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
11 June 2024 at 13:35 (UTC+2:00) [2024-06-11T11:35:37,657][WARN ][a.d.h.z.HfModelZoo ] [***.compute.internal] Failed to download Huggingface model zoo index: NLP.TOKEN_CLASSIFICATION a3b0c66178584897bdc5bd8d170235a9 ragtime-opensearch
