opensearch-project / ml-commons

ml-commons provides a set of common machine learning algorithms, e.g. k-means, or linear regression, to help developers build ML related features within OpenSearch.
Apache License 2.0

[BUG] ss4o_metric_template same priority #1201

Closed. pdolinic closed this issue 1 year ago.

pdolinic commented 1 year ago

What is the bug?

This ss4o metric template bug is triggered when I try to enable model registration via URL. I tried a lot of settings:

allow_registering_model_via_url: true

As soon as I disable this parameter again, it works.

-- Logs begin at Tue 2023-08-08 22:00:43 CEST, end at Fri 2023-08-11 19:10:29 CEST. --
Aug 08 22:00:43 graylog systemd[1]: Starting OpenSearch...
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:54 graylog systemd[1]: Started OpenSearch.
Aug 08 22:00:54 graylog systemd-entrypoint[506]: uncaught exception in thread [main]
Aug 08 22:00:54 graylog systemd-entrypoint[506]: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:558)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.metadata.MetadataIndexTemplateService$4.execute(MetadataIndexTemplateService.java:491)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.lang.Thread.run(Thread.java:833)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: For complete error details, refer to the log at /var/log/opensearch/opensearch-1.log
Aug 09 00:00:02 graylog systemd-entrypoint[506]: 2023-08-09 00:00:02,303 opensearch[opensearch-1][transport_worker][T#8] ERROR Could not define attribute view on path "/var/log/opensearch/opensearch-1_server.json" got access denied ("java.lang.RuntimePermission" "accessUserInformation") java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessUserInformation")
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)
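The exception says that ss4o_metrics_template, which according to the trace is being added while the node starts, matches the same index pattern (ss4o_metrics-*-*) as the already existing ss4o_metric_template, and both have priority 1. OpenSearch refuses to add a composable index template whose patterns overlap an existing template with the same priority, because it could not decide which one applies when a matching index is created. A minimal Python sketch of that rule, assuming a plain wildcard comparison is good enough for illustration (the function names are hypothetical, and OpenSearch's own overlap detection is more thorough):

```python
from fnmatch import fnmatchcase
from itertools import product


def patterns_overlap(a: str, b: str) -> bool:
    # Simplified overlap test (an assumption, cruder than OpenSearch's check):
    # two patterns overlap if they are identical or one matches the other.
    return a == b or fnmatchcase(a, b) or fnmatchcase(b, a)


def conflicting_templates(new_name, new_patterns, new_priority, existing):
    """Return existing templates that would block registering the new one.

    `existing` maps template name -> (index_patterns, priority).
    """
    conflicts = []
    for name, (patterns, priority) in existing.items():
        if name == new_name or priority != new_priority:
            continue  # updating itself, or a different priority: no clash
        if any(patterns_overlap(p, q) for p, q in product(new_patterns, patterns)):
            conflicts.append(name)
    return conflicts


existing = {"ss4o_metric_template": (["ss4o_metrics-*-*"], 1)}
print(conflicting_templates("ss4o_metrics_template", ["ss4o_metrics-*-*"], 1, existing))
# ['ss4o_metric_template']
print(conflicting_templates("ss4o_metrics_template", ["ss4o_metrics-*-*"], 2, existing))
# []
```

The second call shows why the error message suggests a different priority: once the priorities differ, the overlap is no longer ambiguous.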
cluster.name: opensearch-1
node.name: opensearch-1
node.roles: [ data, cluster_manager, ml, ml_full_access ]
path.data: /var/lib/opensearch
path.logs: /var/log/opensearch
network.host: 0.0.0.0
discovery.type: single-node

plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.ssl.transport.enforce_hostname_verification: true

plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.allow_default_init_securityindex: true

plugins.security.allow_unsafe_democertificates: true

plugins.security.authcz.admin_dn:
  - "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"
plugins.security.nodes_dn:
  - "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"

plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.cache.ttl_minutes: 60

plugins.security.restapi.roles_enabled: ["all_access", "security_rest_api_access"]

plugins.security.system_indices.enabled: true

opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE

plugins.security.system_indices.indices: [".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]

action.auto_create_index: true
allow_registering_model_via_url: true
plugins.security.disabled: false

plugins.ml_commons.only_run_on_ml_node: true


What is your host/environment?

NAME="AlmaLinux"
VERSION="8.8 (Sapphire Caracal)"
pdolinic commented 1 year ago

Okay, never mind: it seems it needs the full setting name:

plugins.ml_commons.allow_registering_model_via_url: true
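The setting only takes effect under its full name, plugins.ml_commons.allow_registering_model_via_url; the bare allow_registering_model_via_url key used in opensearch.yml above is not the setting ml-commons reads. Because the GET _cluster/settings output later in this thread shows it stored as a persistent cluster setting, it can also be changed at runtime. A small Python sketch that builds the PUT /_cluster/settings body (the helper name is hypothetical):

```python
import json

# ml-commons settings live under the "plugins.ml_commons." namespace.
SETTING = "plugins.ml_commons.allow_registering_model_via_url"


def cluster_settings_body(enabled: bool) -> str:
    """JSON body for PUT /_cluster/settings that persists the setting."""
    return json.dumps({"persistent": {SETTING: enabled}}, indent=2)


print(cluster_settings_body(True))
```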
pdolinic commented 1 year ago

Okay, this still doesn't work:

POST /_plugins/_ml/models/_upload
{
  "name": "all-MiniLM-L6-v2",
  "version": "1.0.0",
  "description": "test model",
  "model_format": "TORCH_SCRIPT",
  "model_config": {
    "model_type": "bert",
    "embedding_dimension": 384,
    "framework_type": "sentence_transformers"
  },
  "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
}
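The same request can be scripted. The sketch below only builds an authenticated urllib request for the body above without sending it; the helper name and the admin/changeme credentials are placeholders, and note that later in this thread the endpoint used for registration is _register rather than _upload.

```python
import base64
import json
import urllib.request


def build_ml_request(host: str, path: str, body: dict, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST to the ML Commons REST API."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{host}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


upload_body = {
    "name": "all-MiniLM-L6-v2",
    "version": "1.0.0",
    "description": "test model",
    "model_format": "TORCH_SCRIPT",
    "model_config": {
        "model_type": "bert",
        "embedding_dimension": 384,
        "framework_type": "sentence_transformers",
    },
    "url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true",
}

# "admin" / "changeme" are placeholder credentials.
req = build_ml_request("127.0.0.1:9200", "/_plugins/_ml/models/_upload", upload_body, "admin", "changeme")
print(req.get_method(), req.full_url)
```

To actually send it with the CA from the config in this issue, something like urllib.request.urlopen(req, context=ssl.create_default_context(cafile="/etc/opensearch/graylog-ca-root.crt")) should work, provided the host name matches the node certificate. A 400 answer like the one below is raised by urlopen as urllib.error.HTTPError.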

It returns:


{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "To upload custom model user needs to enable allow_registering_model_via_url settings. Otherwise please use opensearch pre-trained models."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "To upload custom model user needs to enable allow_registering_model_via_url settings. Otherwise please use opensearch pre-trained models."
  },
  "status": 400
}
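The 400 means the node handling the request did not see the setting as enabled. One way to check what the cluster actually applies is to look up the fully qualified key in the GET _cluster/settings?include_defaults=true response. A minimal sketch (helper names are hypothetical) that flattens the nested JSON into dotted keys, similar to what ?flat_settings=true returns, and checks the scopes in the usual precedence order of transient, then persistent, then defaults:

```python
def flatten(node: dict, prefix: str = "") -> dict:
    """Flatten nested settings JSON into dotted keys."""
    flat = {}
    for key, value in node.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def effective_setting(response: dict, key: str):
    """Return the first value found in transient, persistent, then defaults."""
    for scope in ("transient", "persistent", "defaults"):
        flat = flatten(response.get(scope, {}))
        if key in flat:
            return flat[key]
    return None


response = {
    "persistent": {"plugins": {"ml_commons": {"allow_registering_model_via_url": "true"}}},
    "transient": {},
}
print(effective_setting(response, "plugins.ml_commons.allow_registering_model_via_url"))
# true
```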

With this log:


-- Logs begin at Tue 2023-08-08 22:00:43 CEST, end at Fri 2023-08-11 19:28:02 CEST. --
Aug 08 22:00:43 graylog systemd[1]: Starting OpenSearch...
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:54 graylog systemd[1]: Started OpenSearch.
Aug 08 22:00:54 graylog systemd-entrypoint[506]: uncaught exception in thread [main]
Aug 08 22:00:54 graylog systemd-entrypoint[506]: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:558)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.metadata.MetadataIndexTemplateService$4.execute(MetadataIndexTemplateService.java:491)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Aug 08 22:00:54 graylog systemd-entrypoint[506]:         at java.base/java.lang.Thread.run(Thread.java:833)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: For complete error details, refer to the log at /var/log/opensearch/opensearch-1.log
Aug 09 00:00:02 graylog systemd-entrypoint[506]: 2023-08-09 00:00:02,303 opensearch[opensearch-1][transport_worker][T#8] ERROR Could not define attribute view on path "/var/log/opensearch/opensearch-1_server.json" got access denied ("java.lang.RuntimePermission" "accessUserInformation") java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessUserInformation")
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
Aug 09 00:00:02 graylog systemd-entrypoint[506]:         at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)

I made these changes to enable registration, but they don't seem to take effect:

plugins.security.system_indices.indices: ["plugins.ml_commons.allow_registering_model_via_url: true", ".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
#node.max_local_storage_nodes: 3
######## End OpenSearch Security Demo Configuration ########

action.auto_create_index: true
allow_registering_model_via_url: true
plugins.security.disabled: false

plugins.ml_commons.only_run_on_ml_node: true
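In this config, plugins.ml_commons.allow_registering_model_via_url: true sits inside the plugins.security.system_indices.indices list. That list holds index names and patterns, so the entry is taken as one more string in the list rather than as a setting; the setting should go on its own line as a top-level key (or be set through the cluster settings API), and the bare allow_registering_model_via_url: true line is not the ml-commons setting either. A quick heuristic check in Python (the helper is hypothetical and the regex is an assumption about what a misplaced setting looks like):

```python
import re

# Heuristic: an entry shaped like "some.key: value" is a YAML setting pasted
# into the list, not an index name or pattern.
SETTING_LIKE = re.compile(r"^[\w.]+:\s")


def misplaced_settings(indices: list) -> list:
    """Return list entries that look like settings rather than index names."""
    return [entry for entry in indices if SETTING_LIKE.match(entry)]


system_indices = [
    "plugins.ml_commons.allow_registering_model_via_url: true",
    ".plugins-ml-model-group",
    ".plugins-ml-model",
    ".plugins-ml-task",
]
print(misplaced_settings(system_indices))
# ['plugins.ml_commons.allow_registering_model_via_url: true']
```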
ylwu-amzn commented 1 year ago

Hi @pdolinic, I suggest following this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript

ylwu-amzn commented 1 year ago

@pdolinic, have you tried this doc? https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript Do you still see any issues?

pdolinic commented 1 year ago

Hey @ylwu-amzn, yes, I will write everything down in detail for you. I am currently on holiday for a few days; I will get back to you in a few days with all the steps I took.

Thanks for caring!

pdolinic commented 1 year ago

Pre-work:

1) Install sentence-transformers:

pip install -U sentence-transformers

2) The node roles are set like this:

node.name: opensearch-1
node.roles: [ data, cluster_manager, ml_full_access ]

3) tail -f /var/log/opensearch/opensearch-1.*log showed that some indices were missing, so I created them like this:

PUT /.plugins-ml-model
curl -k -XPUT -u "admin2:$PASSWORD" "https://127.0.0.1:9200/.opensearch-sap-correlation-rules-config"

From here on, I followed the docs at https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript. Here are my inputs and outputs:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.only_run_on_ml_node": false
  }
}
PUT _cluster/settings
{
  "persistent": {
    "plugins.ml_commons.native_memory_threshold": 100
  }
}
POST /_plugins/_ml/models/_register
{
    "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT",
    "model_group_id": "8IjOsYgBFp6IJxCceZ2-"
}
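The two PUT _cluster/settings calls above can also be sent as one request, since the persistent object accepts several settings at once. A Python sketch that lays the steps out as (method, path, body) tuples ready to hand to any HTTP client; the helper and variable names are hypothetical and the values are copied from the requests above:

```python
import json


def settings_body(settings: dict) -> dict:
    """Combine several persistent cluster settings into one PUT /_cluster/settings body."""
    return {"persistent": dict(settings)}


# (method, path, body) for the steps above; the two settings calls are merged.
steps = [
    ("PUT", "/_cluster/settings", settings_body({
        "plugins.ml_commons.only_run_on_ml_node": False,
        "plugins.ml_commons.native_memory_threshold": 100,
    })),
    ("POST", "/_plugins/_ml/models/_register", {
        "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
        "version": "1.0.1",
        "model_format": "TORCH_SCRIPT",
        "model_group_id": "8IjOsYgBFp6IJxCceZ2-",
    }),
]

for method, path, body in steps:
    print(method, path)
    print(json.dumps(body, indent=2))
```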

I suspected user permissions or the config, but I am able to create models, so why can't I register them?

plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.ssl.transport.enforce_hostname_verification: true

plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.allow_default_init_securityindex: true

plugins.security.allow_unsafe_democertificates: true

plugins.security.authcz.admin_dn:
  - "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"
plugins.security.nodes_dn:
  - "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"

    #plugins.security.audit.type: internal_opensearch

plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.cache.ttl_minutes: 60

plugins.security.restapi.roles_enabled: ["all_access","ml_full_access", "security_rest_api_access"]

plugins.security.system_indices.enabled: true

opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE

#plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opendistro-asynchronous-search-response*"]

plugins.security.system_indices.indices: ["plugins.ml_commons.allow_registering_model_via_url: true", ".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
#node.max_local_storage_nodes: 3
######## End OpenSearch Security Demo Configuration ########

action.auto_create_index: true
#allow_registering_model_via_url: true
plugins.security.disabled: false

plugins.ml_commons.only_run_on_ml_node: true
##plugins.ml_commons.task_dispatch_policy: round_robin
#plugins.ml_commons.max_ml_task_per_node: 10
#plugins.ml_commons.max_model_on_node: 10
##plugins.ml_commons.monitoring_request_count: 100
#plugins.ml_commons.max_upload_model_tasks_per_node: 10
#plugins.ml_commons.max_load_model_tasks_per_node: 10
##plugins.ml_commons.sync_up_job_interval_in_seconds: 3

I am now following the logs and see failing shards:

[2023-08-22T16:54:09,177][ERROR][o.o.s.u.SecurityAnalyticsException] [opensearch-1] Security Analytics error:
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:665) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:373) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:704) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:473) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:295) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.transport.TransportService$6.handleException(TransportService.java:884) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:379) ~[?:?]
    at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1504) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1618) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1592) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:79) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:71) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.index.query.QueryShardException: failed to create query: [nested] failed to find nested object under path [correlate]
    at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:482) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:465) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.parseSource(SearchService.java:1236) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.createContext(SearchService.java:984) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:592) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:565) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) [opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) [opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
    ... 8 more
Caused by: java.lang.IllegalStateException: [nested] failed to find nested object under path [correlate]
    at org.opensearch.index.query.NestedQueryBuilder.doToQuery(NestedQueryBuilder.java:299) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:117) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:466) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:478) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:465) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.parseSource(SearchService.java:1236) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.createContext(SearchService.java:984) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:592) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:565) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) ~[opensearch-2.9.0.jar:2.9.0]
    at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
    ... 8 more
curl -k -XGET "https://admin:$my_pw@127.0.0.1:9200/_cluster/health"
{"cluster_name":"opensearch-1","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":77,"active_shards":77,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":61,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":55.79710144927537}
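The yellow status is most likely unrelated to ML: with a single node (discovery.type: single-node) and cluster.default_number_of_replicas at 1, as the settings dump in the next comment shows, replica shards cannot be allocated on the same node as their primaries and stay unassigned. 77 active out of 77 + 61 shard copies gives the reported active_shards_percent_as_number. A small sketch of that arithmetic and heuristic (helper names are hypothetical; GET _cluster/allocation/explain gives the authoritative reason per shard):

```python
import json

# Subset of the _cluster/health response above.
HEALTH = json.loads(
    '{"status":"yellow","number_of_nodes":1,"active_shards":77,'
    '"initializing_shards":0,"unassigned_shards":61}'
)


def active_shards_percent(health: dict) -> float:
    """Share of shard copies that are active, as a percentage."""
    total = health["active_shards"] + health["initializing_shards"] + health["unassigned_shards"]
    return 100.0 * health["active_shards"] / total if total else 100.0


def likely_reason(health: dict) -> str:
    # Heuristic: on a one-node cluster, replicas have nowhere to go.
    if health["status"] == "yellow" and health["number_of_nodes"] == 1:
        return "replica shards cannot be assigned on a single node"
    return "check GET _cluster/allocation/explain"


print(f"{active_shards_percent(HEALTH):.1f}%")  # 55.8%
print(likely_reason(HEALTH))
```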

Still, the instance is fine: TLS is up everywhere and I get fresh logs. I assume this has to be related to ML.

pdolinic commented 1 year ago

Maybe the longer output of

GET _cluster/settings?include_defaults=true

might help:

{
  "persistent": {
    "plugins": {
      "ml_commons": {
        "only_run_on_ml_node": "false",
        "allow_registering_model_via_url": "true",
        "native_memory_threshold": "100"
      },
      "index_state_management": {
        "metadata_migration": {
          "status": "1"
        },
        "template_migration": {
          "control": "-1"
        }
      }
    }
  },
  "transient": {},
  "defaults": {
    "task_resource_tracking": {
      "enabled": "true"
    },
    "cluster": {
      "max_voting_config_exclusions": "10",
      "metadata": {
        "perf_analyzer": {
          "config": {
            "overrides": ""
          },
          "pa_node_stats_setting": "1",
          "state": "0"
        }
      },
      "no_master_block": "metadata_write",
      "persistent_tasks": {
        "allocation": {
          "enable": "all",
          "recheck_interval": "30s"
        }
      },
      "initial_cluster_manager_nodes": [],
      "remote": {
        "node": {
          "attr": ""
        },
        "initial_connect_timeout": "30s",
        "connect": "true",
        "connections_per_cluster": "3"
      },
      "no_cluster_manager_block": "metadata_write",
      "routing": {
        "rebalance": {
          "enable": "all"
        },
        "allocation": {
          "node_concurrent_incoming_recoveries": "2",
          "move": {
            "primary_first": "false"
          },
          "node_initial_primaries_recoveries": "4",
          "same_shard": {
            "host": "false"
          },
          "total_shards_per_node": "-1",
          "cluster_concurrent_recoveries": "-1",
          "shard_state": {
            "reroute": {
              "priority": "NORMAL"
            }
          },
          "type": "balanced",
          "disk": {
            "threshold_enabled": "true",
            "watermark": {
              "flood_stage": "95%",
              "high": "90%",
              "low": "85%",
              "enable_for_single_data_node": "false"
            },
            "include_relocations": "true",
            "reroute_interval": "60s"
          },
          "node_initial_replicas_recoveries": "4",
          "awareness": {
            "balance": "false",
            "attributes": []
          },
          "balance": {
            "index": "0.55",
            "threshold": "1.0",
            "shard": "0.45",
            "prefer_primary": "false"
          },
          "load_awareness": {
            "allow_unassigned_primaries": "true",
            "flat_skew": "2",
            "skew_factor": "50.0",
            "provisioned_capacity": "-1"
          },
          "enable": "all",
          "node_concurrent_outgoing_recoveries": "2",
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "2",
          "node_concurrent_recoveries": "2",
          "total_shards_limit": "-1"
        },
        "ignore_weighted_routing": "false",
        "use_adaptive_replica_selection": "true",
        "weighted": {
          "strict": "true",
          "fail_open": "true",
          "default_weight": "1.0"
        }
      },
      "search": {
        "ignore_awareness_attributes": "true"
      },
      "default_number_of_replicas": "1",
      "join": {
        "timeout": "60000ms"
      },
      "info": {
        "update": {
          "interval": "30s",
          "timeout": "15s"
        }
      },
      "auto_shrink_voting_configuration": "true",
      "election": {
        "duration": "500ms",
        "initial_timeout": "100ms",
        "max_timeout": "10s",
        "back_off_time": "100ms",
        "strategy": "default"
      },
      "blocks": {
        "create_index": "false",
        "read_only_allow_delete": "false",
        "read_only": "false",
        "create_index.auto_release": "true"
      },
      "ignore_dot_indexes": "false",
      "follower_lag": {
        "timeout": "90000ms"
      },
      "indices": {
        "replication": {
          "strategy": "DOCUMENT"
        },
        "tombstones": {
          "size": "500"
        },
        "close": {
          "enable": "true"
        }
      },
      "nodes": {
        "reconnect_interval": "10s"
      },
      "task": {
        "consumers": {
          "top_n": {
            "size": "10",
            "frequency": "60s"
          }
        }
      },
      "service": {
        "slow_master_task_logging_threshold": "10s",
        "slow_cluster_manager_task_logging_threshold": "10s",
        "slow_task_logging_threshold": "30s"
      },
      "publish": {
        "timeout": "30000ms",
        "info_timeout": "10000ms"
      },
      "name": "opensearch-1",
      "fault_detection": {
        "leader_check": {
          "interval": "1000ms",
          "timeout": "10000ms",
          "retry_count": "3"
        },
        "follower_check": {
          "interval": "1000ms",
          "timeout": "10000ms",
          "retry_count": "3"
        }
      },
      "max_shards_per_node": "1000",
      "initial_master_nodes": [],
      "snapshot": {
        "info": {
          "max_concurrent_fetches": "5"
        }
      }
    },
    "opendistro": {
      "query": {
        "size_limit": "200"
      },
      "scheduled_jobs": {
        "request_timeout": "10s",
        "sweeper": {
          "backoff_millis": "50ms",
          "period": "5m",
          "page_size": "100"
        },
        "enabled": "true",
        "retry_count": "3"
      },
      "asynchronous_search": {
        "max_wait_for_completion_timeout": "1m",
        "expired": {
          "persisted_response": {
            "cleanup_interval": "30m"
          }
        },
        "max_search_running_time": "12h",
        "persist_search_failures": "false",
        "active": {
          "context": {
            "reaper_interval": "5m"
          }
        },
        "node_concurrent_running_searches": "20",
        "max_keep_alive": "5d"
      },
      "destination": {
        "host": {
          "deny_list": []
        }
      },
      "index_state_management": {
        "coordinator": {
          "backoff_millis": "50ms",
          "sweep_period": "10m",
          "backoff_count": "2"
        },
        "metadata_service": {
          "enabled": "true"
        },
        "restricted_index_pattern": """\.opendistro_security|\.kibana.*|\.opendistro-ism-config""",
        "allow_list": [
          "alias",
          "allocation",
          "close",
          "delete",
          "force_merge",
          "index_priority",
          "notification",
          "open",
          "read_only",
          "read_write",
          "replica_count",
          "rollup",
          "rollover",
          "shrink",
          "snapshot"
        ],
        "template_migration": {
          "control": "0"
        },
        "history": {
          "max_age": "24h",
          "number_of_shards": "1",
          "rollover_retention_period": "30d",
          "rollover_check_period": "8h",
          "max_docs": "2500000",
          "number_of_replicas": "1",
          "enabled": "true"
        },
        "job_interval": "5",
        "metadata_migration": {
          "status": "0"
        },
        "enabled": "true",
        "snapshot": {
          "deny_list": []
        }
      },
      "anomaly_detection": {
        "ad_result_history_rollover_period": "12h",
        "max_anomaly_features": "5",
        "breaker": {
          "enabled": "true"
        },
        "request_timeout": "10s",
        "backoff_initial_delay": "1000ms",
        "batch_task_piece_size": "1000",
        "max_cache_miss_handling_per_second": "100",
        "enabled": "true",
        "max_batch_task_per_node": "10",
        "cooldown_minutes": "5m",
        "model_max_size_percent": "0.1",
        "max_primary_shards": "10",
        "ad_result_history_max_docs": "250000000",
        "ad_result_history_retention_period": "30d",
        "backoff_minutes": "15m",
        "detection_window_delay": "0m",
        "index_pressure_soft_limit": "0.8",
        "max_entities_for_preview": "30",
        "max_multi_entity_anomaly_detectors": "10",
        "max_entities_per_query": "1000",
        "max_retry_for_unresponsive_node": "5",
        "detection_interval": "10m",
        "batch_task_piece_interval_seconds": "5",
        "max_old_ad_task_docs_per_detector": "1",
        "max_retry_for_backoff": "3",
        "max_anomaly_detectors": "1000",
        "filter_by_backend_roles": "false"
      },
      "ppl": {
        "enabled": "true",
        "query": {
          "memory_limit": "85%"
        }
      },
      "alerting": {
        "alert_backoff_millis": "50ms",
        "index_timeout": "60s",
        "move_alerts_backoff_count": "3",
        "alert_history_max_age": "30d",
        "request_timeout": "10s",
        "bulk_timeout": "120s",
        "destination": {
          "allow_list": [
            "chime",
            "slack",
            "custom_webhook",
            "email",
            "test_action"
          ]
        },
        "monitor": {
          "max_monitors": "1000"
        },
        "action_throttle_max_value": "24h",
        "alert_history_rollover_period": "12h",
        "alert_history_max_docs": "1000",
        "alert_backoff_count": "2",
        "move_alerts_backoff_millis": "250ms",
        "alert_history_retention_period": "60d",
        "alert_history_enabled": "true",
        "input_timeout": "30s",
        "filter_by_backend_roles": "false"
      },
      "jobscheduler": {
        "jitter_limit": "0.6",
        "request_timeout": "10s",
        "sweeper": {
          "backoff_millis": "50ms",
          "period": "5m",
          "page_size": "100"
        },
        "threadpool": {
          "queue_size": "200",
          "size": "8"
        },
        "retry_count": "3"
      },
      "rollup": {
        "search": {
          "backoff_millis": "1000ms",
          "backoff_count": "5",
          "enabled": "true"
        },
        "dashboards": {
          "enabled": "true"
        },
        "enabled": "true",
        "ingest": {
          "backoff_millis": "1000ms",
          "backoff_count": "5"
        }
      },
      "sql": {
        "cursor": {
          "enabled": "true",
          "fetch_size": "1000",
          "keep_alive": "1m"
        },
        "metrics": {
          "rollinginterval": "60",
          "rollingwindow": "3600"
        },
        "engine": {
          "new": {
            "enabled": "true"
          }
        },
        "enabled": "true",
        "query": {
          "analysis": {
            "semantic": {
              "threshold": "200",
              "suggestion": "false"
            },
            "enabled": "false"
          },
          "slowlog": "2",
          "response": {
            "format": "jdbc"
          }
        }
      }
    },
    "plugins": {
      "replication": {
        "leader": {
          "thread_pool": {
            "queue_size": "1000",
            "size": "0"
          }
        },
        "autofollow": {
          "concurrent_replication_jobs_trigger_size": "3",
          "fetch_poll_interval": "30s",
          "retry_poll_interval": "1h"
        },
        "follower": {
          "poll_interval": "50ms",
          "concurrent_readers_per_shard": "2",
          "concurrent_writers_per_shard": "2",
          "index": {
            "ops_batch_size": "50000",
            "recovery": {
              "chunk_size": "10mb",
              "max_concurrent_file_chunks": "5"
            }
          },
          "block": {
            "start": "false"
          },
          "retention_lease_max_failure_duration": "1h",
          "metadata_sync_interval": "60s"
        }
      },
      "security_config": {
        "ssl_dual_mode_enabled": "false"
      },
      "query": {
        "memory_limit": "85%",
        "metrics": {
          "rolling_interval": "60",
          "rolling_window": "3600"
        },
        "datasources": {
          "uri": {
            "allowhosts": ".*"
          }
        },
        "size_limit": "200"
      },
      "scheduled_jobs": {
        "request_timeout": "10s",
        "sweeper": {
          "backoff_millis": "50ms",
          "period": "5m",
          "page_size": "100"
        },
        "enabled": "true",
        "retry_count": "3"
      },
      "asynchronous_search": {
        "max_wait_for_completion_timeout": "1m",
        "expired": {
          "persisted_response": {
            "cleanup_interval": "30m"
          }
        },
        "max_search_running_time": "12h",
        "persist_search_failures": "false",
        "active": {
          "context": {
            "reaper_interval": "5m"
          }
        },
        "node_concurrent_running_searches": "20",
        "max_keep_alive": "5d"
      },
      "destination": {
        "host": {
          "deny_list": []
        }
      },
      "index_state_management": {
        "coordinator": {
          "backoff_millis": "50ms",
          "sweep_period": "10m",
          "sweep_skip_period": "5m",
          "backoff_count": "2"
        },
        "jitter": "0.6",
        "metadata_service": {
          "enabled": "true"
        },
        "restricted_index_pattern": """\.opendistro_security|\.kibana.*|\.opendistro-ism-config""",
        "action_validation": {
          "enabled": "false"
        },
        "allow_list": [
          "alias",
          "allocation",
          "close",
          "delete",
          "force_merge",
          "index_priority",
          "notification",
          "open",
          "read_only",
          "read_write",
          "replica_count",
          "rollup",
          "rollover",
          "shrink",
          "snapshot"
        ],
        "history": {
          "max_age": "24h",
          "number_of_shards": "1",
          "rollover_retention_period": "30d",
          "rollover_check_period": "8h",
          "max_docs": "2500000",
          "number_of_replicas": "1",
          "enabled": "true"
        },
        "job_interval": "5",
        "enabled": "true",
        "snapshot": {
          "deny_list": []
        }
      },
      "security_analytics": {
        "index_timeout": "60s",
        "alert_history_max_age": "30d",
        "request_timeout": "10s",
        "alert_finding_max_docs": "1000",
        "alert_finding_rollover_period": "12h",
        "finding_history_max_age": "30d",
        "correlation_time_window": "5m",
        "action_throttle_max_value": "24h",
        "alert_history_rollover_period": "12h",
        "mappings": {
          "default_schema": "ecs"
        },
        "alert_history_max_docs": "1000",
        "alert_finding_enabled": "true",
        "finding_history_retention_period": "60d",
        "alert_history_retention_period": "60d",
        "alert_history_enabled": "true",
        "filter_by_backend_roles": "false"
      },
      "snapshot_management": {
        "filter_by_backend_roles": "false"
      },
      "ppl": {
        "enabled": "true"
      },
      "alerting": {
        "alert_backoff_millis": "50ms",
        "index_timeout": "60s",
        "move_alerts_backoff_count": "3",
        "alert_history_max_age": "30d",
        "request_timeout": "10s",
        "alert_finding_max_docs": "1000",
        "bulk_timeout": "120s",
        "destination": {
          "allow_list": [
            "chime",
            "slack",
            "custom_webhook",
            "email",
            "test_action"
          ]
        },
        "alert_finding_rollover_period": "12h",
        "finding_history_max_age": "30d",
        "monitor": {
          "max_monitors": "1000"
        },
        "max_actionable_alert_count": "50",
        "action_throttle_max_value": "24h",
        "alert_history_rollover_period": "12h",
        "alert_history_max_docs": "1000",
        "alert_finding_enabled": "true",
        "alert_backoff_count": "2",
        "finding_history_retention_period": "60d",
        "move_alerts_backoff_millis": "250ms",
        "alert_history_retention_period": "60d",
        "alert_history_enabled": "true",
        "input_timeout": "30s",
        "filter_by_backend_roles": "false"
      },
      "rollup": {
        "search": {
          "backoff_millis": "1000ms",
          "search_all_jobs": "false",
          "backoff_count": "5",
          "enabled": "true"
        },
        "dashboards": {
          "enabled": "true"
        },
        "enabled": "true",
        "ingest": {
          "backoff_millis": "1000ms",
          "backoff_count": "5"
        }
      },
      "sql": {
        "cursor": {
          "keep_alive": "1m"
        },
        "slowlog": "2",
        "delete": {
          "enabled": "false"
        },
        "enabled": "true"
      },
      "ml_commons": {
        "monitoring_request_count": "100",
        "jvm_heap_memory_threshold": "85",
        "allow_custom_deployment_plan": "false",
        "sync_up_job_interval_in_seconds": "10",
        "max_register_model_tasks_per_node": "10",
        "ml_task_timeout_in_seconds": "600",
        "allow_registering_model_via_local_file": "false",
        "trusted_url_regex": "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
        "task_dispatch_policy": "round_robin",
        "max_model_on_node": "10",
        "max_ml_task_per_node": "10",
        "exclude_nodes": {
          "_name": ""
        },
        "trusted_connector_endpoints_regex": [
          """^https://runtime\.sagemaker\..*[a-z0-9-]\.amazonaws\.com/.*$""",
          """^https://api\.openai\.com/.*$""",
          """^https://api\.cohere\.ai/.*$"""
        ],
        "model_access_control_enabled": "false",
        "connector_access_control_enabled": "false",
        "enable_inhouse_python_model": "false",
        "max_deploy_model_tasks_per_node": "10",
        "model_auto_redeploy": {
          "lifetime_retry_times": "3",
          "enable": "false"
        }
      },
      "transform": {
        "circuit_breaker": {
          "jvm": {
            "threshold": "85"
          },
          "enabled": "true"
        },
        "internal": {
          "index": {
            "backoff_millis": "1000ms",
            "backoff_count": "5"
          },
          "search": {
            "backoff_millis": "1000ms",
            "backoff_count": "5"
          }
        }
      },
      "index_management": {
        "filter_by_backend_roles": "false"
      },
      "anomaly_detection": {
        "entity_cold_start_queue_max_heap_percent": "0.001",
        "max_anomaly_features": "5",
        "breaker": {
          "enabled": "true"
        },
        "request_timeout": "10s",
        "checkpoint_read_queue_max_heap_percent": "0.001",
        "max_batch_task_per_node": "10",
        "checkpoint_read_queue_batch_size": "25",
        "max_top_entities_for_historical_analysis": "1000",
        "cooldown_minutes": "5m",
        "expected_cold_entity_execution_time_in_millisecs": "3000",
        "model_max_size_percent": "0.1",
        "max_running_entities_per_detector_for_historical_analysis": "10",
        "door_keeper_in_cache": {
          "enabled": "false"
        },
        "page_size": "1000",
        "checkpoint_read_queue_concurrency": "1",
        "index_pressure_soft_limit": "0.6",
        "max_multi_entity_anomaly_detectors": "10",
        "max_entities_per_query": "1000000",
        "checkpoint_saving_freq": "12h",
        "delete_anomaly_result_when_delete_detector": "false",
        "max_concurrent_preview": "2",
        "max_cached_deleted_tasks": "1000",
        "max_retry_for_unresponsive_node": "5",
        "entity_cold_start_queue_concurrency": "1",
        "ad_result_history_max_docs_per_shard": "1350000000",
        "batch_task_piece_interval_seconds": "5",
        "checkpoint_maintain_queue_max_heap_percent": "0.001",
        "dedicated_cache_size": "10",
        "filter_by_backend_roles": "false",
        "ad_result_history_rollover_period": "12h",
        "checkpoint_write_queue_concurrency": "2",
        "hcad_cold_start_interpolation": {
          "enabled": "false"
        },
        "backoff_initial_delay": "1000ms",
        "batch_task_piece_size": "1000",
        "checkpoint_ttl": "7d",
        "enabled": "true",
        "category_field_limit": "2",
        "checkpoint_write_queue_batch_size": "25",
        "result_write_queue_batch_size": "5000",
        "expected_checkpoint_maintain_time_in_millisecs": "1000",
        "max_primary_shards": "10",
        "result_write_queue_concurrency": "2",
        "result_write_queue_max_heap_percent": "0.01",
        "cold_entity_queue_max_heap_percent": "0.001",
        "ad_result_history_retention_period": "30d",
        "backoff_minutes": "15m",
        "detection_window_delay": "0m",
        "checkpoint_write_queue_max_heap_percent": "0.01",
        "max_entities_for_preview": "5",
        "index_pressure_hard_limit": "0.9",
        "max_model_size_per_node": "100",
        "detection_interval": "10m",
        "max_old_ad_task_docs_per_detector": "1",
        "max_retry_for_backoff": "3",
        "max_anomaly_detectors": "1000"
      },
      "jobscheduler": {
        "jitter_limit": "0.6",
        "request_timeout": "10s",
        "sweeper": {
          "backoff_millis": "50ms",
          "period": "5m",
          "page_size": "100"
        },
        "retry_count": "3"
      }
    },
    "logger": {
      "level": "INFO"
    },
    "processors": "8",
    "ingest": {
      "user_agent": {
        "cache_size": "1000"
      },
      "geoip": {
        "cache_size": "1000"
      },
      "grok": {
        "watchdog": {
          "max_execution_time": "1s",
          "interval": "1s"
        }
      }
    },
    "pidfile": "",
    "path": {
      "data": [
        "/var/lib/opensearch"
      ],
      "logs": "/var/log/opensearch",
      "shared_data": "",
      "home": "/usr/share/opensearch",
      "repo": []
    },
    "repositories": {
      "fs": {
        "compress": "false",
        "chunk_size": "9223372036854775807b",
        "location": ""
      },
      "url": {
        "supported_protocols": [
          "http",
          "https",
          "ftp",
          "file",
          "jar"
        ],
        "allowed_urls": [],
        "url": "http:"
      }
    },
    "action": {
      "auto_create_index": "true",
      "search": {
        "shard_count": {
          "limit": "9223372036854775807"
        }
      },
      "destructive_requires_name": "false"
    },
    "opensearch_dashboards": {
      "system_indices": [
        ".opensearch_dashboards",
        ".opensearch_dashboards_*",
        ".reporting-*",
        ".apm-agent-configuration",
        ".apm-custom-link"
      ]
    },
    "cache": {
      "recycler": {
        "page": {
          "limit": {
            "heap": "10%"
          },
          "type": "CONCURRENT",
          "weight": {
            "longs": "1.0",
            "ints": "1.0",
            "bytes": "1.0",
            "objects": "0.1"
          }
        }
      }
    },
    "point_in_time": {
      "init": {
        "keep_alive": "30s"
      },
      "max_keep_alive": "24h"
    },
    "reindex": {
      "remote": {
        "allowlist": [],
        "whitelist": []
      }
    },
    "resource": {
      "reload": {
        "enabled": "true",
        "interval": {
          "low": "60s",
          "high": "5s",
          "medium": "30s"
        }
      }
    },
    "thread_pool": {
      "force_merge": {
        "queue_size": "-1",
        "size": "1"
      },
      "fetch_shard_started": {
        "core": "1",
        "max": "16",
        "keep_alive": "5m"
      },
      "listener": {
        "queue_size": "-1",
        "size": "4"
      },
      "refresh": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "remote_refresh": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "translog_sync": {
        "queue_size": "10000",
        "size": "32"
      },
      "system_write": {
        "queue_size": "1000",
        "size": "4"
      },
      "generic": {
        "core": "4",
        "max": "128",
        "keep_alive": "30s"
      },
      "warmer": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "remote_purge": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "translog_transfer": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "ml_commons": {
        "opensearch_ml_deploy": {
          "queue_size": "10",
          "size": "7"
        },
        "opensearch_ml_execute": {
          "queue_size": "10",
          "size": "7"
        },
        "opensearch_ml_register": {
          "queue_size": "10",
          "size": "7"
        },
        "opensearch_ml_train": {
          "queue_size": "10",
          "size": "7"
        },
        "opensearch_ml_predict": {
          "queue_size": "10000",
          "size": "16"
        },
        "opensearch_ml_general": {
          "queue_size": "100",
          "size": "7"
        }
      },
      "search": {
        "max_queue_size": "1000",
        "queue_size": "1000",
        "size": "13",
        "auto_queue_frame_size": "2000",
        "target_response_time": "1s",
        "min_queue_size": "1000"
      },
      "opensearch_asynchronous_search_generic": {
        "core": "1",
        "max": "16",
        "keep_alive": "30m"
      },
      "fetch_shard_store": {
        "core": "1",
        "max": "16",
        "keep_alive": "5m"
      },
      "flush": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "management": {
        "core": "1",
        "max": "5",
        "keep_alive": "5m"
      },
      "analyze": {
        "queue_size": "16",
        "size": "1"
      },
      "get": {
        "queue_size": "1000",
        "size": "8"
      },
      "system_read": {
        "queue_size": "2000",
        "size": "4"
      },
      "estimated_time_interval": "200ms",
      "write": {
        "queue_size": "10000",
        "size": "8"
      },
      "snapshot": {
        "core": "1",
        "max": "4",
        "keep_alive": "5m"
      },
      "search_throttled": {
        "max_queue_size": "100",
        "queue_size": "100",
        "size": "1",
        "auto_queue_frame_size": "200",
        "target_response_time": "1s",
        "min_queue_size": "100"
      }
    },
    "index": {
      "codec": "default",
      "recovery": {
        "type": ""
      },
      "store": {
        "hybrid": {
          "mmap": {
            "extensions": [
              "nvd",
              "dvd",
              "tim",
              "tip",
              "dim",
              "kdd",
              "kdi",
              "cfs",
              "doc",
              "vec",
              "vex"
            ]
          }
        },
        "type": "",
        "fs": {
          "fs_lock": "native"
        },
        "preload": []
      }
    },
    "replication_leader": {
      "queue_size": "1000",
      "size": "13"
    },
    "task_cancellation": {
      "duration_millis": "10000",
      "enabled": "true"
    },
    "script": {
      "allowed_contexts": [],
      "max_compilations_rate": "use-context",
      "cache": {
        "max_size": "100",
        "expire": "0ms"
      },
      "painless": {
        "regex": {
          "enabled": "limited",
          "limit-factor": "6"
        }
      },
      "max_size_in_bytes": "65535",
      "allowed_types": [],
      "disable_max_compilations_rate": "false"
    },
    "indexing_pressure": {
      "memory": {
        "limit": "10%"
      }
    },
    "node": {
      "data": "true",
      "roles": [
        "data",
        "cluster_manager",
        "ml",
        "ml_full_access"
      ],
      "max_local_storage_nodes": "1",
      "processors": "8",
      "store": {
        "allow_mmap": "true"
      },
      "ingest": "true",
      "master": "true",
      "pidfile": "/var/run/opensearch/opensearch.pid",
      "search": {
        "cache": {
          "size": "0b"
        }
      },
      "remote_cluster_client": "true",
      "enable_lucene_segment_infos_trace": "false",
      "local_storage": "true",
      "name": "opensearch-1",
      "id": {
        "seed": "0"
      },
      "attr": {
        "shard_indexing_pressure_enabled": "true"
      },
      "portsfile": "false"
    },
    "null": {
      "queue_size": "1000",
      "size": "8"
    },
    "http": {
      "cors": {
        "max-age": "1728000",
        "allow-origin": "",
        "allow-headers": "X-Requested-With,Content-Type,Content-Length",
        "allow-credentials": "false",
        "allow-methods": "OPTIONS,HEAD,GET,POST,PUT,DELETE",
        "enabled": "false"
      },
      "max_chunk_size": "8kb",
      "compression_level": "3",
      "max_initial_line_length": "4kb",
      "type": "org.opensearch.security.http.SecurityHttpServerTransport",
      "pipelining": {
        "max_events": "10000"
      },
      "type.default": "netty4",
      "content_type": {
        "required": "true"
      },
      "host": [],
      "publish_port": "-1",
      "read_timeout": "0ms",
      "max_content_length": "100mb",
      "netty": {
        "receive_predictor_size": "64kb",
        "max_composite_buffer_components": "69905",
        "worker_count": "0"
      },
      "tcp": {
        "reuse_address": "true",
        "keep_count": "-1",
        "keep_interval": "-1",
        "no_delay": "true",
        "keep_alive": "true",
        "receive_buffer_size": "-1b",
        "keep_idle": "-1",
        "send_buffer_size": "-1b"
      },
      "bind_host": [],
      "reset_cookies": "false",
      "max_warning_header_count": "-1",
      "tracer": {
        "include": [],
        "exclude": []
      },
      "max_warning_header_size": "-1b",
      "detailed_errors": {
        "enabled": "true"
      },
      "port": "9200-9300",
      "max_header_size": "8kb",
      "tcp_no_delay": "true",
      "compression": "false",
      "publish_host": []
    },
    "compatibility": {
      "override_main_response_version": "false"
    },
    "snapshot": {
      "max_concurrent_operations": "1000"
    },
    "bootstrap": {
      "memory_lock": "false",
      "system_call_filter": "true",
      "ctrlhandler": "true"
    },
    "network": {
      "host": [
        "0.0.0.0"
      ],
      "tcp": {
        "reuse_address": "true",
        "keep_count": "-1",
        "connect_timeout": "30s",
        "keep_interval": "-1",
        "no_delay": "true",
        "keep_alive": "true",
        "receive_buffer_size": "-1b",
        "keep_idle": "-1",
        "send_buffer_size": "-1b"
      },
      "bind_host": [
        "0.0.0.0"
      ],
      "server": "true",
      "breaker": {
        "inflight_requests": {
          "limit": "100%",
          "overhead": "2.0"
        }
      },
      "publish_host": [
        "0.0.0.0"
      ]
    },
    "search": {
      "default_search_timeout": "-1",
      "highlight": {
        "term_vector_multi_value": "true"
      },
      "max_open_pit_context": "300",
      "cancel_after_time_interval": "-1",
      "default_allow_partial_results": "true",
      "max_open_scroll_context": "500",
      "max_buckets": "65535",
      "low_level_cancellation": "true",
      "allow_expensive_queries": "true",
      "keep_alive_interval": "1m",
      "default_keep_alive": "5m",
      "max_keep_alive": "24h"
    },
    "security": {
      "manager": {
        "filter_bad_defaults": "true"
      }
    },
    "segrep": {
      "pressure": {
        "checkpoint": {
          "limit": "4"
        },
        "time": {
          "limit": "5m"
        },
        "replica": {
          "stale": {
            "limit": "0.5"
          }
        },
        "enabled": "false"
      }
    },
    "client": {
      "type": "node"
    },
    "opendistro_security_config": {
      "ssl_dual_mode_enabled": "false"
    },
    "rest": {
      "action": {
        "multi": {
          "allow_explicit_index": "true"
        }
      }
    },
    "remote_store": {
      "segment": {
        "pressure": {
          "bytes_lag": {
            "variance_factor": "10.0"
          },
          "upload_bytes_moving_average_window_size": "20",
          "upload_bytes_per_sec_moving_average_window_size": "20",
          "time_lag": {
            "variance_factor": "10.0"
          },
          "upload_time_moving_average_window_size": "20",
          "consecutive_failures": {
            "limit": "5"
          },
          "enabled": "false"
        }
      }
    },
    "replication_follower": {
      "core": "1",
      "max": "10",
      "keep_alive": "1m"
    },
    "knn": {
      "algo_param": {
        "index_thread_qty": "1"
      },
      "cache": {
        "item": {
          "expiry": {
            "enabled": "false",
            "minutes": "3h"
          }
        }
      },
      "memory": {
        "circuit_breaker": {
          "limit": "50%",
          "enabled": "true"
        }
      },
      "plugin": {
        "enabled": "true"
      },
      "queue_size": "1",
      "size": "1",
      "circuit_breaker": {
        "unset": {
          "percentage": "75.0"
        },
        "triggered": "false"
      },
      "model": {
        "index": {
          "number_of_shards": "1",
          "number_of_replicas": "1"
        },
        "cache": {
          "size": {
            "limit": "10%"
          }
        }
      }
    },
    "monitor": {
      "jvm": {
        "gc": {
          "enabled": "true",
          "overhead": {
            "warn": "50",
            "debug": "10",
            "info": "25"
          },
          "refresh_interval": "1s"
        },
        "refresh_interval": "1s"
      },
      "process": {
        "refresh_interval": "1s"
      },
      "os": {
        "refresh_interval": "1s"
      },
      "fs": {
        "health": {
          "healthy_timeout_threshold": "60s",
          "refresh_interval": "60s",
          "enabled": "true",
          "slow_path_logging_threshold": "5s"
        },
        "refresh_interval": "1s"
      }
    },
    "transport": {
      "tcp": {
        "reuse_address": "true",
        "keep_count": "-1",
        "connect_timeout": "30s",
        "keep_interval": "-1",
        "compress": "false",
        "port": "9300-9400",
        "no_delay": "true",
        "keep_alive": "true",
        "receive_buffer_size": "-1b",
        "keep_idle": "-1",
        "send_buffer_size": "-1b"
      },
      "bind_host": [],
      "connect_timeout": "30s",
      "compress": "false",
      "ping_schedule": "-1",
      "connections_per_node": {
        "recovery": "2",
        "state": "1",
        "bulk": "3",
        "reg": "6",
        "ping": "1"
      },
      "tracer": {
        "include": [],
        "exclude": [
          "internal:coordination/fault_detection/*",
          "cluster:monitor/nodes/liveness"
        ]
      },
      "type": "org.opensearch.security.ssl.http.netty.SecuritySSLNettyTransport",
      "slow_operation_logging_threshold": "5s",
      "type.default": "netty4",
      "port": "9300-9400",
      "host": [],
      "publish_port": "-1",
      "tcp_no_delay": "true",
      "publish_host": [],
      "netty": {
        "receive_predictor_size": "64kb",
        "receive_predictor_max": "64kb",
        "worker_count": "8",
        "receive_predictor_min": "64kb",
        "boss_count": "1"
      }
    },
    "task_resource_consumers": {
      "enabled": "false"
    },
    "cluster_manager": {
      "throttling": {
        "retry": {
          "max": {
            "delay": "30s"
          },
          "base": {
            "delay": "5s"
          }
        }
      }
    },
    "indices": {
      "replication": {
        "retry_timeout": "60s",
        "initial_retry_backoff_bound": "50ms"
      },
      "cache": {
        "cleanup_interval": "1m"
      },
      "mapping": {
        "dynamic_timeout": "30s",
        "max_in_flight_updates": "10"
      },
      "memory": {
        "interval": "5s",
        "max_index_buffer_size": "-1",
        "shard_inactive_time": "5m",
        "index_buffer_size": "10%",
        "min_index_buffer_size": "48mb"
      },
      "breaker": {
        "request": {
          "limit": "60%",
          "type": "memory",
          "overhead": "1.0"
        },
        "total": {
          "limit": "95%",
          "use_real_memory": "true"
        },
        "fielddata": {
          "limit": "40%",
          "type": "memory",
          "overhead": "1.03"
        },
        "type": "hierarchy"
      },
      "query": {
        "bool": {
          "max_clause_count": "1024"
        },
        "query_string": {
          "analyze_wildcard": "false",
          "allowLeadingWildcard": "true"
        }
      },
      "id_field_data": {
        "enabled": "true"
      },
      "recovery": {
        "recovery_activity_timeout": "1800000ms",
        "retry_delay_network": "5s",
        "internal_action_timeout": "15m",
        "retry_delay_state_sync": "500ms",
        "internal_action_long_timeout": "1800000ms",
        "max_concurrent_operations": "1",
        "max_bytes_per_sec": "40mb",
        "max_concurrent_file_chunks": "2"
      },
      "requests": {
        "cache": {
          "size": "1%",
          "expire": "0ms"
        }
      },
      "store": {
        "delete": {
          "shard": {
            "timeout": "30s"
          }
        }
      },
      "analysis": {
        "hunspell": {
          "dictionary": {
            "ignore_case": "false",
            "lazy": "false"
          }
        }
      },
      "queries": {
        "cache": {
          "count": "10000",
          "size": "10%",
          "all_segments": "false"
        }
      },
      "fielddata": {
        "cache": {
          "size": "-1b"
        }
      }
    },
    "plugin": {
      "mandatory": []
    },
    "opensearch": {
      "reports": {
        "general": {
          "operationTimeoutMs": "60000",
          "defaultItemsQueryCount": "100"
        }
      },
      "experimental": {
        "feature": {
          "concurrent_segment_search": {
            "enabled": "false"
          },
          "extensions": {
            "enabled": "false"
          },
          "telemetry": {
            "enabled": "false"
          },
          "remote_store": {
            "enabled": "false"
          },
          "segment_replication_experimental": {
            "enabled": "false"
          },
          "identity": {
            "enabled": "false"
          }
        }
      },
      "ad": {
        "ad-threadpool": {
          "core": "1",
          "max": "4",
          "keep_alive": "10m"
        },
        "ad-batch-task-threadpool": {
          "core": "1",
          "max": "1",
          "keep_alive": "10m"
        }
      },
      "observability": {
        "general": {
          "operationTimeoutMs": "60000",
          "defaultItemsQueryCount": "1000"
        },
        "access": {
          "filterBy": "NoFilter",
          "ignoreRoles": [
            "own_index",
            "opensearch_dashboards_user",
            "notebooks_full_access",
            "notebooks_read_access"
          ],
          "adminAccess": "AllObservabilityObjects"
        },
        "polling": {
          "maxLockRetries": "4",
          "jobLockDurationSeconds": "300",
          "maxPollingDurationSeconds": "900",
          "minPollingDurationSeconds": "300"
        }
      },
      "notifications": {
        "core": {
          "allowed_config_types": [
            "slack",
            "chime",
            "webhook",
            "email",
            "sns",
            "ses_account",
            "smtp_account",
            "email_group"
          ],
          "tooltip_support": "true",
          "http": {
            "socket_timeout": "50000",
            "host_deny_list": [],
            "max_connections": "60",
            "connection_timeout": "5000",
            "max_connection_per_route": "20"
          },
          "email": {
            "minimum_header_length": "160",
            "size_limit": "10000000"
          }
        },
        "general": {
          "default_items_query_count": "100",
          "operation_timeout_ms": "60000",
          "filter_by_backend_roles": "false"
        }
      }
    },
    "discovery": {
      "seed_hosts": [],
      "unconfigured_bootstrap_timeout": "3s",
      "request_peers_timeout": "3000ms",
      "zen": {
        "hosts_provider": [],
        "ping": {
          "unicast": {
            "concurrent_connects": "10",
            "hosts": [],
            "hosts.resolve_timeout": "5s"
          }
        }
      },
      "initial_state_timeout": "30s",
      "cluster_formation_warning_timeout": "10000ms",
      "seed_providers": [],
      "find_peers_interval_during_decommission": "120s",
      "type": "single-node",
      "seed_resolver": {
        "max_concurrent_resolvers": "10",
        "timeout": "5s"
      },
      "find_peers_interval": "1000ms",
      "probe": {
        "connect_timeout": "3000ms",
        "handshake_timeout": "1000ms"
      }
    },
    "search_backpressure": {
      "mode": "monitor_only",
      "cancellation_burst": "10.0",
      "cancellation_ratio": "0.1",
      "cancellation_rate": "0.003",
      "search_task": {
        "elapsed_time_millis_threshold": "45000",
        "heap_variance": "2.0",
        "heap_percent_threshold": "0.02",
        "cancellation_burst": "5.0",
        "cpu_time_millis_threshold": "30000",
        "cancellation_ratio": "0.1",
        "cancellation_rate": "0.003",
        "total_heap_percent_threshold": "0.05",
        "heap_moving_average_window_size": "100"
      },
      "node_duress": {
        "cpu_threshold": "0.9",
        "heap_threshold": "0.7",
        "num_successive_breaches": "3"
      },
      "search_shard_task": {
        "elapsed_time_millis_threshold": "30000",
        "heap_variance": "2.0",
        "heap_percent_threshold": "0.005",
        "cancellation_burst": "10.0",
        "cpu_time_millis_threshold": "15000",
        "cancellation_ratio": "0.1",
        "cancellation_rate": "0.003",
        "total_heap_percent_threshold": "0.05",
        "heap_moving_average_window_size": "100"
      }
    },
    "shard_indexing_pressure": {
      "primary_parameter": {
        "node": {
          "soft_limit": "0.7"
        },
        "shard": {
          "min_limit": "0.001"
        }
      },
      "enforced": "false",
      "secondary_parameter": {
        "successful_request": {
          "max_outstanding_requests": "100",
          "elapsed_timeout": "300000ms"
        },
        "throughput": {
          "request_size_window": "2000",
          "degradation_factor": "5.0"
        }
      },
      "cache_store": {
        "max_size": "200"
      },
      "enabled": "false",
      "operating_factor": {
        "optimal": "0.85",
        "lower": "0.75",
        "upper": "0.95"
      }
    },
    "gateway": {
      "recover_after_master_nodes": "0",
      "expected_nodes": "-1",
      "recover_after_data_nodes": "-1",
      "expected_data_nodes": "-1",
      "write_dangling_indices_info": "true",
      "slow_write_logging_threshold": "10s",
      "recover_after_time": "0ms",
      "expected_master_nodes": "-1",
      "recover_after_nodes": "-1",
      "auto_import_dangling_indices": "false"
    }
  }
}
ylwu-amzn commented 1 year ago

@pdolinic, I see some problems:

  1. The node.roles setting is not correct:
    node.roles: [ data, cluster_manager, ml_full_access ]

    If you want to add the ML role to a node, don't use ml_full_access; change it to ml instead (see https://opensearch.org/docs/latest/ml-commons-plugin/index/#ml-node). So you should use node.roles: [ data, cluster_manager, ml ]

ml_full_access is a "permission" role for the Security plugin, not a "node" role.
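To make the distinction concrete, here is a minimal Python sketch that flags entries in a node.roles list that are not node roles. The KNOWN_NODE_ROLES set and the helper name invalid_node_roles are illustrative assumptions: a partial list of built-in OpenSearch 2.x node roles plus the ml role contributed by ml-commons, not an authoritative reference.

```python
# Assumption: a partial (not exhaustive) set of OpenSearch 2.x node roles.
KNOWN_NODE_ROLES = {
    "cluster_manager",
    "master",  # deprecated alias of cluster_manager
    "data",
    "ingest",
    "remote_cluster_client",
    "search",
    "ml",  # node role added by the ml-commons plugin
}


def invalid_node_roles(roles: list[str]) -> list[str]:
    """Return the entries of a node.roles list that are not known node roles."""
    return [role for role in roles if role not in KNOWN_NODE_ROLES]


if __name__ == "__main__":
    # The original config: ml_full_access is a security role, so it is flagged.
    print(invalid_node_roles(["data", "cluster_manager", "ml_full_access"]))
    # The corrected config: nothing is flagged.
    print(invalid_node_roles(["data", "cluster_manager", "ml"]))
```

Security roles such as ml_full_access are mapped to users or backend roles through the Security plugin, so they never belong in opensearch.yml's node.roles.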

  2. Wrong model group ID. I see you created a model group with
    POST /_plugins/_ml/model_groups/_register
    {
    "name": "test_model_group_public-b",
    "description": "This is a public model group"
    }

    and the model group was created with model group ID Fd9aHYoBh74vBCV4b8BC.

Later you uploaded the model with a different model group ID, 8IjOsYgBFp6IJxCceZ2-. I guess that model group doesn't exist, which is why you see "error": "Model group not found". Your request was:

POST /_plugins/_ml/models/_register
{
    "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT",
    "model_group_id": "8IjOsYgBFp6IJxCceZ2-"
}

I see you are trying to get the task status with GET /_plugins/_ml/tasks/8IjOsYgBFp6IJxCceZ2-, so I'm confused: is 8IjOsYgBFp6IJxCceZ2- a task ID or a model group ID?
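One way to avoid mixing up these IDs is to take model_group_id straight from the model group registration response and to treat the ID returned by model registration as a task ID. The sketch below is only an illustration: it assumes an unauthenticated cluster at the hypothetical BASE_URL http://localhost:9200 (a cluster running the Security plugin will need HTTPS and credentials), assumes the group response carries a model_group_id field, and uses hypothetical helper names. Only the request paths and bodies come from this thread.

```python
import json
import urllib.request

# Hypothetical endpoint; adjust host, scheme, and auth for your cluster.
BASE_URL = "http://localhost:9200"


def post_json(path: str, body: dict) -> dict:
    """POST a JSON body to the cluster and return the decoded JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def model_register_body(group_response: dict) -> dict:
    """Build the model _register body from the model group _register response."""
    return {
        "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
        "version": "1.0.1",
        "model_format": "TORCH_SCRIPT",
        # Copy the ID from the response instead of retyping it by hand.
        "model_group_id": group_response["model_group_id"],
    }


def register_model_in_new_group() -> str:
    """Create a model group, register the model into it, and return the task ID."""
    group = post_json("/_plugins/_ml/model_groups/_register", {
        "name": "test_model_group_public-b",
        "description": "This is a public model group",
    })
    task = post_json("/_plugins/_ml/models/_register", model_register_body(group))
    # This is a task ID, not a model or model group ID:
    # check it with GET /_plugins/_ml/tasks/<task_id>.
    return task["task_id"]
```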

pdolinic commented 1 year ago

Thanks, I got confused by the model group and have sorted that out now. After assigning the model to the correct group and registering it correctly, I could make progress:

POST /_plugins/_ml/models/_register
{
    "name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
    "version": "1.0.1",
    "model_format": "TORCH_SCRIPT",
    "model_group_id": "G8QV5okBRMEPLlaZOe87"
}

returns

{ "task_id": "APhKHooBNrcgC_kYjn7q", "status": "CREATED" }

I also added

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.sync_up_job_interval_in_seconds": 600
  }
}

Problem: when I check the task, it has failed:

GET /_plugins/_ml/tasks/APhKHooBNrcgC_kYjn7q
{
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "FAILED",
  "worker_node": [
    "KmfjhwWjS7eepv4PnUMKWw"
  ],
  "create_time": 1692725317352,
  "last_update_time": 1692725318453,
  "error": """Cannot invoke "org.opensearch.cluster.metadata.MappingMetadata.getSourceAsMap()" because the return value of "org.opensearch.cluster.metadata.IndexMetadata.mapping()" is null""",
  "is_async": true
}
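Rather than checking the task by hand, a client can poll GET /_plugins/_ml/tasks/<task_id> until the task finishes. In this Python sketch, fetch_task is a hypothetical callable you supply (for example, a thin HTTP GET wrapper), and the set of terminal states is an assumption: COMPLETED and FAILED as seen in this thread, plus CANCELLED and COMPLETED_WITH_ERROR, which I believe ml-commons also defines.

```python
import time
from typing import Callable

# Assumption: states after which an ML task will not change any further.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "COMPLETED_WITH_ERROR"}


def wait_for_task(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll an ML task until it reaches a terminal state.

    fetch_task is a caller-supplied function returning the JSON body of
    GET /_plugins/_ml/tasks/<task_id> as a dict. Returns the task body on
    COMPLETED, raises RuntimeError on any other terminal state, and raises
    TimeoutError if the task is still not finished after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state in TERMINAL_STATES:
            # Surface the server-side message, e.g. the null-mapping error above.
            raise RuntimeError(f"task {task_id} ended as {state}: {task.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {state} after {timeout_s}s")
        time.sleep(interval_s)
```

Applied to the failed task above, this would raise a RuntimeError carrying the "MappingMetadata.getSourceAsMap()" message instead of returning a model_id.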

Update: from here on, at intervals, I can see OpenSearch Dashboards briefly refresh with something that could be this model, but then it disappears again entirely.

pdolinic commented 1 year ago

I found a similar issue with the mapping error I am running into: https://github.com/opensearch-project/security-analytics/issues/305
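The error text suggests that an index touched during registration has no mapping at all (an inference from the message, not something confirmed in this thread). A quick check is to fetch the mappings, for example with GET /.plugins-ml-model,.plugins-ml-task/_mapping, and look for empty ones. This Python sketch assumes the usual {"<index>": {"mappings": {...}}} response shape; the helper name is hypothetical.

```python
def indices_without_mappings(mapping_response: dict) -> list[str]:
    """Return index names whose mappings are empty in a GET <index>/_mapping response.

    Assumes the response shape {"<index>": {"mappings": {...}}}.
    """
    return sorted(
        name
        for name, body in mapping_response.items()
        if not body.get("mappings")  # missing or {} counts as "no mapping"
    )
```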

This seems index-related. I removed some Graylog/Icinga indices that are certainly unrelated; here are the others, and maybe one of them is causing this:

GET /_cat/indices?v
health status index                                       uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   .plugins-ml-model-group                     jGAUj3TMSr6eol5PKivEjQ   1   1         16            3     50.9kb         50.9kb
green  open   .ql-datasources                             xjXYSmGmS5equteZ6yO97g   1   0          0            0       208b           208b
yellow open   .plugins-ml-task                            iHmP8ebKQVerz5a7FVDq8A   1   1         47            1       39kb           39kb
green  open   .opendistro-reports-definitions             wvK55M5-TIyL7EuUYGoduw   1   0          2            0      9.7kb          9.7kb
green  open   .opendistro_security                        bwOxmf4FQpyOxQ8FGj-dQw   1   0         10            0       48kb           48kb
green  open   .opendistro-reports-instances               _oHIjzIVTjaxdaKQhlXORw   1   0          2            0       12kb           12kb
yellow open   sample-host-health                          YF9jbVsHREa3DfpIiv8Mww   1   1      40320            0      1.2mb          1.2mb
green  open   .opensearch-observability                   c21YwgVPTPu7Hm5M9NVQuQ   1   0          0            0       208b           208b
yellow open   .plugins-ml-model                           L3XnckAoTkeqLXDDDd0Wzw   1   1          0            0       208b           208b
yellow open   icingabeat-7.17.4                           yIW7P6bSRuyAWkeewBbmQg   3   2      71884            0     25.9mb         25.9mb
green  open   graylog_1                                   Y37EYeiDSLSer_H9gtYHng   4   0    2343293            0        1gb            1gb
green  open   graylog_0                                   awS-XplgSDmR_55iJco5iw   4   0   20543833            0     10.6gb         10.6gb
green  open   gl-system-events_0                          rXiu44P5R8CQY_1uht--lQ   4   0          0            0       832b           832b
green  open   opensearch_dashboards_sample_data_ecommerce 7RBFKsh_QCu-5KRF4KaGIQ   1   0       4675            0        4mb            4mb
green  open   gl-events_0                                 me8p9BUTQjuNoRH_5nta0w   4   0          0            0       832b           832b
green  open   .kibana_2                                   YO-Csy4RRhK0PmNpzCneAQ   1   0         35            3    102.7kb        102.7kb
green  open   .kibana_1                                   ab9zq7flRDKf_L3yMXmbtg   1   0         23            4     57.1kb         57.1kb
yellow open   .plugins-ml-config                          IAp6XfvpQzG-UyJk6ewVRg   1   1          1            0      3.9kb          3.9kb
yellow open   security-auditlog-2023.07.14                V0rV_bNySv2RPzKbHA3uuA   1   1     121185            0     73.2mb         73.2mb
yellow open   security-auditlog-2023.07.13                U3s4BVL2R72QQ4SW7PpMgQ   1   1       6135            0      9.6mb          9.6mb
yellow open   security-auditlog-2023.07.12                woBSKAcBSgOLKlQwZGIkwA   1   1         72            0      1.1mb          1.1mb
yellow open   .opendistro-job-scheduler-lock              kuSgOfMHSRW8ZxXFDjfNLg   1   1          3            0      5.8kb          5.8kb
green  open   .opensearch-notifications-config            sTzC4WdzSHuHyFIfSI5ONA   1   0          0            0       208b           208b
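To pull the ML plugin's system indices out of a long listing like this one, the text output of GET /_cat/indices?v can be filtered by name prefix. A minimal Python sketch, assuming whitespace-separated columns and that every row has a health value; the ml_indices helper and its .plugins-ml- default prefix are illustrative.

```python
def ml_indices(cat_output: str, prefix: str = ".plugins-ml-") -> list[tuple[str, str]]:
    """Return (index, health) pairs for indices whose name starts with prefix.

    Parses the whitespace-separated text of GET /_cat/indices?v, whose first
    non-empty line is the header row.
    """
    lines = [line for line in cat_output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    health_col = header.index("health")
    index_col = header.index("index")
    result = []
    for line in lines[1:]:
        cols = line.split()
        if cols[index_col].startswith(prefix):
            result.append((cols[index_col], cols[health_col]))
    return result
```

The yellow health of these indices most likely just reflects replicas (rep 1) that cannot be assigned on a single-node cluster (discovery.type is single-node in the settings above), so health alone does not point to the culprit. Requesting GET /_cat/indices?format=json avoids the text parsing entirely.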
pdolinic commented 1 year ago

Okay, it looks like it works now. I'm still not seeing the model in Dashboards, but here is what I did:

1) deleted the opensearch-ml plugins, 2) deleted all opensearch-dashboards plugins entirely, 3) deleted all of these indices:

DELETE /.plugins-ml-model-group
DELETE /.plugins-ml-task
DELETE /.plugins-ml-model
DELETE /.plugins-ml-config 
DELETE /.opensearch-sap-correlation-rules-config
DELETE /sample-host-health

4) reinstalled OpenSearch, 5) reinstalled all OpenSearch Dashboards plugins, 6) redid the model group creation, model upload, and registration into the group, and now I am getting a COMPLETED state back!

GET /_plugins/_ml/tasks/MprLIYoBmWoi9V5-o5Ix
---
# SUCCESS
---

{
  "model_id": "RJrLIYoBmWoi9V5-o5LW",
  "task_type": "REGISTER_MODEL",
  "function_name": "TEXT_EMBEDDING",
  "state": "COMPLETED",
  "worker_node": [
    "KmfjhwWjS7eepv4PnUMKWw"
  ],
  "create_time": 1692784108336,
  "last_update_time": 1692784119285,
  "is_async": true
}
pdolinic commented 1 year ago

Got it working, thanks a lot for the pointers! I'll write an OpenSearch blog series, and this will be one part of it.

[Screenshot: model loaded]

ylwu-amzn commented 1 year ago

Cool, glad to see it works!

@pdolinic, please share the blog link here to help more people!