pdolinic closed this issue 1 year ago
Okay, never mind, it seems it takes the setting
plugins.ml_commons.allow_registering_model_via_url: true
Okay, this still doesn't work:
POST /_plugins/_ml/models/_upload
{
"name": "all-MiniLM-L6-v2",
"version": "1.0.0",
"description": "test model",
"model_format": "TORCH_SCRIPT",
"model_config": {
"model_type": "bert",
"embedding_dimension": 384,
"framework_type": "sentence_transformers"
},
"url": "https://github.com/opensearch-project/ml-commons/raw/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
}
Returns
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "To upload custom model user needs to enable allow_registering_model_via_url settings. Otherwise please use opensearch pre-trained models."
}
],
"type": "illegal_argument_exception",
"reason": "To upload custom model user needs to enable allow_registering_model_via_url settings. Otherwise please use opensearch pre-trained models."
},
"status": 400
}
With log
-- Logs begin at Tue 2023-08-08 22:00:43 CEST, end at Fri 2023-08-11 19:28:02 CEST. --
Aug 08 22:00:43 graylog systemd[1]: Starting OpenSearch...
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.OpenSearch (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.OpenSearch
Aug 08 22:00:45 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: A terminally deprecated method in java.lang.System has been called
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager has been called by org.opensearch.bootstrap.Security (file:/usr/share/opensearch/lib/opensearch-2.9.0.jar)
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: Please consider reporting this to the maintainers of org.opensearch.bootstrap.Security
Aug 08 22:00:46 graylog systemd-entrypoint[506]: WARNING: System::setSecurityManager will be removed in a future release
Aug 08 22:00:54 graylog systemd[1]: Started OpenSearch.
Aug 08 22:00:54 graylog systemd-entrypoint[506]: uncaught exception in thread [main]
Aug 08 22:00:54 graylog systemd-entrypoint[506]: java.lang.IllegalArgumentException: index template [ss4o_metrics_template] has index patterns [ss4o_metrics-*-*] matching patterns from existing templates [ss4o_metric_template] with patterns (ss4o_metric_template => [ss4o_metrics-*-*]) that have the same priority [1], multiple index templates may not match during index creation, please use a different priority
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:558)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.metadata.MetadataIndexTemplateService$4.execute(MetadataIndexTemplateService.java:491)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:65)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.MasterService.executeTasks(MasterService.java:874)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:424)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.MasterService.runTasks(MasterService.java:295)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.MasterService$Batcher.run(MasterService.java:206)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:204)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:242)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:849)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedOpenSearchThreadPoolExecutor.java:282)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at org.opensearch.common.util.concurrent.PrioritizedOpenSearchThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedOpenSearchThreadPoolExecutor.java:245)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: at java.base/java.lang.Thread.run(Thread.java:833)
Aug 08 22:00:54 graylog systemd-entrypoint[506]: For complete error details, refer to the log at /var/log/opensearch/opensearch-1.log
Aug 09 00:00:02 graylog systemd-entrypoint[506]: 2023-08-09 00:00:02,303 opensearch[opensearch-1][transport_worker][T#8] ERROR Could not define attribute view on path "/var/log/opensearch/opensearch-1_server.json" got access denied ("java.lang.RuntimePermission" "accessUserInformation") java.security.AccessControlException: access denied ("java.lang.RuntimePermission" "accessUserInformation")
Aug 09 00:00:02 graylog systemd-entrypoint[506]: at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:485)
Aug 09 00:00:02 graylog systemd-entrypoint[506]: at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
Aug 09 00:00:02 graylog systemd-entrypoint[506]: at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)
I made these changes for model registration, but the node doesn't seem to pick them up:
plugins.security.system_indices.indices: ["plugins.ml_commons.allow_registering_model_via_url: true", ".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
#node.max_local_storage_nodes: 3
######## End OpenSearch Security Demo Configuration ########
action.auto_create_index: true
allow_registering_model_via_url: true
plugins.security.disabled: false
plugins.ml_commons.only_run_on_ml_node: true
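In the config excerpt above, the string `plugins.ml_commons.allow_registering_model_via_url: true` appears as an element of the `plugins.security.system_indices.indices` list, and a separate line uses the short name `allow_registering_model_via_url` without the `plugins.ml_commons.` prefix used elsewhere in this thread. Whether that is the cause of the error is not established here. The following is only a minimal sketch of a line-based check for that kind of misplacement; the function name `find_misplaced_settings` is hypothetical, and the check does not parse YAML properly.

```python
def find_misplaced_settings(
    yaml_text,
    key="plugins.ml_commons.allow_registering_model_via_url",
):
    """Return (line_number, problem) pairs for lines where `key` is not
    written as its own top-level setting.

    This is a plain text scan, not a YAML parser (an assumption made to
    keep the sketch dependency-free).
    """
    problems = []
    short_name = key.rsplit(".", 1)[-1]
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        name = stripped.split(":", 1)[0].strip()
        if name == key:
            continue  # written as a top-level setting with the full name
        if name == short_name:
            problems.append((number, "missing 'plugins.ml_commons.' prefix"))
        elif key in stripped:
            problems.append((number, "key appears inside another setting's value"))
    return problems


sample = (
    'plugins.security.system_indices.indices: '
    '["plugins.ml_commons.allow_registering_model_via_url: true", ".plugins-ml-model"]\n'
    "allow_registering_model_via_url: true\n"
)
for number, problem in find_misplaced_settings(sample):
    print(f"line {number}: {problem}")
```

Both sample lines are flagged, while a line that starts with the fully qualified key is accepted.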
Hi @pdolinic, I suggest following this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript
@pdolinic, have you tried this doc: https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript ? Do you still see any issues?
Hey @ylwu-amzn, yes, I will write everything down in detail for you. I am currently on holiday for a few days, so I will reply in a few days with all the steps I took.
Thanks for caring!
Pre-work
1) Install sentence-transformers:
pip install -U sentence-transformers
2) Node permissions are set like this:
node.name: opensearch-1
node.roles: [ data, cluster_manager, ml_full_access ]
3) Running tail -f /var/log/opensearch/opensearch-1.*log
showed that indices were missing, so I created them like this:
PUT /.plugins-ml-model
curl -k -XPUT -u "admin2:$PASSWORD" "https://127.0.0.1:9200/.opensearch-sap-correlation-rules-config"
From here on I am following the docs at https://github.com/opensearch-project/ml-commons/blob/2.x/docs/model_serving_framework/text_embedding_model_examples.md#1-torchscript and providing the inputs and outputs:
PUT /_cluster/settings
{
"persistent" : {
"plugins.ml_commons.only_run_on_ml_node" : false
}
}
{
"acknowledged": true,
"persistent": {
"plugins": {
"ml_commons": {
"only_run_on_ml_node": "false"
}
}
},
"transient": {}
}
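The request above uses a dotted key (`plugins.ml_commons.only_run_on_ml_node`), while the response nests the same setting under `plugins` and `ml_commons`. To compare what was sent with what came back, the nested response can be flattened back into dotted keys. This is a minimal sketch; the helper name `flatten_settings` is hypothetical and not part of any OpenSearch client.

```python
def flatten_settings(nested, prefix=""):
    """Flatten a nested settings dict into {"a.b.c": value} form."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-objects, carrying the dotted path along.
            flat.update(flatten_settings(value, dotted))
        else:
            flat[dotted] = value
    return flat


# The response shown above, as a Python dict.
response = {
    "acknowledged": True,
    "persistent": {"plugins": {"ml_commons": {"only_run_on_ml_node": "false"}}},
    "transient": {},
}
print(flatten_settings(response["persistent"]))
```

Note that the response returns the value as the string "false" even though the request sent a JSON boolean, so any comparison should account for that.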
PUT _cluster/settings
{
"persistent" : {
"plugins.ml_commons.native_memory_threshold" : 100
}
}
{
"acknowledged": true,
"persistent": {
"plugins": {
"ml_commons": {
"native_memory_threshold": "100"
}
}
},
"transient": {}
}
POST /_plugins/_ml/model_groups/_register
{
"name": "test_model_group_public-b",
"description": "This is a public model group"
}
{
"model_group_id": "Fd9aHYoBh74vBCV4b8BC",
"status": "CREATED"
}
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT",
"model_group_id": "8IjOsYgBFp6IJxCceZ2-"
}
{
"task_id": "HOiAHYoBh74vBCV4syRa",
"status": "CREATED"
}
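One detail in the two calls above: the model group registration returned `Fd9aHYoBh74vBCV4b8BC`, but the model registration request passes `8IjOsYgBFp6IJxCceZ2-` as `model_group_id`. Whether this mismatch matters on this cluster is not established in the thread. A minimal sketch of carrying the returned ID forward instead of copying one by hand follows; `build_register_body` is a hypothetical helper, and it only builds the request body without sending anything.

```python
def build_register_body(group_response, name, version, model_format="TORCH_SCRIPT"):
    """Build a model register request body that reuses the model_group_id
    returned by the model group registration call."""
    return {
        "name": name,
        "version": version,
        "model_format": model_format,
        # Taken from the response rather than typed in manually.
        "model_group_id": group_response["model_group_id"],
    }


group_response = {"model_group_id": "Fd9aHYoBh74vBCV4b8BC", "status": "CREATED"}
body = build_register_body(
    group_response,
    name="huggingface/sentence-transformers/all-MiniLM-L12-v2",
    version="1.0.1",
)
print(body)
```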
GET /_plugins/_ml/tasks/HOiAHYoBh74vBCV4syRa
GET /_plugins/_ml/tasks/8IjOsYgBFp6IJxCceZ2-
{
"error": {
"root_cause": [
{
"type": "status_exception",
"reason": "Fail to find task"
}
],
"type": "status_exception",
"reason": "Fail to find task"
},
"status": 404
}
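The second GET above queries the tasks endpoint with `8IjOsYgBFp6IJxCceZ2-`, which is the `model_group_id` from the register request rather than the returned `task_id`, so a 404 is the expected result for that call. The sketch below polls a task until it finishes. It assumes the task document carries a `state` field with values such as `COMPLETED` and `FAILED`, plus `model_id` and `error` fields; `fetch_task` is a hypothetical callable standing in for whatever performs `GET /_plugins/_ml/tasks/<task_id>`.

```python
import time


def wait_for_task(fetch_task, task_id, attempts=10, delay_seconds=1.0):
    """Poll a task until it completes, fails, or the attempts run out.

    fetch_task: hypothetical callable taking a task ID and returning the
    task document as a dict.
    """
    for _ in range(attempts):
        task = fetch_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state == "FAILED":
            raise RuntimeError(f"task {task_id} failed: {task.get('error')}")
        # Any other state is treated as still in progress.
        time.sleep(delay_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {attempts} attempts")


# Stand-in for real HTTP calls: three canned task documents.
canned = iter([
    {"state": "CREATED"},
    {"state": "RUNNING"},
    {"state": "COMPLETED", "model_id": "hypothetical-model-id"},
])
finished = wait_for_task(lambda task_id: next(canned), "HOiAHYoBh74vBCV4syRa", delay_seconds=0)
print(finished["model_id"])
```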
---
PUT _cluster/settings
{
"persistent" : {
"plugins.ml_commons.allow_registering_model_via_url" : true
}
}
Returns
{
"acknowledged": true,
"persistent": {
"plugins": {
"ml_commons": {
"allow_registering_model_via_url": "true"
}
}
},
"transient": {}
}
---
Trying URL-Upload:
POST /_plugins/_ml/models/_register
{
"name": "sentence-transformers/all-MiniLM-L6-v2",
"version": "1.0.1",
"description": "This is a sentence-transformers model: It maps sentences & paragraphs to a 384 dimensional dense vector space and can be used for tasks like clustering or semantic search.",
"model_task_type": "TEXT_EMBEDDING",
"model_format": "TORCH_SCRIPT",
"model_content_hash_value": "c15f0d2e62d872be5b5bc6c84d2e0f4921541e29fefbef51d59cc10a8ae30e0f",
"model_config": {
"model_type": "bert",
"embedding_dimension": 384,
"framework_type": "sentence_transformers",
"all_config": "{\"_name_or_path\":\"nreimers/MiniLM-L6-H384-uncased\",\"architectures\":[\"BertModel\"],\"attention_probs_dropout_prob\":0.1,\"gradient_checkpointing\":false,\"hidden_act\":\"gelu\",\"hidden_dropout_prob\":0.1,\"hidden_size\":384,\"initializer_range\":0.02,\"intermediate_size\":1536,\"layer_norm_eps\":1e-12,\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6,\"pad_token_id\":0,\"position_embedding_type\":\"absolute\",\"transformers_version\":\"4.8.2\",\"type_vocab_size\":2,\"use_cache\":true,\"vocab_size\":30522}"
},
"model_group_id": "7IjOsYgBFp6IJxCceZ1-",
"url": "https://artifacts.opensearch.org/models/ml-models/huggingface/sentence-transformers/all-MiniLM-L6-v2/1.0.1/torch_script/sentence-transformers_all-MiniLM-L6-v2-1.0.1-torch_script.zip"
}
{
"task_id": "ae-fHYoBh74vBCV4pgQF",
"status": "CREATED"
}
POST /_plugins/_ml/models/ae-fHYoBh74vBCV4pgQF/_load
{
"error": {
"root_cause": [
{
"type": "status_exception",
"reason": "Failed to find model"
}
],
"type": "status_exception",
"reason": "Failed to find model"
},
"status": 404
}
POST /_plugins/_ml/models/7IjOsYgBFp6IJxCceZ1-/_load
{
"error": {
"root_cause": [
{
"type": "status_exception",
"reason": "Failed to find model"
}
],
"type": "status_exception",
"reason": "Failed to find model"
},
"status": 404
}
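Both `_load` calls above pass something other than a model ID: the first uses the `task_id` returned by the register call (`ae-fHYoBh74vBCV4pgQF`), the second uses the `model_group_id` from the request (`7IjOsYgBFp6IJxCceZ1-`). The sketch below derives the load path from a finished register task instead. It assumes the task document exposes `state` and `model_id` fields; `load_path` is a hypothetical helper and the sample task document is made up for illustration.

```python
def load_path(task):
    """Return the _load path for the model created by a register task.

    Raises ValueError when the task has not completed or carries no
    model_id, since neither the task ID nor the model group ID can be
    used in its place.
    """
    model_id = task.get("model_id")
    if task.get("state") != "COMPLETED" or not model_id:
        raise ValueError("register task has not produced a model_id yet")
    return f"/_plugins/_ml/models/{model_id}/_load"


# Hypothetical task document; the real one comes from
# GET /_plugins/_ml/tasks/ae-fHYoBh74vBCV4pgQF
sample_task = {"state": "COMPLETED", "model_id": "hypothetical-model-id"}
print(load_path(sample_task))
```

If the task document reports a failed state, its error field is likely the more useful thing to share in the thread than the 404 from `_load`.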
I thought it might be the user's permissions or the config, but I am able to create models, so why can't I register them?
plugins.security.ssl.transport.enabled: true
plugins.security.ssl.transport.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.transport.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.transport.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.ssl.transport.enforce_hostname_verification: true
plugins.security.ssl.http.enabled: true
plugins.security.ssl.http.pemcert_filepath: /etc/opensearch/opensearch.crt
plugins.security.ssl.http.pemkey_filepath: /etc/opensearch/opensearch.key
plugins.security.ssl.http.pemtrustedcas_filepath: /etc/opensearch/graylog-ca-root.crt
plugins.security.allow_default_init_securityindex: true
plugins.security.allow_unsafe_democertificates: true
plugins.security.authcz.admin_dn:
- "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"
plugins.security.nodes_dn:
- "CN=opensearch.lan,OU=xo,O=xo,L=xo,ST=xo,C=xo"
#plugins.security.audit.type: internal_opensearch
plugins.security.enable_snapshot_restore_privilege: true
plugins.security.check_snapshot_restore_write_privileges: true
plugins.security.cache.ttl_minutes: 60
plugins.security.restapi.roles_enabled: ["all_access","ml_full_access", "security_rest_api_access"]
plugins.security.system_indices.enabled: true
opendistro_security.audit.config.disabled_rest_categories: NONE
opendistro_security.audit.config.disabled_transport_categories: NONE
#plugins.security.system_indices.indices: [".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opendistro-notifications-*", ".opendistro-notebooks", ".opendistro-asynchronous-search-response*"]
plugins.security.system_indices.indices: ["plugins.ml_commons.allow_registering_model_via_url: true", ".plugins-ml-model-group", ".plugins-ml-model", ".plugins-ml-task", ".opendistro-alerting-config", ".opendistro-alerting-alert*", ".opendistro-anomaly-results*", ".opendistro-anomaly-detector*", ".opendistro-anomaly-checkpoints", ".opendistro-anomaly-detection-state", ".opendistro-reports-*", ".opensearch-notifications-*", ".opensearch-notebooks", ".opensearch-observability", ".ql-datasources", ".opendistro-asynchronous-search-response*", ".replication-metadata-store", ".opensearch-knn-models"]
#node.max_local_storage_nodes: 3
######## End OpenSearch Security Demo Configuration ########
action.auto_create_index: true
#allow_registering_model_via_url: true
plugins.security.disabled: false
plugins.ml_commons.only_run_on_ml_node: true
##plugins.ml_commons.task_dispatch_policy: round_robin
#plugins.ml_commons.max_ml_task_per_node: 10
#plugins.ml_commons.max_model_on_node: 10
##plugins.ml_commons.monitoring_request_count: 100
#plugins.ml_commons.max_upload_model_tasks_per_node: 10
#plugins.ml_commons.max_load_model_tasks_per_node: 10
##plugins.ml_commons.sync_up_job_interval_in_seconds: 3
I am now tailing the logs and see failing shards:
[2023-08-22T16:54:09,177][ERROR][o.o.s.u.SecurityAnalyticsException] [opensearch-1] Security Analytics error:
org.opensearch.action.search.SearchPhaseExecutionException: all shards failed
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:665) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:373) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:704) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.AbstractSearchAsyncAction.onShardFailure(AbstractSearchAsyncAction.java:473) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.AbstractSearchAsyncAction$1.onFailure(AbstractSearchAsyncAction.java:295) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.SearchExecutionStatsCollector.onFailure(SearchExecutionStatsCollector.java:104) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:74) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.search.SearchTransportService$ConnectionCountingHandler.handleException(SearchTransportService.java:755) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.transport.TransportService$6.handleException(TransportService.java:884) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.security.transport.SecurityInterceptor$RestoringTransportResponseHandler.handleException(SecurityInterceptor.java:379) ~[?:?]
at org.opensearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1504) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.transport.TransportService$DirectResponseChannel.processException(TransportService.java:1618) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1592) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:79) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.transport.TransportChannel.sendErrorResponse(TransportChannel.java:71) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.support.ChannelActionListener.onFailure(ChannelActionListener.java:70) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionRunnable.onFailure(ActionRunnable.java:103) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:54) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:908) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) [opensearch-2.9.0.jar:2.9.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.lang.Thread.run(Thread.java:833) [?:?]
Caused by: org.opensearch.index.query.QueryShardException: failed to create query: [nested] failed to find nested object under path [correlate]
at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:482) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:465) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.parseSource(SearchService.java:1236) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.createContext(SearchService.java:984) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:592) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:565) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) [opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
... 8 more
Caused by: java.lang.IllegalStateException: [nested] failed to find nested object under path [correlate]
at org.opensearch.index.query.NestedQueryBuilder.doToQuery(NestedQueryBuilder.java:299) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.index.query.AbstractQueryBuilder.toQuery(AbstractQueryBuilder.java:117) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.index.query.QueryShardContext.lambda$toQuery$3(QueryShardContext.java:466) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:478) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.index.query.QueryShardContext.toQuery(QueryShardContext.java:465) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.parseSource(SearchService.java:1236) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.createContext(SearchService.java:984) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService.executeQueryPhase(SearchService.java:592) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.search.SearchService$2.lambda$onResponse$0(SearchService.java:565) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88) ~[opensearch-2.9.0.jar:2.9.0]
at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.9.0.jar:2.9.0]
... 8 more
curl -k -XGET "https://admin:$my_pw@127.0.0.1:9200/_cluster/health"
{"cluster_name":"opensearch-1","status":"yellow","timed_out":false,"number_of_nodes":1,"number_of_data_nodes":1,"discovered_master":true,"discovered_cluster_manager":true,"active_primary_shards":77,"active_shards":77,"relocating_shards":0,"initializing_shards":0,"unassigned_shards":61,"delayed_unassigned_shards":0,"number_of_pending_tasks":0,"number_of_in_flight_fetch":0,"task_max_waiting_in_queue_millis":0,"active_shards_percent_as_number":55.79710144927537}
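The health output above reports one node, 77 active shards, and 61 unassigned shards. One plausible reading is that the unassigned shards are replicas that cannot be placed on a single-node cluster, which would explain the yellow status independently of ML Commons, but the thread does not confirm this. The sketch below summarizes the response and recomputes the percentage, assuming it is active shards divided by active plus initializing plus unassigned shards; `summarize_health` is a hypothetical helper.

```python
import json


def summarize_health(raw_json):
    """Extract the fields relevant to a yellow status from a
    _cluster/health response body."""
    health = json.loads(raw_json)
    active = health["active_shards"]
    # Assumption: the total is active + initializing + unassigned shards.
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    percent = 100.0 * active / total if total else 100.0
    return {
        "status": health["status"],
        "nodes": health["number_of_nodes"],
        "unassigned": health["unassigned_shards"],
        "active_percent": percent,
    }


# Reduced copy of the response shown above.
raw = (
    '{"status":"yellow","number_of_nodes":1,"active_shards":77,'
    '"initializing_shards":0,"unassigned_shards":61}'
)
print(summarize_health(raw))
```

For the values above, 77 / (77 + 61) agrees with the reported `active_shards_percent_as_number` to within floating-point rounding.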
Still, the instance is otherwise fine: TLS is up everywhere and I get fresh logs, so I assume this has to be related to ML.
Maybe the longer output of
GET _cluster/settings?include_defaults=true
might help:
{
"persistent": {
"plugins": {
"ml_commons": {
"only_run_on_ml_node": "false",
"allow_registering_model_via_url": "true",
"native_memory_threshold": "100"
},
"index_state_management": {
"metadata_migration": {
"status": "1"
},
"template_migration": {
"control": "-1"
}
}
}
},
"transient": {},
"defaults": {
"task_resource_tracking": {
"enabled": "true"
},
"cluster": {
"max_voting_config_exclusions": "10",
"metadata": {
"perf_analyzer": {
"config": {
"overrides": ""
},
"pa_node_stats_setting": "1",
"state": "0"
}
},
"no_master_block": "metadata_write",
"persistent_tasks": {
"allocation": {
"enable": "all",
"recheck_interval": "30s"
}
},
"initial_cluster_manager_nodes": [],
"remote": {
"node": {
"attr": ""
},
"initial_connect_timeout": "30s",
"connect": "true",
"connections_per_cluster": "3"
},
"no_cluster_manager_block": "metadata_write",
"routing": {
"rebalance": {
"enable": "all"
},
"allocation": {
"node_concurrent_incoming_recoveries": "2",
"move": {
"primary_first": "false"
},
"node_initial_primaries_recoveries": "4",
"same_shard": {
"host": "false"
},
"total_shards_per_node": "-1",
"cluster_concurrent_recoveries": "-1",
"shard_state": {
"reroute": {
"priority": "NORMAL"
}
},
"type": "balanced",
"disk": {
"threshold_enabled": "true",
"watermark": {
"flood_stage": "95%",
"high": "90%",
"low": "85%",
"enable_for_single_data_node": "false"
},
"include_relocations": "true",
"reroute_interval": "60s"
},
"node_initial_replicas_recoveries": "4",
"awareness": {
"balance": "false",
"attributes": []
},
"balance": {
"index": "0.55",
"threshold": "1.0",
"shard": "0.45",
"prefer_primary": "false"
},
"load_awareness": {
"allow_unassigned_primaries": "true",
"flat_skew": "2",
"skew_factor": "50.0",
"provisioned_capacity": "-1"
},
"enable": "all",
"node_concurrent_outgoing_recoveries": "2",
"allow_rebalance": "indices_all_active",
"cluster_concurrent_rebalance": "2",
"node_concurrent_recoveries": "2",
"total_shards_limit": "-1"
},
"ignore_weighted_routing": "false",
"use_adaptive_replica_selection": "true",
"weighted": {
"strict": "true",
"fail_open": "true",
"default_weight": "1.0"
}
},
"search": {
"ignore_awareness_attributes": "true"
},
"default_number_of_replicas": "1",
"join": {
"timeout": "60000ms"
},
"info": {
"update": {
"interval": "30s",
"timeout": "15s"
}
},
"auto_shrink_voting_configuration": "true",
"election": {
"duration": "500ms",
"initial_timeout": "100ms",
"max_timeout": "10s",
"back_off_time": "100ms",
"strategy": "default"
},
"blocks": {
"create_index": "false",
"read_only_allow_delete": "false",
"read_only": "false",
"create_index.auto_release": "true"
},
"ignore_dot_indexes": "false",
"follower_lag": {
"timeout": "90000ms"
},
"indices": {
"replication": {
"strategy": "DOCUMENT"
},
"tombstones": {
"size": "500"
},
"close": {
"enable": "true"
}
},
"nodes": {
"reconnect_interval": "10s"
},
"task": {
"consumers": {
"top_n": {
"size": "10",
"frequency": "60s"
}
}
},
"service": {
"slow_master_task_logging_threshold": "10s",
"slow_cluster_manager_task_logging_threshold": "10s",
"slow_task_logging_threshold": "30s"
},
"publish": {
"timeout": "30000ms",
"info_timeout": "10000ms"
},
"name": "opensearch-1",
"fault_detection": {
"leader_check": {
"interval": "1000ms",
"timeout": "10000ms",
"retry_count": "3"
},
"follower_check": {
"interval": "1000ms",
"timeout": "10000ms",
"retry_count": "3"
}
},
"max_shards_per_node": "1000",
"initial_master_nodes": [],
"snapshot": {
"info": {
"max_concurrent_fetches": "5"
}
}
},
"opendistro": {
"query": {
"size_limit": "200"
},
"scheduled_jobs": {
"request_timeout": "10s",
"sweeper": {
"backoff_millis": "50ms",
"period": "5m",
"page_size": "100"
},
"enabled": "true",
"retry_count": "3"
},
"asynchronous_search": {
"max_wait_for_completion_timeout": "1m",
"expired": {
"persisted_response": {
"cleanup_interval": "30m"
}
},
"max_search_running_time": "12h",
"persist_search_failures": "false",
"active": {
"context": {
"reaper_interval": "5m"
}
},
"node_concurrent_running_searches": "20",
"max_keep_alive": "5d"
},
"destination": {
"host": {
"deny_list": []
}
},
"index_state_management": {
"coordinator": {
"backoff_millis": "50ms",
"sweep_period": "10m",
"backoff_count": "2"
},
"metadata_service": {
"enabled": "true"
},
"restricted_index_pattern": """\.opendistro_security|\.kibana.*|\.opendistro-ism-config""",
"allow_list": [
"alias",
"allocation",
"close",
"delete",
"force_merge",
"index_priority",
"notification",
"open",
"read_only",
"read_write",
"replica_count",
"rollup",
"rollover",
"shrink",
"snapshot"
],
"template_migration": {
"control": "0"
},
"history": {
"max_age": "24h",
"number_of_shards": "1",
"rollover_retention_period": "30d",
"rollover_check_period": "8h",
"max_docs": "2500000",
"number_of_replicas": "1",
"enabled": "true"
},
"job_interval": "5",
"metadata_migration": {
"status": "0"
},
"enabled": "true",
"snapshot": {
"deny_list": []
}
},
"anomaly_detection": {
"ad_result_history_rollover_period": "12h",
"max_anomaly_features": "5",
"breaker": {
"enabled": "true"
},
"request_timeout": "10s",
"backoff_initial_delay": "1000ms",
"batch_task_piece_size": "1000",
"max_cache_miss_handling_per_second": "100",
"enabled": "true",
"max_batch_task_per_node": "10",
"cooldown_minutes": "5m",
"model_max_size_percent": "0.1",
"max_primary_shards": "10",
"ad_result_history_max_docs": "250000000",
"ad_result_history_retention_period": "30d",
"backoff_minutes": "15m",
"detection_window_delay": "0m",
"index_pressure_soft_limit": "0.8",
"max_entities_for_preview": "30",
"max_multi_entity_anomaly_detectors": "10",
"max_entities_per_query": "1000",
"max_retry_for_unresponsive_node": "5",
"detection_interval": "10m",
"batch_task_piece_interval_seconds": "5",
"max_old_ad_task_docs_per_detector": "1",
"max_retry_for_backoff": "3",
"max_anomaly_detectors": "1000",
"filter_by_backend_roles": "false"
},
"ppl": {
"enabled": "true",
"query": {
"memory_limit": "85%"
}
},
"alerting": {
"alert_backoff_millis": "50ms",
"index_timeout": "60s",
"move_alerts_backoff_count": "3",
"alert_history_max_age": "30d",
"request_timeout": "10s",
"bulk_timeout": "120s",
"destination": {
"allow_list": [
"chime",
"slack",
"custom_webhook",
"email",
"test_action"
]
},
"monitor": {
"max_monitors": "1000"
},
"action_throttle_max_value": "24h",
"alert_history_rollover_period": "12h",
"alert_history_max_docs": "1000",
"alert_backoff_count": "2",
"move_alerts_backoff_millis": "250ms",
"alert_history_retention_period": "60d",
"alert_history_enabled": "true",
"input_timeout": "30s",
"filter_by_backend_roles": "false"
},
"jobscheduler": {
"jitter_limit": "0.6",
"request_timeout": "10s",
"sweeper": {
"backoff_millis": "50ms",
"period": "5m",
"page_size": "100"
},
"threadpool": {
"queue_size": "200",
"size": "8"
},
"retry_count": "3"
},
"rollup": {
"search": {
"backoff_millis": "1000ms",
"backoff_count": "5",
"enabled": "true"
},
"dashboards": {
"enabled": "true"
},
"enabled": "true",
"ingest": {
"backoff_millis": "1000ms",
"backoff_count": "5"
}
},
"sql": {
"cursor": {
"enabled": "true",
"fetch_size": "1000",
"keep_alive": "1m"
},
"metrics": {
"rollinginterval": "60",
"rollingwindow": "3600"
},
"engine": {
"new": {
"enabled": "true"
}
},
"enabled": "true",
"query": {
"analysis": {
"semantic": {
"threshold": "200",
"suggestion": "false"
},
"enabled": "false"
},
"slowlog": "2",
"response": {
"format": "jdbc"
}
}
}
},
"plugins": {
"replication": {
"leader": {
"thread_pool": {
"queue_size": "1000",
"size": "0"
}
},
"autofollow": {
"concurrent_replication_jobs_trigger_size": "3",
"fetch_poll_interval": "30s",
"retry_poll_interval": "1h"
},
"follower": {
"poll_interval": "50ms",
"concurrent_readers_per_shard": "2",
"concurrent_writers_per_shard": "2",
"index": {
"ops_batch_size": "50000",
"recovery": {
"chunk_size": "10mb",
"max_concurrent_file_chunks": "5"
}
},
"block": {
"start": "false"
},
"retention_lease_max_failure_duration": "1h",
"metadata_sync_interval": "60s"
}
},
"security_config": {
"ssl_dual_mode_enabled": "false"
},
"query": {
"memory_limit": "85%",
"metrics": {
"rolling_interval": "60",
"rolling_window": "3600"
},
"datasources": {
"uri": {
"allowhosts": ".*"
}
},
"size_limit": "200"
},
"scheduled_jobs": {
"request_timeout": "10s",
"sweeper": {
"backoff_millis": "50ms",
"period": "5m",
"page_size": "100"
},
"enabled": "true",
"retry_count": "3"
},
"asynchronous_search": {
"max_wait_for_completion_timeout": "1m",
"expired": {
"persisted_response": {
"cleanup_interval": "30m"
}
},
"max_search_running_time": "12h",
"persist_search_failures": "false",
"active": {
"context": {
"reaper_interval": "5m"
}
},
"node_concurrent_running_searches": "20",
"max_keep_alive": "5d"
},
"destination": {
"host": {
"deny_list": []
}
},
"index_state_management": {
"coordinator": {
"backoff_millis": "50ms",
"sweep_period": "10m",
"sweep_skip_period": "5m",
"backoff_count": "2"
},
"jitter": "0.6",
"metadata_service": {
"enabled": "true"
},
"restricted_index_pattern": """\.opendistro_security|\.kibana.*|\.opendistro-ism-config""",
"action_validation": {
"enabled": "false"
},
"allow_list": [
"alias",
"allocation",
"close",
"delete",
"force_merge",
"index_priority",
"notification",
"open",
"read_only",
"read_write",
"replica_count",
"rollup",
"rollover",
"shrink",
"snapshot"
],
"history": {
"max_age": "24h",
"number_of_shards": "1",
"rollover_retention_period": "30d",
"rollover_check_period": "8h",
"max_docs": "2500000",
"number_of_replicas": "1",
"enabled": "true"
},
"job_interval": "5",
"enabled": "true",
"snapshot": {
"deny_list": []
}
},
"security_analytics": {
"index_timeout": "60s",
"alert_history_max_age": "30d",
"request_timeout": "10s",
"alert_finding_max_docs": "1000",
"alert_finding_rollover_period": "12h",
"finding_history_max_age": "30d",
"correlation_time_window": "5m",
"action_throttle_max_value": "24h",
"alert_history_rollover_period": "12h",
"mappings": {
"default_schema": "ecs"
},
"alert_history_max_docs": "1000",
"alert_finding_enabled": "true",
"finding_history_retention_period": "60d",
"alert_history_retention_period": "60d",
"alert_history_enabled": "true",
"filter_by_backend_roles": "false"
},
"snapshot_management": {
"filter_by_backend_roles": "false"
},
"ppl": {
"enabled": "true"
},
"alerting": {
"alert_backoff_millis": "50ms",
"index_timeout": "60s",
"move_alerts_backoff_count": "3",
"alert_history_max_age": "30d",
"request_timeout": "10s",
"alert_finding_max_docs": "1000",
"bulk_timeout": "120s",
"destination": {
"allow_list": [
"chime",
"slack",
"custom_webhook",
"email",
"test_action"
]
},
"alert_finding_rollover_period": "12h",
"finding_history_max_age": "30d",
"monitor": {
"max_monitors": "1000"
},
"max_actionable_alert_count": "50",
"action_throttle_max_value": "24h",
"alert_history_rollover_period": "12h",
"alert_history_max_docs": "1000",
"alert_finding_enabled": "true",
"alert_backoff_count": "2",
"finding_history_retention_period": "60d",
"move_alerts_backoff_millis": "250ms",
"alert_history_retention_period": "60d",
"alert_history_enabled": "true",
"input_timeout": "30s",
"filter_by_backend_roles": "false"
},
"rollup": {
"search": {
"backoff_millis": "1000ms",
"search_all_jobs": "false",
"backoff_count": "5",
"enabled": "true"
},
"dashboards": {
"enabled": "true"
},
"enabled": "true",
"ingest": {
"backoff_millis": "1000ms",
"backoff_count": "5"
}
},
"sql": {
"cursor": {
"keep_alive": "1m"
},
"slowlog": "2",
"delete": {
"enabled": "false"
},
"enabled": "true"
},
"ml_commons": {
"monitoring_request_count": "100",
"jvm_heap_memory_threshold": "85",
"allow_custom_deployment_plan": "false",
"sync_up_job_interval_in_seconds": "10",
"max_register_model_tasks_per_node": "10",
"ml_task_timeout_in_seconds": "600",
"allow_registering_model_via_local_file": "false",
"trusted_url_regex": "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]",
"task_dispatch_policy": "round_robin",
"max_model_on_node": "10",
"max_ml_task_per_node": "10",
"exclude_nodes": {
"_name": ""
},
"trusted_connector_endpoints_regex": [
"""^https://runtime\.sagemaker\..*[a-z0-9-]\.amazonaws\.com/.*$""",
"""^https://api\.openai\.com/.*$""",
"""^https://api\.cohere\.ai/.*$"""
],
"model_access_control_enabled": "false",
"connector_access_control_enabled": "false",
"enable_inhouse_python_model": "false",
"max_deploy_model_tasks_per_node": "10",
"model_auto_redeploy": {
"lifetime_retry_times": "3",
"enable": "false"
}
},
"transform": {
"circuit_breaker": {
"jvm": {
"threshold": "85"
},
"enabled": "true"
},
"internal": {
"index": {
"backoff_millis": "1000ms",
"backoff_count": "5"
},
"search": {
"backoff_millis": "1000ms",
"backoff_count": "5"
}
}
},
"index_management": {
"filter_by_backend_roles": "false"
},
"anomaly_detection": {
"entity_cold_start_queue_max_heap_percent": "0.001",
"max_anomaly_features": "5",
"breaker": {
"enabled": "true"
},
"request_timeout": "10s",
"checkpoint_read_queue_max_heap_percent": "0.001",
"max_batch_task_per_node": "10",
"checkpoint_read_queue_batch_size": "25",
"max_top_entities_for_historical_analysis": "1000",
"cooldown_minutes": "5m",
"expected_cold_entity_execution_time_in_millisecs": "3000",
"model_max_size_percent": "0.1",
"max_running_entities_per_detector_for_historical_analysis": "10",
"door_keeper_in_cache": {
"enabled": "false"
},
"page_size": "1000",
"checkpoint_read_queue_concurrency": "1",
"index_pressure_soft_limit": "0.6",
"max_multi_entity_anomaly_detectors": "10",
"max_entities_per_query": "1000000",
"checkpoint_saving_freq": "12h",
"delete_anomaly_result_when_delete_detector": "false",
"max_concurrent_preview": "2",
"max_cached_deleted_tasks": "1000",
"max_retry_for_unresponsive_node": "5",
"entity_cold_start_queue_concurrency": "1",
"ad_result_history_max_docs_per_shard": "1350000000",
"batch_task_piece_interval_seconds": "5",
"checkpoint_maintain_queue_max_heap_percent": "0.001",
"dedicated_cache_size": "10",
"filter_by_backend_roles": "false",
"ad_result_history_rollover_period": "12h",
"checkpoint_write_queue_concurrency": "2",
"hcad_cold_start_interpolation": {
"enabled": "false"
},
"backoff_initial_delay": "1000ms",
"batch_task_piece_size": "1000",
"checkpoint_ttl": "7d",
"enabled": "true",
"category_field_limit": "2",
"checkpoint_write_queue_batch_size": "25",
"result_write_queue_batch_size": "5000",
"expected_checkpoint_maintain_time_in_millisecs": "1000",
"max_primary_shards": "10",
"result_write_queue_concurrency": "2",
"result_write_queue_max_heap_percent": "0.01",
"cold_entity_queue_max_heap_percent": "0.001",
"ad_result_history_retention_period": "30d",
"backoff_minutes": "15m",
"detection_window_delay": "0m",
"checkpoint_write_queue_max_heap_percent": "0.01",
"max_entities_for_preview": "5",
"index_pressure_hard_limit": "0.9",
"max_model_size_per_node": "100",
"detection_interval": "10m",
"max_old_ad_task_docs_per_detector": "1",
"max_retry_for_backoff": "3",
"max_anomaly_detectors": "1000"
},
"jobscheduler": {
"jitter_limit": "0.6",
"request_timeout": "10s",
"sweeper": {
"backoff_millis": "50ms",
"period": "5m",
"page_size": "100"
},
"retry_count": "3"
}
},
"logger": {
"level": "INFO"
},
"processors": "8",
"ingest": {
"user_agent": {
"cache_size": "1000"
},
"geoip": {
"cache_size": "1000"
},
"grok": {
"watchdog": {
"max_execution_time": "1s",
"interval": "1s"
}
}
},
"pidfile": "",
"path": {
"data": [
"/var/lib/opensearch"
],
"logs": "/var/log/opensearch",
"shared_data": "",
"home": "/usr/share/opensearch",
"repo": []
},
"repositories": {
"fs": {
"compress": "false",
"chunk_size": "9223372036854775807b",
"location": ""
},
"url": {
"supported_protocols": [
"http",
"https",
"ftp",
"file",
"jar"
],
"allowed_urls": [],
"url": "http:"
}
},
"action": {
"auto_create_index": "true",
"search": {
"shard_count": {
"limit": "9223372036854775807"
}
},
"destructive_requires_name": "false"
},
"opensearch_dashboards": {
"system_indices": [
".opensearch_dashboards",
".opensearch_dashboards_*",
".reporting-*",
".apm-agent-configuration",
".apm-custom-link"
]
},
"cache": {
"recycler": {
"page": {
"limit": {
"heap": "10%"
},
"type": "CONCURRENT",
"weight": {
"longs": "1.0",
"ints": "1.0",
"bytes": "1.0",
"objects": "0.1"
}
}
}
},
"point_in_time": {
"init": {
"keep_alive": "30s"
},
"max_keep_alive": "24h"
},
"reindex": {
"remote": {
"allowlist": [],
"whitelist": []
}
},
"resource": {
"reload": {
"enabled": "true",
"interval": {
"low": "60s",
"high": "5s",
"medium": "30s"
}
}
},
"thread_pool": {
"force_merge": {
"queue_size": "-1",
"size": "1"
},
"fetch_shard_started": {
"core": "1",
"max": "16",
"keep_alive": "5m"
},
"listener": {
"queue_size": "-1",
"size": "4"
},
"refresh": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"remote_refresh": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"translog_sync": {
"queue_size": "10000",
"size": "32"
},
"system_write": {
"queue_size": "1000",
"size": "4"
},
"generic": {
"core": "4",
"max": "128",
"keep_alive": "30s"
},
"warmer": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"remote_purge": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"translog_transfer": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"ml_commons": {
"opensearch_ml_deploy": {
"queue_size": "10",
"size": "7"
},
"opensearch_ml_execute": {
"queue_size": "10",
"size": "7"
},
"opensearch_ml_register": {
"queue_size": "10",
"size": "7"
},
"opensearch_ml_train": {
"queue_size": "10",
"size": "7"
},
"opensearch_ml_predict": {
"queue_size": "10000",
"size": "16"
},
"opensearch_ml_general": {
"queue_size": "100",
"size": "7"
}
},
"search": {
"max_queue_size": "1000",
"queue_size": "1000",
"size": "13",
"auto_queue_frame_size": "2000",
"target_response_time": "1s",
"min_queue_size": "1000"
},
"opensearch_asynchronous_search_generic": {
"core": "1",
"max": "16",
"keep_alive": "30m"
},
"fetch_shard_store": {
"core": "1",
"max": "16",
"keep_alive": "5m"
},
"flush": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"management": {
"core": "1",
"max": "5",
"keep_alive": "5m"
},
"analyze": {
"queue_size": "16",
"size": "1"
},
"get": {
"queue_size": "1000",
"size": "8"
},
"system_read": {
"queue_size": "2000",
"size": "4"
},
"estimated_time_interval": "200ms",
"write": {
"queue_size": "10000",
"size": "8"
},
"snapshot": {
"core": "1",
"max": "4",
"keep_alive": "5m"
},
"search_throttled": {
"max_queue_size": "100",
"queue_size": "100",
"size": "1",
"auto_queue_frame_size": "200",
"target_response_time": "1s",
"min_queue_size": "100"
}
},
"index": {
"codec": "default",
"recovery": {
"type": ""
},
"store": {
"hybrid": {
"mmap": {
"extensions": [
"nvd",
"dvd",
"tim",
"tip",
"dim",
"kdd",
"kdi",
"cfs",
"doc",
"vec",
"vex"
]
}
},
"type": "",
"fs": {
"fs_lock": "native"
},
"preload": []
}
},
"replication_leader": {
"queue_size": "1000",
"size": "13"
},
"task_cancellation": {
"duration_millis": "10000",
"enabled": "true"
},
"script": {
"allowed_contexts": [],
"max_compilations_rate": "use-context",
"cache": {
"max_size": "100",
"expire": "0ms"
},
"painless": {
"regex": {
"enabled": "limited",
"limit-factor": "6"
}
},
"max_size_in_bytes": "65535",
"allowed_types": [],
"disable_max_compilations_rate": "false"
},
"indexing_pressure": {
"memory": {
"limit": "10%"
}
},
"node": {
"data": "true",
"roles": [
"data",
"cluster_manager",
"ml",
"ml_full_access"
],
"max_local_storage_nodes": "1",
"processors": "8",
"store": {
"allow_mmap": "true"
},
"ingest": "true",
"master": "true",
"pidfile": "/var/run/opensearch/opensearch.pid",
"search": {
"cache": {
"size": "0b"
}
},
"remote_cluster_client": "true",
"enable_lucene_segment_infos_trace": "false",
"local_storage": "true",
"name": "opensearch-1",
"id": {
"seed": "0"
},
"attr": {
"shard_indexing_pressure_enabled": "true"
},
"portsfile": "false"
},
"null": {
"queue_size": "1000",
"size": "8"
},
"http": {
"cors": {
"max-age": "1728000",
"allow-origin": "",
"allow-headers": "X-Requested-With,Content-Type,Content-Length",
"allow-credentials": "false",
"allow-methods": "OPTIONS,HEAD,GET,POST,PUT,DELETE",
"enabled": "false"
},
"max_chunk_size": "8kb",
"compression_level": "3",
"max_initial_line_length": "4kb",
"type": "org.opensearch.security.http.SecurityHttpServerTransport",
"pipelining": {
"max_events": "10000"
},
"type.default": "netty4",
"content_type": {
"required": "true"
},
"host": [],
"publish_port": "-1",
"read_timeout": "0ms",
"max_content_length": "100mb",
"netty": {
"receive_predictor_size": "64kb",
"max_composite_buffer_components": "69905",
"worker_count": "0"
},
"tcp": {
"reuse_address": "true",
"keep_count": "-1",
"keep_interval": "-1",
"no_delay": "true",
"keep_alive": "true",
"receive_buffer_size": "-1b",
"keep_idle": "-1",
"send_buffer_size": "-1b"
},
"bind_host": [],
"reset_cookies": "false",
"max_warning_header_count": "-1",
"tracer": {
"include": [],
"exclude": []
},
"max_warning_header_size": "-1b",
"detailed_errors": {
"enabled": "true"
},
"port": "9200-9300",
"max_header_size": "8kb",
"tcp_no_delay": "true",
"compression": "false",
"publish_host": []
},
"compatibility": {
"override_main_response_version": "false"
},
"snapshot": {
"max_concurrent_operations": "1000"
},
"bootstrap": {
"memory_lock": "false",
"system_call_filter": "true",
"ctrlhandler": "true"
},
"network": {
"host": [
"0.0.0.0"
],
"tcp": {
"reuse_address": "true",
"keep_count": "-1",
"connect_timeout": "30s",
"keep_interval": "-1",
"no_delay": "true",
"keep_alive": "true",
"receive_buffer_size": "-1b",
"keep_idle": "-1",
"send_buffer_size": "-1b"
},
"bind_host": [
"0.0.0.0"
],
"server": "true",
"breaker": {
"inflight_requests": {
"limit": "100%",
"overhead": "2.0"
}
},
"publish_host": [
"0.0.0.0"
]
},
"search": {
"default_search_timeout": "-1",
"highlight": {
"term_vector_multi_value": "true"
},
"max_open_pit_context": "300",
"cancel_after_time_interval": "-1",
"default_allow_partial_results": "true",
"max_open_scroll_context": "500",
"max_buckets": "65535",
"low_level_cancellation": "true",
"allow_expensive_queries": "true",
"keep_alive_interval": "1m",
"default_keep_alive": "5m",
"max_keep_alive": "24h"
},
"security": {
"manager": {
"filter_bad_defaults": "true"
}
},
"segrep": {
"pressure": {
"checkpoint": {
"limit": "4"
},
"time": {
"limit": "5m"
},
"replica": {
"stale": {
"limit": "0.5"
}
},
"enabled": "false"
}
},
"client": {
"type": "node"
},
"opendistro_security_config": {
"ssl_dual_mode_enabled": "false"
},
"rest": {
"action": {
"multi": {
"allow_explicit_index": "true"
}
}
},
"remote_store": {
"segment": {
"pressure": {
"bytes_lag": {
"variance_factor": "10.0"
},
"upload_bytes_moving_average_window_size": "20",
"upload_bytes_per_sec_moving_average_window_size": "20",
"time_lag": {
"variance_factor": "10.0"
},
"upload_time_moving_average_window_size": "20",
"consecutive_failures": {
"limit": "5"
},
"enabled": "false"
}
}
},
"replication_follower": {
"core": "1",
"max": "10",
"keep_alive": "1m"
},
"knn": {
"algo_param": {
"index_thread_qty": "1"
},
"cache": {
"item": {
"expiry": {
"enabled": "false",
"minutes": "3h"
}
}
},
"memory": {
"circuit_breaker": {
"limit": "50%",
"enabled": "true"
}
},
"plugin": {
"enabled": "true"
},
"queue_size": "1",
"size": "1",
"circuit_breaker": {
"unset": {
"percentage": "75.0"
},
"triggered": "false"
},
"model": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
},
"cache": {
"size": {
"limit": "10%"
}
}
}
},
"monitor": {
"jvm": {
"gc": {
"enabled": "true",
"overhead": {
"warn": "50",
"debug": "10",
"info": "25"
},
"refresh_interval": "1s"
},
"refresh_interval": "1s"
},
"process": {
"refresh_interval": "1s"
},
"os": {
"refresh_interval": "1s"
},
"fs": {
"health": {
"healthy_timeout_threshold": "60s",
"refresh_interval": "60s",
"enabled": "true",
"slow_path_logging_threshold": "5s"
},
"refresh_interval": "1s"
}
},
"transport": {
"tcp": {
"reuse_address": "true",
"keep_count": "-1",
"connect_timeout": "30s",
"keep_interval": "-1",
"compress": "false",
"port": "9300-9400",
"no_delay": "true",
"keep_alive": "true",
"receive_buffer_size": "-1b",
"keep_idle": "-1",
"send_buffer_size": "-1b"
},
"bind_host": [],
"connect_timeout": "30s",
"compress": "false",
"ping_schedule": "-1",
"connections_per_node": {
"recovery": "2",
"state": "1",
"bulk": "3",
"reg": "6",
"ping": "1"
},
"tracer": {
"include": [],
"exclude": [
"internal:coordination/fault_detection/*",
"cluster:monitor/nodes/liveness"
]
},
"type": "org.opensearch.security.ssl.http.netty.SecuritySSLNettyTransport",
"slow_operation_logging_threshold": "5s",
"type.default": "netty4",
"port": "9300-9400",
"host": [],
"publish_port": "-1",
"tcp_no_delay": "true",
"publish_host": [],
"netty": {
"receive_predictor_size": "64kb",
"receive_predictor_max": "64kb",
"worker_count": "8",
"receive_predictor_min": "64kb",
"boss_count": "1"
}
},
"task_resource_consumers": {
"enabled": "false"
},
"cluster_manager": {
"throttling": {
"retry": {
"max": {
"delay": "30s"
},
"base": {
"delay": "5s"
}
}
}
},
"indices": {
"replication": {
"retry_timeout": "60s",
"initial_retry_backoff_bound": "50ms"
},
"cache": {
"cleanup_interval": "1m"
},
"mapping": {
"dynamic_timeout": "30s",
"max_in_flight_updates": "10"
},
"memory": {
"interval": "5s",
"max_index_buffer_size": "-1",
"shard_inactive_time": "5m",
"index_buffer_size": "10%",
"min_index_buffer_size": "48mb"
},
"breaker": {
"request": {
"limit": "60%",
"type": "memory",
"overhead": "1.0"
},
"total": {
"limit": "95%",
"use_real_memory": "true"
},
"fielddata": {
"limit": "40%",
"type": "memory",
"overhead": "1.03"
},
"type": "hierarchy"
},
"query": {
"bool": {
"max_clause_count": "1024"
},
"query_string": {
"analyze_wildcard": "false",
"allowLeadingWildcard": "true"
}
},
"id_field_data": {
"enabled": "true"
},
"recovery": {
"recovery_activity_timeout": "1800000ms",
"retry_delay_network": "5s",
"internal_action_timeout": "15m",
"retry_delay_state_sync": "500ms",
"internal_action_long_timeout": "1800000ms",
"max_concurrent_operations": "1",
"max_bytes_per_sec": "40mb",
"max_concurrent_file_chunks": "2"
},
"requests": {
"cache": {
"size": "1%",
"expire": "0ms"
}
},
"store": {
"delete": {
"shard": {
"timeout": "30s"
}
}
},
"analysis": {
"hunspell": {
"dictionary": {
"ignore_case": "false",
"lazy": "false"
}
}
},
"queries": {
"cache": {
"count": "10000",
"size": "10%",
"all_segments": "false"
}
},
"fielddata": {
"cache": {
"size": "-1b"
}
}
},
"plugin": {
"mandatory": []
},
"opensearch": {
"reports": {
"general": {
"operationTimeoutMs": "60000",
"defaultItemsQueryCount": "100"
}
},
"experimental": {
"feature": {
"concurrent_segment_search": {
"enabled": "false"
},
"extensions": {
"enabled": "false"
},
"telemetry": {
"enabled": "false"
},
"remote_store": {
"enabled": "false"
},
"segment_replication_experimental": {
"enabled": "false"
},
"identity": {
"enabled": "false"
}
}
},
"ad": {
"ad-threadpool": {
"core": "1",
"max": "4",
"keep_alive": "10m"
},
"ad-batch-task-threadpool": {
"core": "1",
"max": "1",
"keep_alive": "10m"
}
},
"observability": {
"general": {
"operationTimeoutMs": "60000",
"defaultItemsQueryCount": "1000"
},
"access": {
"filterBy": "NoFilter",
"ignoreRoles": [
"own_index",
"opensearch_dashboards_user",
"notebooks_full_access",
"notebooks_read_access"
],
"adminAccess": "AllObservabilityObjects"
},
"polling": {
"maxLockRetries": "4",
"jobLockDurationSeconds": "300",
"maxPollingDurationSeconds": "900",
"minPollingDurationSeconds": "300"
}
},
"notifications": {
"core": {
"allowed_config_types": [
"slack",
"chime",
"webhook",
"email",
"sns",
"ses_account",
"smtp_account",
"email_group"
],
"tooltip_support": "true",
"http": {
"socket_timeout": "50000",
"host_deny_list": [],
"max_connections": "60",
"connection_timeout": "5000",
"max_connection_per_route": "20"
},
"email": {
"minimum_header_length": "160",
"size_limit": "10000000"
}
},
"general": {
"default_items_query_count": "100",
"operation_timeout_ms": "60000",
"filter_by_backend_roles": "false"
}
}
},
"discovery": {
"seed_hosts": [],
"unconfigured_bootstrap_timeout": "3s",
"request_peers_timeout": "3000ms",
"zen": {
"hosts_provider": [],
"ping": {
"unicast": {
"concurrent_connects": "10",
"hosts": [],
"hosts.resolve_timeout": "5s"
}
}
},
"initial_state_timeout": "30s",
"cluster_formation_warning_timeout": "10000ms",
"seed_providers": [],
"find_peers_interval_during_decommission": "120s",
"type": "single-node",
"seed_resolver": {
"max_concurrent_resolvers": "10",
"timeout": "5s"
},
"find_peers_interval": "1000ms",
"probe": {
"connect_timeout": "3000ms",
"handshake_timeout": "1000ms"
}
},
"search_backpressure": {
"mode": "monitor_only",
"cancellation_burst": "10.0",
"cancellation_ratio": "0.1",
"cancellation_rate": "0.003",
"search_task": {
"elapsed_time_millis_threshold": "45000",
"heap_variance": "2.0",
"heap_percent_threshold": "0.02",
"cancellation_burst": "5.0",
"cpu_time_millis_threshold": "30000",
"cancellation_ratio": "0.1",
"cancellation_rate": "0.003",
"total_heap_percent_threshold": "0.05",
"heap_moving_average_window_size": "100"
},
"node_duress": {
"cpu_threshold": "0.9",
"heap_threshold": "0.7",
"num_successive_breaches": "3"
},
"search_shard_task": {
"elapsed_time_millis_threshold": "30000",
"heap_variance": "2.0",
"heap_percent_threshold": "0.005",
"cancellation_burst": "10.0",
"cpu_time_millis_threshold": "15000",
"cancellation_ratio": "0.1",
"cancellation_rate": "0.003",
"total_heap_percent_threshold": "0.05",
"heap_moving_average_window_size": "100"
}
},
"shard_indexing_pressure": {
"primary_parameter": {
"node": {
"soft_limit": "0.7"
},
"shard": {
"min_limit": "0.001"
}
},
"enforced": "false",
"secondary_parameter": {
"successful_request": {
"max_outstanding_requests": "100",
"elapsed_timeout": "300000ms"
},
"throughput": {
"request_size_window": "2000",
"degradation_factor": "5.0"
}
},
"cache_store": {
"max_size": "200"
},
"enabled": "false",
"operating_factor": {
"optimal": "0.85",
"lower": "0.75",
"upper": "0.95"
}
},
"gateway": {
"recover_after_master_nodes": "0",
"expected_nodes": "-1",
"recover_after_data_nodes": "-1",
"expected_data_nodes": "-1",
"write_dangling_indices_info": "true",
"slow_write_logging_threshold": "10s",
"recover_after_time": "0ms",
"expected_master_nodes": "-1",
"recover_after_nodes": "-1",
"auto_import_dangling_indices": "false"
}
}
}
@pdolinic, I see some problems.
Your node.roles setting is not correct:
node.roles: [ data, cluster_manager, ml_full_access ]
If you want to add the ML role to a node, you should not use ml_full_access; just change it to ml (refer to https://opensearch.org/docs/latest/ml-commons-plugin/index/#ml-node). So you should use:
node.roles: [ data, cluster_manager, ml ]
ml_full_access is a "permission" role for security, not a "node" role.
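To catch this kind of mix-up before restarting a node, a small check along the following lines may help. It is only a sketch: the set of known node roles is an assumption (the common OpenSearch 2.x roles plus the ml role from ml-commons), and the helper name unknown_node_roles is made up for illustration.

```python
# Assumption: common OpenSearch 2.x node roles plus "ml" from the ml-commons plugin.
# Adjust this set to match the roles your version and plugins actually provide.
KNOWN_NODE_ROLES = {
    "data",
    "ingest",
    "cluster_manager",
    "master",  # deprecated alias of cluster_manager
    "remote_cluster_client",
    "search",
    "ml",
}


def unknown_node_roles(roles):
    """Return the entries of a node.roles list that are not known node roles."""
    return [role for role in roles if role not in KNOWN_NODE_ROLES]


# The list from the thread: ml_full_access is a security role, not a node role.
print(unknown_node_roles(["data", "cluster_manager", "ml_full_access"]))
# The corrected list produces no findings.
print(unknown_node_roles(["data", "cluster_manager", "ml"]))
```

Security roles such as ml_full_access are assigned to users through the security plugin instead.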
POST /_plugins/_ml/model_groups/_register
{
"name": "test_model_group_public-b",
"description": "This is a public model group"
}
The model group was created, and its model group id is Fd9aHYoBh74vBCV4b8BC.
Later you upload the model with a different model group id, 8IjOsYgBFp6IJxCceZ2-. I guess that model group doesn't exist, so you see "error": "Model group not found":
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT",
"model_group_id": "8IjOsYgBFp6IJxCceZ2-"
}
I see you are trying to get the task status with GET /_plugins/_ml/tasks/8IjOsYgBFp6IJxCceZ2-, so I'm confused: is 8IjOsYgBFp6IJxCceZ2- a task id or a model group id?
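One way to avoid mixing up the two kinds of id is to never copy them by hand and instead build the register request directly from the model group response. This is a minimal sketch, not the plugin's own tooling: it assumes the model group registration response carries the id in a "model_group_id" field, and build_register_body is a hypothetical helper name.

```python
import json


def build_register_body(group_response, name, version, model_format="TORCH_SCRIPT"):
    """Build the body for POST /_plugins/_ml/models/_register from a model group response."""
    # Assumption: the model group response exposes its id as "model_group_id".
    group_id = group_response["model_group_id"]
    return {
        "name": name,
        "version": version,
        "model_format": model_format,
        "model_group_id": group_id,
    }


# Assumed response shape, using the model group id quoted in this thread.
group_response = {"model_group_id": "Fd9aHYoBh74vBCV4b8BC", "status": "CREATED"}
body = build_register_body(
    group_response,
    name="huggingface/sentence-transformers/all-MiniLM-L12-v2",
    version="1.0.1",
)
print(json.dumps(body, indent=2))
```

The task id returned by the register call is a separate value and only belongs in GET /_plugins/_ml/tasks/<task_id>.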
Thanks, I got confused by the model group and have sorted that out now. After assigning the model to the correct group and getting the registration right, I could make progress:
POST /_plugins/_ml/models/_register
{
"name": "huggingface/sentence-transformers/all-MiniLM-L12-v2",
"version": "1.0.1",
"model_format": "TORCH_SCRIPT",
"model_group_id": "G8QV5okBRMEPLlaZOe87"
}
returns
{
"task_id": "APhKHooBNrcgC_kYjn7q",
"status": "CREATED"
}
I also added
PUT /_cluster/settings
{
"persistent": {
"plugins.ml_commons.sync_up_job_interval_in_seconds": 600
}
}
Problem: when I check the task, it reports a FAILED state:
GET /_plugins/_ml/tasks/APhKHooBNrcgC_kYjn7q
{
"task_type": "REGISTER_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "FAILED",
"worker_node": [
"KmfjhwWjS7eepv4PnUMKWw"
],
"create_time": 1692725317352,
"last_update_time": 1692725318453,
"error": """Cannot invoke "org.opensearch.cluster.metadata.MappingMetadata.getSourceAsMap()" because the return value of "org.opensearch.cluster.metadata.IndexMetadata.mapping()" is null""",
"is_async": true
}
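For scripting against this endpoint, the task document can be reduced to "model id, failure, or still running". The sketch below uses only fields visible in the responses in this thread ("state", "error", "model_id"); the TaskFailed exception and the function name are made up for illustration.

```python
class TaskFailed(Exception):
    """Raised when an ML task reports state FAILED."""


def model_id_from_task(task):
    """Return the model id of a COMPLETED task, None if unfinished, raise on FAILED."""
    state = task.get("state")
    if state == "COMPLETED":
        return task["model_id"]
    if state == "FAILED":
        raise TaskFailed(task.get("error", "no error message in task document"))
    return None  # any other state is treated as not finished yet


# Shortened copy of the failed task shown above.
failed_task = {
    "task_type": "REGISTER_MODEL",
    "state": "FAILED",
    "error": "Cannot invoke ... because the return value of ... is null",
}
try:
    model_id_from_task(failed_task)
except TaskFailed as exc:
    print("registration failed:", exc)
```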
Update: From here on, at intervals, I can see OpenSearch Dashboards refresh for a millisecond with something that could be this model, but then it disappears again entirely.
I found a similar issue with the mapping error I am running into here: https://github.com/opensearch-project/security-analytics/issues/305
This seems index related. I removed some indices from Graylog / Icinga that are certainly unrelated; here are the others, maybe one of them is causing this:
GET /_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .plugins-ml-model-group jGAUj3TMSr6eol5PKivEjQ 1 1 16 3 50.9kb 50.9kb
green open .ql-datasources xjXYSmGmS5equteZ6yO97g 1 0 0 0 208b 208b
yellow open .plugins-ml-task iHmP8ebKQVerz5a7FVDq8A 1 1 47 1 39kb 39kb
green open .opendistro-reports-definitions wvK55M5-TIyL7EuUYGoduw 1 0 2 0 9.7kb 9.7kb
green open .opendistro_security bwOxmf4FQpyOxQ8FGj-dQw 1 0 10 0 48kb 48kb
green open .opendistro-reports-instances _oHIjzIVTjaxdaKQhlXORw 1 0 2 0 12kb 12kb
yellow open sample-host-health YF9jbVsHREa3DfpIiv8Mww 1 1 40320 0 1.2mb 1.2mb
green open .opensearch-observability c21YwgVPTPu7Hm5M9NVQuQ 1 0 0 0 208b 208b
yellow open .plugins-ml-model L3XnckAoTkeqLXDDDd0Wzw 1 1 0 0 208b 208b
yellow open icingabeat-7.17.4 yIW7P6bSRuyAWkeewBbmQg 3 2 71884 0 25.9mb 25.9mb
green open graylog_1 Y37EYeiDSLSer_H9gtYHng 4 0 2343293 0 1gb 1gb
green open graylog_0 awS-XplgSDmR_55iJco5iw 4 0 20543833 0 10.6gb 10.6gb
green open gl-system-events_0 rXiu44P5R8CQY_1uht--lQ 4 0 0 0 832b 832b
green open opensearch_dashboards_sample_data_ecommerce 7RBFKsh_QCu-5KRF4KaGIQ 1 0 4675 0 4mb 4mb
green open gl-events_0 me8p9BUTQjuNoRH_5nta0w 4 0 0 0 832b 832b
green open .kibana_2 YO-Csy4RRhK0PmNpzCneAQ 1 0 35 3 102.7kb 102.7kb
green open .kibana_1 ab9zq7flRDKf_L3yMXmbtg 1 0 23 4 57.1kb 57.1kb
yellow open .plugins-ml-config IAp6XfvpQzG-UyJk6ewVRg 1 1 1 0 3.9kb 3.9kb
yellow open security-auditlog-2023.07.14 V0rV_bNySv2RPzKbHA3uuA 1 1 121185 0 73.2mb 73.2mb
yellow open security-auditlog-2023.07.13 U3s4BVL2R72QQ4SW7PpMgQ 1 1 6135 0 9.6mb 9.6mb
yellow open security-auditlog-2023.07.12 woBSKAcBSgOLKlQwZGIkwA 1 1 72 0 1.1mb 1.1mb
yellow open .opendistro-job-scheduler-lock kuSgOfMHSRW8ZxXFDjfNLg 1 1 3 0 5.8kb 5.8kb
green open .opensearch-notifications-config sTzC4WdzSHuHyFIfSI5ONA 1 0 0 0 208b 208b
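To narrow a listing like this down to the ML system indices, the _cat/indices?v text can be filtered with a few lines of Python. This is a rough sketch: it assumes every row has all header columns (which holds for the open indices above but not for closed ones), and ml_indices is a hypothetical helper name.

```python
def ml_indices(cat_output):
    """Return (index, health, docs.count) for each .plugins-ml-* row of _cat/indices?v text."""
    lines = cat_output.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # Assumption: each row has one value per header column.
        row = dict(zip(header, line.split()))
        if row["index"].startswith(".plugins-ml-"):
            rows.append((row["index"], row["health"], int(row["docs.count"])))
    return rows


# Three rows copied from the listing above.
sample = """
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open .plugins-ml-model-group jGAUj3TMSr6eol5PKivEjQ 1 1 16 3 50.9kb 50.9kb
green open .ql-datasources xjXYSmGmS5equteZ6yO97g 1 0 0 0 208b 208b
yellow open .plugins-ml-model L3XnckAoTkeqLXDDDd0Wzw 1 1 0 0 208b 208b
"""
for name, health, docs in ml_indices(sample):
    print(name, health, docs)
```

In the listing above, .plugins-ml-model holds 0 documents while .plugins-ml-task holds 47.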
Okay, it looks like it works now. I am still not seeing it in Dashboards, but here is what I did:
1) deleted the opensearch-ml plugins, 2) deleted all opensearch-dashboards plugins entirely, 3) deleted all those indices:
DELETE /.plugins-ml-model-group
DELETE /.plugins-ml-task
DELETE /.plugins-ml-model
DELETE /.plugins-ml-config
DELETE /.opensearch-sap-correlation-rules-config
DELETE /sample-host-health
4) reinstalled OpenSearch, 5) reinstalled all opensearch-dashboards plugins, 6) redid the model group creation, upload, and registration into the group, and now I am getting a COMPLETED state back!
GET /_plugins/_ml/tasks/MprLIYoBmWoi9V5-o5Ix
---
# SUCCESS
---
{
"model_id": "RJrLIYoBmWoi9V5-o5LW",
"task_type": "REGISTER_MODEL",
"function_name": "TEXT_EMBEDDING",
"state": "COMPLETED",
"worker_node": [
"KmfjhwWjS7eepv4PnUMKWw"
],
"create_time": 1692784108336,
"last_update_time": 1692784119285,
"is_async": true
}
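Since registration is asynchronous ("is_async": true), the task usually has to be polled until it reaches a final state. A minimal polling sketch follows. The fetch_task callable is a placeholder for whatever client issues GET /_plugins/_ml/tasks/<task_id>; it is replaced by a fake here so the example runs without a cluster, and the attempt count and delay are arbitrary assumptions.

```python
import time


def wait_for_task(fetch_task, task_id, attempts=10, delay_seconds=1.0):
    """Poll fetch_task(task_id) until the task state is COMPLETED or FAILED."""
    for _ in range(attempts):
        task = fetch_task(task_id)
        if task.get("state") in ("COMPLETED", "FAILED"):
            return task
        time.sleep(delay_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {attempts} attempts")


# Fake client for illustration: reports CREATED twice, then the COMPLETED task from above.
responses = iter([
    {"state": "CREATED"},
    {"state": "CREATED"},
    {"state": "COMPLETED", "model_id": "RJrLIYoBmWoi9V5-o5LW"},
])


def fake_fetch(task_id):
    return next(responses)


final = wait_for_task(fake_fetch, "MprLIYoBmWoi9V5-o5Ix", delay_seconds=0)
print(final["state"], final["model_id"])
```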
Got it working, thanks a lot for the pointers! I'll do an OpenSearch Blog series and this will be one part of it.
Cool, glad to see it works!
@pdolinic , please share the blog link here to help more people!
What is the bug?
This ss4o metric bug is triggered when I try to activate registration. I tried a lot of settings:
As soon as I disable this parameter again, it works
/etc/opensearch/opensearch.yml
that normally works perfectly, just trying out ML and seeing problems:
What is your host/environment?