Open abhaypersistent opened 2 years ago
Hi @anirudha,
As with any other application, we started by analyzing the OpenSearch server logs to extract metrics. The problem is that the log messages are free-form and do not follow a consistent pattern, so it is difficult to extract values with a regex. In addition, the logs do not contain enough information to derive many kinds of metrics, such as the number of indices or the data volume in each index.
Here is a sample from the OpenSearch server logs:
[2022-06-15T14:30:35,568][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded module [rank-eval]
[2022-06-15T14:30:35,568][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded module [reindex]
[2022-06-15T14:30:35,569][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded module [repository-url]
[2022-06-15T14:30:35,570][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded module [test-delayed-aggs]
[2022-06-15T14:30:35,571][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded module [transport-netty4]
[2022-06-15T14:30:35,574][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded plugin [opensearch-observability]
[2022-06-15T14:30:35,575][INFO ][o.o.p.PluginsService ] [PSL-5CD1520ZT5] loaded plugin [opensearch-sql]
[2022-06-15T14:30:35,672][INFO ][o.o.e.NodeEnvironment ] [PSL-5CD1520ZT5] using [1] data paths, mounts [[/mnt/d (drvfs)]], net usable_space [260.4gb], net total_space [276.3gb], types [9p]
[2022-06-15T14:30:35,673][INFO ][o.o.e.NodeEnvironment ] [PSL-5CD1520ZT5] heap size [1gb], compressed ordinary object pointers [true]
[2022-06-15T14:30:36,654][INFO ][o.o.n.Node ] [PSL-5CD1520ZT5] node name [PSL-5CD1520ZT5], node ID [O9ylm_SKQ0yXQbW205wnLA], cluster name [opensearch], roles [cluster_manager, remote_cluster_client, data, ingest]
[2022-06-15T14:30:36,993][WARN ][o.o.o.s.PluginSettings ] [PSL-5CD1520ZT5] observability:Failed to load /mnt/d/opensearch-2.0.0-SNAPSHOT/config/opensearch-observability/observability.yml
[2022-06-15T14:30:42,569][INFO ][o.o.t.NettyAllocator ] [PSL-5CD1520ZT5] creating NettyAllocator with the following configs: [name=unpooled, suggested_max_allocation_size=256kb, factors={opensearch.unsafe.use_unpooled_allocator=null, g1gc_enabled=true, g1gc_region_size=1mb, heap_size=1gb}]
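To make the difficulty concrete, here is a minimal Python sketch that parses the sample lines above. It assumes every line starts with the bracketed prefix seen in the sample (timestamp, level, logger, node name); `LINE_RE` and `parse_line` are names made up for this illustration. The prefix can be split out reliably, but the message that follows is free text, so each kind of message would still need its own pattern before any value could be extracted.

```python
import re

# Assumed layout, taken from the sample lines only:
# [timestamp][LEVEL ][logger ] [node] free-form message
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"      # e.g. 2022-06-15T14:30:35,673
    r"\[(?P<level>[A-Z]+)\s*\]"        # e.g. INFO or WARN, padded with spaces
    r"\[(?P<logger>[^\]]+?)\s*\]\s+"   # abbreviated logger name
    r"\[(?P<node>[^\]]+)\]\s+"         # node name
    r"(?P<message>.*)$"                # everything else is unstructured
)


def parse_line(line):
    """Return the prefix fields and raw message, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


sample = (
    "[2022-06-15T14:30:35,673][INFO ][o.o.e.NodeEnvironment ] [PSL-5CD1520ZT5] "
    "heap size [1gb], compressed ordinary object pointers [true]"
)
record = parse_line(sample)
# record["message"] is still plain text; pulling "1gb" out of it would need
# a second, message-specific pattern.
```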
So we started looking for alternative solutions. Here are a few alternatives we found that can help us derive metrics:
When OpenSearch runs as an AWS service, CloudWatch can be configured to monitor OpenSearch resources in real time. This would let us collect and track all the metrics data.
Advantage :
Disadvantage :
We are pausing further analysis of CloudWatch for now, as it requires an active AWS account.
Similarly, when OpenSearch runs as an AWS service, AWS CloudTrail can capture API calls to OpenSearch Service as events and write them to an Amazon S3 bucket that we specify in the configuration. Using this information, we can identify which users and accounts made requests, the source IP address the requests came from, and when the requests occurred.
Advantage :
Disadvantage:
We are pausing further analysis of CloudTrail for now, as it requires an active AWS account.
The Performance Analyzer plugin provides several RESTful APIs for fetching different metrics from OpenSearch. We can trigger those APIs at a fixed interval. After receiving the data, we can write it to a file or to an HTTP channel, from where Fluentd can pick it up and forward it to OpenSearch to build observability on top of it.
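The polling idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the endpoint (port 9600, path `_plugins/_performanceanalyzer/metrics`) and the response shape (`fields` plus `records` per node) follow our reading of the Performance Analyzer documentation and should be verified against the installed version. The helper names and the output path are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: default Performance Analyzer endpoint on the local node.
PA_BASE = "http://localhost:9600/_plugins/_performanceanalyzer/metrics"


def build_url(metrics, aggs, dims, nodes="all", base=PA_BASE):
    """Build the metrics query URL; metrics and aggs are paired by position."""
    query = urlencode(
        {
            "metrics": ",".join(metrics),
            "agg": ",".join(aggs),
            "dim": ",".join(dims),
            "nodes": nodes,
        },
        safe=",",  # keep commas readable instead of %2C
    )
    return f"{base}?{query}"


def flatten(response):
    """Turn the assumed per-node {fields, records} structure into flat dicts."""
    rows = []
    for node_id, payload in response.items():
        data = payload.get("data", {})
        names = [field["name"] for field in data.get("fields", [])]
        for record in data.get("records", []):
            row = dict(zip(names, record))
            row["node_id"] = node_id
            row["timestamp"] = payload.get("timestamp")
            rows.append(row)
    return rows


def append_json_lines(rows, path):
    """Append one JSON object per line, a format a log shipper can tail."""
    with open(path, "a", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")


def poll_once(url, path):
    """Fetch the metrics once and append them to the output file."""
    with urlopen(url, timeout=10) as resp:
        append_json_lines(flatten(json.load(resp)), path)
```

A scheduler (cron, or a simple loop with `time.sleep`) could call `poll_once(build_url(["CPU_Utilization"], ["avg"], ["ShardID"]), "/tmp/pa-metrics.jsonl")` at the chosen interval, and Fluentd could then tail that file. Writing JSON lines keeps the Fluentd side simple, since each line is already a structured record.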
Advantages:
Disadvantage:
The PerfTop CLI available in the OpenSearch project already uses the Performance Analyzer plugin to populate its pre-configured dashboards for analyzing OpenSearch clusters. Currently there is no way to forward this data to OpenSearch or any other service to build observability.
We can modify the PerfTop utility to add an option that fetches all the metrics and writes the output to a file instead of rendering the dashboards. Fluentd can then use that file to forward the data to OpenSearch.
Advantage:
We do not see many disadvantages here, as the existing application will continue to operate as is, and we only plan to add an option for writing to a file. Ideally, this should not interfere with existing deployments that are currently in use.
Let us know if we are missing something that should be considered along with the above, or if we are overlooking any advantages or disadvantages in the description above.
@spattnaik @abasatwar
1. Install and configure OpenSearch as a service.
2. Analyze the different OpenSearch metrics that can be monitored.