opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.68k stars 1.79k forks

[FEATURE] CAT json responses of type string with format #14818

Closed Jakob3xD closed 2 months ago

Jakob3xD commented 3 months ago

Is your feature request related to a problem?

The cat APIs return every field as a string, even when the underlying value has a different type. It would be nice to get the actual type of each value, not only its string form, so that users would not need to convert each field themselves. Example:

GET _cat/indices?format=json&h=*
[
  {
    "health": "green",
    "status": "open",
    "index": "some-index",
    "uuid": "some-uuid",
    "pri": "2",
    "rep": "2",
    "docs.count": "275689557",
    "docs.deleted": "0",
    "creation.date": "1720798637918",
    "creation.date.string": "2024-07-12T15:37:17.918Z",
    "store.size": "128.9gb",
    "pri.store.size": "43gb",
    "completion.size": "0b",
    "pri.completion.size": "0b",
    "fielddata.memory_size": "0b",
    "pri.fielddata.memory_size": "0b",
    "fielddata.evictions": "0",
    "pri.fielddata.evictions": "0",
    "query_cache.memory_size": "0b",
    "pri.query_cache.memory_size": "0b",
    "query_cache.evictions": "0",
    "pri.query_cache.evictions": "0",
    "request_cache.memory_size": "0b",
    "pri.request_cache.memory_size": "0b",
    "request_cache.evictions": "0",
    "pri.request_cache.evictions": "0",
    "request_cache.hit_count": "0",
    "pri.request_cache.hit_count": "0",
    "request_cache.miss_count": "0",
    "pri.request_cache.miss_count": "0",
    "flush.total": "644",
    "pri.flush.total": "315",
    "flush.total_time": "7.1m",
    "pri.flush.total_time": "3.5m",
    "get.current": "0",
    "pri.get.current": "0",
    "get.time": "0s",
    "pri.get.time": "0s",
    "get.total": "0",
    "pri.get.total": "0",
    "get.exists_time": "0s",
    "pri.get.exists_time": "0s",
    "get.exists_total": "0",
    "pri.get.exists_total": "0",
    "get.missing_time": "0s",
    "pri.get.missing_time": "0s",
    "get.missing_total": "0",
    "pri.get.missing_total": "0",
    "indexing.delete_current": "0",
    "pri.indexing.delete_current": "0",
    "indexing.delete_time": "0s",
    "pri.indexing.delete_time": "0s",
    "indexing.delete_total": "0",
    "pri.indexing.delete_total": "0",
    "indexing.index_current": "0",
    "pri.indexing.index_current": "0",
    "indexing.index_time": "1.1d",
    "pri.indexing.index_time": "13.4h",
    "indexing.index_total": "511691610",
    "pri.indexing.index_total": "252881020",
    "indexing.index_failed": "0",
    "pri.indexing.index_failed": "0",
    "merges.current": "0",
    "pri.merges.current": "0",
    "merges.current_docs": "0",
    "pri.merges.current_docs": "0",
    "merges.current_size": "0b",
    "pri.merges.current_size": "0b",
    "merges.total": "16004",
    "pri.merges.total": "7965",
    "merges.total_docs": "2022058732",
    "pri.merges.total_docs": "996890907",
    "merges.total_size": "338.4gb",
    "pri.merges.total_size": "164.8gb",
    "merges.total_time": "16.3h",
    "pri.merges.total_time": "8h",
    "refresh.total": "60334",
    "pri.refresh.total": "30119",
    "refresh.time": "6.2h",
    "pri.refresh.time": "3h",
    "refresh.external_total": "59658",
    "pri.refresh.external_total": "29778",
    "refresh.external_time": "6.3h",
    "pri.refresh.external_time": "3.1h",
    "refresh.listeners": "0",
    "pri.refresh.listeners": "0",
    "search.fetch_current": "0",
    "pri.search.fetch_current": "0",
    "search.fetch_time": "0s",
    "pri.search.fetch_time": "0s",
    "search.fetch_total": "0",
    "pri.search.fetch_total": "0",
    "search.open_contexts": "0",
    "pri.search.open_contexts": "0",
    "search.query_current": "0",
    "pri.search.query_current": "0",
    "search.query_time": "0s",
    "pri.search.query_time": "0s",
    "search.query_total": "0",
    "pri.search.query_total": "0",
    "search.concurrent_query_current": "0",
    "pri.search.concurrent_query_current": "0",
    "search.concurrent_query_time": "0s",
    "pri.search.concurrent_query_time": "0s",
    "search.concurrent_query_total": "0",
    "pri.search.concurrent_query_total": "0",
    "search.concurrent_avg_slice_count": "0.0",
    "pri.search.concurrent_avg_slice_count": "0.0",
    "search.scroll_current": "0",
    "pri.search.scroll_current": "0",
    "search.scroll_time": "0s",
    "pri.search.scroll_time": "0s",
    "search.scroll_total": "0",
    "pri.search.scroll_total": "0",
    "search.point_in_time_current": "0",
    "pri.search.point_in_time_current": "0",
    "search.point_in_time_time": "0s",
    "pri.search.point_in_time_time": "0s",
    "search.point_in_time_total": "0",
    "pri.search.point_in_time_total": "0",
    "segments.count": "183",
    "pri.segments.count": "63",
    "segments.memory": "0b",
    "pri.segments.memory": "0b",
    "segments.index_writer_memory": "14.4mb",
    "pri.segments.index_writer_memory": "4.2mb",
    "segments.version_map_memory": "0b",
    "pri.segments.version_map_memory": "0b",
    "segments.fixed_bitset_memory": "0b",
    "pri.segments.fixed_bitset_memory": "0b",
    "warmer.current": "0",
    "pri.warmer.current": "0",
    "warmer.total": "59654",
    "pri.warmer.total": "29777",
    "warmer.total_time": "1.5s",
    "pri.warmer.total_time": "778ms",
    "suggest.current": "0",
    "pri.suggest.current": "0",
    "suggest.time": "0s",
    "pri.suggest.time": "0s",
    "suggest.total": "0",
    "pri.suggest.total": "0",
    "memory.total": "14.4mb",
    "pri.memory.total": "4.2mb",
    "search.throttled": "false"
  }
]

What solution would you like?

OpenAPI supports the format field (https://swagger.io/docs/specification/data-models/data-types/#string). I suggest using the format field to declare the actual type of each field. For example, the format of search.throttled would be boolean, suggest.total would be int64, and warmer.total_time would be duration.

What alternatives have you considered?

Another way would be to define a pattern, but this would be more complicated and would not really do the job.

Do you have any additional context?

Another question is whether fields that are not returned by default should be of type ["string", "null"], as they are not present in every response?
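On the client side, optional columns can be modeled without knowing in advance whether they will be present. A small Go sketch, assuming a hypothetical `IndexRow` type that covers only three of the columns: pointer fields stay nil when a column is missing from the response or is explicitly null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// IndexRow is a hypothetical row type for illustration only.
// Pointer fields remain nil when the column is absent or null.
type IndexRow struct {
	Index     string  `json:"index"`
	StoreSize *string `json:"store.size"`
	Throttled *string `json:"search.throttled"`
}

// decodeRows parses a cat JSON response into typed rows.
func decodeRows(data []byte) ([]IndexRow, error) {
	var rows []IndexRow
	err := json.Unmarshal(data, &rows)
	return rows, err
}

func main() {
	rows, err := decodeRows([]byte(`[{"index":"some-index","store.size":"128.9gb"}]`))
	if err != nil {
		panic(err)
	}
	// search.throttled was not requested, so the pointer is nil.
	fmt.Println(*rows[0].StoreSize, rows[0].Throttled == nil)
}
```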

dblock commented 3 months ago

Moving this to OpenSearch repo.

CAT stands for "Compact and aligned text (CAT) APIs" (note the text part) and Elasticsearch designed it with "cat APIs are only intended for human consumption" in mind. I think this is why these APIs return text.


Of course, in 2024, this doesn't make sense. Strongly typed JSON responses are equally human-readable when they aren't strings.

peternied commented 2 months ago

[Triage - attendees 1 2] @Jakob3xD Thanks for creating this issue. The feature request looks like a good one, but it could break existing applications that expect only string fields. Maybe there should be more discussion about what is done here.

If there is a specific piece of information that you'd like a structured way to access, there may be another API that already exposes it. If that use case isn't covered by our existing APIs, consider creating another issue.

Jakob3xD commented 2 months ago

My initial intention was related to the Go OpenSearch library, where I currently parse the fields that are numbers inside strings into actual numbers. Knowing what cat actually stands for probably makes my request obsolete, as the library's users should not use the cat API to, for example, get index information, but rather the /_stats endpoint. IMO the strings can be kept as strings, since they also change with the time and bytes parameters.

Therefore I am good with closing the issue.
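For clients that still read cat output, the time and bytes parameters mentioned above can make the strings easier to parse. The Go sketch below rests on an assumption: that with bytes=b and time=ms the cat API renders sizes and durations as plain integers inside strings, without a unit suffix. The helper names and the sample value are hypothetical, and no request is actually sent.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// catIndicesPath builds a _cat/indices request path that asks for JSON,
// byte sizes in bytes, and durations in milliseconds.
func catIndicesPath() string {
	q := url.Values{}
	q.Set("format", "json")
	q.Set("bytes", "b")
	q.Set("time", "ms")
	return "/_cat/indices?" + q.Encode()
}

// parseCount parses a unit-free numeric string such as a byte count.
// Assumption: the server returned the value without a unit suffix.
func parseCount(raw string) (int64, error) {
	return strconv.ParseInt(raw, 10, 64)
}

func main() {
	fmt.Println(catIndicesPath())

	// Hypothetical store.size value as it might look with bytes=b.
	size, err := parseCount("46170898432")
	if err != nil {
		panic(err)
	}
	fmt.Println(size)
}
```

This keeps the parsing on the client trivial, but the values still arrive as strings, which is why the /_stats endpoint remains the better fit for programmatic use.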