opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Storage Reduction for id fields #14831

Status: Open. mgodwan opened this issue 4 months ago.

mgodwan commented 4 months ago

Is your feature request related to a problem? Please describe

Today, the _id field is:

  1. Indexed using the FST data structure
  2. Stored using the stored field mapper

For time-series data, the auto-generated _id is rarely used as a query term. The current _id layout is optimized for queries, but generating it so that consecutive IDs share a longer common prefix may reduce the storage the field takes.
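To illustrate why a longer common prefix can matter, here is a minimal Java sketch under simplified assumptions. It compares two hypothetical 15-byte ID layouts (6-byte millisecond timestamp, 3-byte sequence number, 6-byte node id): one puts the fast-changing sequence bytes first, the other puts the timestamp first. It sorts the IDs in unsigned byte order, as a terms dictionary would, and sums how many leading bytes each adjacent pair shares, which is roughly what prefix-based compression can exploit. The layouts, the rate of about 10 IDs per millisecond, and the class name IdPrefixDemo are assumptions. Neither layout is claimed to be the byte order used by TimeBasedUUIDGenerator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical illustration: how the byte order of a time-based ID affects how many
 * leading bytes adjacent IDs share once sorted. Layouts are simplified assumptions.
 */
public class IdPrefixDemo {

    // 6-byte millisecond timestamp + 3-byte sequence number + 6-byte node id = 15 bytes.
    static final int ID_LEN = 15;

    /** Hypothetical "volatile bytes first" layout: sequence bytes lead, timestamp follows. */
    static byte[] sequenceFirst(long timestamp, int seq, byte[] node) {
        byte[] id = new byte[ID_LEN];
        int i = 0;
        id[i++] = (byte) seq;          // least significant sequence byte first
        id[i++] = (byte) (seq >>> 8);
        id[i++] = (byte) (seq >>> 16);
        for (int shift = 40; shift >= 0; shift -= 8) {
            id[i++] = (byte) (timestamp >>> shift);  // 6 timestamp bytes, big-endian
        }
        System.arraycopy(node, 0, id, i, node.length);
        return id;
    }

    /** Hypothetical "timestamp first" layout: big-endian timestamp, then sequence, then node. */
    static byte[] timestampFirst(long timestamp, int seq, byte[] node) {
        byte[] id = new byte[ID_LEN];
        int i = 0;
        for (int shift = 40; shift >= 0; shift -= 8) {
            id[i++] = (byte) (timestamp >>> shift);
        }
        id[i++] = (byte) (seq >>> 16);
        id[i++] = (byte) (seq >>> 8);
        id[i++] = (byte) seq;
        System.arraycopy(node, 0, id, i, node.length);
        return id;
    }

    /** Sorts IDs in unsigned byte order and sums the common-prefix length of each adjacent pair. */
    static long sharedPrefixBytes(List<byte[]> ids) {
        List<byte[]> sorted = new ArrayList<>(ids);
        sorted.sort(Arrays::compareUnsigned);
        long total = 0;
        for (int k = 1; k < sorted.size(); k++) {
            int mismatch = Arrays.mismatch(sorted.get(k - 1), sorted.get(k));
            total += (mismatch < 0) ? ID_LEN : mismatch;  // -1 means the arrays are identical
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] node = {1, 2, 3, 4, 5, 6};
        long start = 1_700_000_000_000L;  // arbitrary start time in epoch millis
        List<byte[]> seqFirst = new ArrayList<>();
        List<byte[]> tsFirst = new ArrayList<>();
        for (int seq = 0; seq < 10_000; seq++) {
            long ts = start + seq / 10;   // assume about 10 IDs generated per millisecond
            seqFirst.add(sequenceFirst(ts, seq, node));
            tsFirst.add(timestampFirst(ts, seq, node));
        }
        System.out.println("sequence-first shared prefix bytes:  " + sharedPrefixBytes(seqFirst));
        System.out.println("timestamp-first shared prefix bytes: " + sharedPrefixBytes(tsFirst));
    }
}
```

Under these assumptions the sequence-first layout shares at most one leading byte per adjacent pair (9,744 bytes in total for 10,000 IDs), while the timestamp-first layout shares several bytes per pair. This is only a proxy for on-disk size; real savings depend on how the terms dictionary and stored fields compress the data.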

https://github.com/opensearch-project/OpenSearch/blob/71aefa51b84750042b1698ed2b549d4f92209e1b/libs/common/src/main/java/org/opensearch/common/TimeBasedUUIDGenerator.java#L40-L42

Describe the solution you'd like

A new UUID generator implementation that can reduce the storage size of the _id field.
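A generator along these lines could look like the hypothetical Java sketch below. It is not a proposed patch: the class name PrefixFriendlyIdGenerator, the 15-byte timestamp-first layout, and the 6-byte node identifier are all assumptions. It keeps the timestamp monotonic even if the wall clock moves backwards, and it advances to the next millisecond when the 3-byte sequence wraps, so IDs from a single generator instance stay unique.

```java
import java.security.SecureRandom;
import java.util.Base64;

/**
 * Hypothetical sketch of a prefix-friendly ID generator: slowly changing bytes (the
 * timestamp) come first so consecutive IDs share a long common prefix.
 * Not OpenSearch's TimeBasedUUIDGenerator; layout and names are assumptions.
 */
public class PrefixFriendlyIdGenerator {
    private static final int SEQ_MASK = 0xFFFFFF;  // 3-byte sequence number
    private final byte[] node;                      // hypothetical 6-byte node identifier
    private int sequence;
    private long lastTimestamp;

    public PrefixFriendlyIdGenerator(byte[] node) {
        if (node.length != 6) {
            throw new IllegalArgumentException("node id must be 6 bytes");
        }
        this.node = node.clone();
        // Random starting sequence lowers the chance of repeating IDs across restarts.
        this.sequence = new SecureRandom().nextInt() & SEQ_MASK;
    }

    public synchronized String nextId() {
        sequence = (sequence + 1) & SEQ_MASK;
        // Never let the timestamp go backwards, even if the wall clock does.
        long now = Math.max(System.currentTimeMillis(), lastTimestamp);
        if (sequence == 0 && now <= lastTimestamp) {
            // Sequence wrapped within the same millisecond: move to a fresh one to stay unique.
            now = lastTimestamp + 1;
        }
        lastTimestamp = now;
        return encode(now, sequence, node);
    }

    /** Layout: 6-byte big-endian timestamp, 3-byte big-endian sequence, 6-byte node = 15 bytes. */
    static String encode(long timestamp, int seq, byte[] node) {
        byte[] id = new byte[15];
        for (int i = 0; i < 6; i++) {
            id[i] = (byte) (timestamp >>> (40 - 8 * i));
        }
        id[6] = (byte) (seq >>> 16);
        id[7] = (byte) (seq >>> 8);
        id[8] = (byte) seq;
        System.arraycopy(node, 0, id, 9, 6);
        // 15 bytes encode to exactly 20 URL-safe base64 characters, so no padding is needed.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(id);
    }
}
```

Because the timestamp fills the first 6 bytes (48 bits, i.e. 8 base64 characters), IDs generated within the same millisecond share their first 8 characters. This sketch does not measure whether the new layout changes indexing or lookup speed.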

Related component

Indexing:Performance

Describe alternatives you've considered

No response

Additional context

No response

mgodwan commented 4 months ago

[Indexing Triage Meeting 07/22]

Next step: @mgodwan to share a sample implementation with JMH micro-benchmarks.