opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Use `Roaring64Bitmap` to optimize `termsQuery` query in Long type #15638

Closed kkewwei closed 2 months ago

kkewwei commented 2 months ago

Is your feature request related to a problem? Please describe

In #14774, we introduced RoaringBitmap to optimize termsQuery for Integer-type fields. Roaring64Bitmap and Roaring64NavigableMap are also provided to encode Long values, and could be used for termsQuery on Long-type fields.

Describe the solution you'd like

As Roaring64Bitmap seems to be more space-efficient, could we use it to optimize termsQuery for Long-type fields? I'd be happy to implement it.

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response

kkewwei commented 2 months ago

@msfroh, @bowenlan-amzn, please help confirm it.

msfroh commented 2 months ago

Hmm... we'll need to think about backward compatibility and serialization. In particular, we currently rely on RoaringBitmap having a well-specified (language-independent) binary representation that we can send as a base64-encoded payload.

Checking the JavaDoc for Roaring64Bitmap#serialize, it says:

Serialize this bitmap. Unlike RoaringBitmap, there is no specification for now: it may change from one java version to another, and from one RoaringBitmap version to another.

If we do want to add support for Roaring64Bitmap, I think we would want to pass some additional info to the query (like instead of the type being bitmap, it would be bitmap64). We would need to tag the stored values with some kind of version information -- which is also hard to do with the current implementation, since we're just storing a binary stored field. I think the client/server communication is the hardest part -- how do we, on the server, know what version of Roaring64Bitmap is being used by the client to send the base64-encoded bytes?
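To make the tagging idea concrete, here is a minimal sketch of a versioned envelope around the serialized bytes. Everything in it is hypothetical: the class name `TaggedBitmapPayload`, the tag values, and the header layout are assumptions for illustration, not anything that exists in OpenSearch or RoaringBitmap. The bitmap bytes are treated as opaque.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical envelope; none of these names exist in OpenSearch or RoaringBitmap.
public final class TaggedBitmapPayload {
    // Assumed tag values, chosen only for illustration.
    public static final byte FORMAT_BITMAP32 = 1;
    public static final byte FORMAT_BITMAP64 = 2;

    public final byte format;   // which bitmap family produced the body
    public final int version;   // serialization version reported by the sender
    public final byte[] body;   // opaque serialized bitmap bytes

    public TaggedBitmapPayload(byte format, int version, byte[] body) {
        this.format = format;
        this.version = version;
        this.body = body;
    }

    // Layout: 1-byte format tag, 4-byte big-endian version, then the body.
    public String encode() {
        ByteBuffer buf = ByteBuffer.allocate(1 + Integer.BYTES + body.length);
        buf.put(format).putInt(version).put(body);
        return Base64.getEncoder().encodeToString(buf.array());
    }

    // Reads the header back and rejects tags this sketch does not know.
    public static TaggedBitmapPayload decode(String base64) {
        ByteBuffer buf = ByteBuffer.wrap(Base64.getDecoder().decode(base64));
        if (buf.remaining() < 1 + Integer.BYTES) {
            throw new IllegalArgumentException("payload too short for header");
        }
        byte format = buf.get();
        int version = buf.getInt();
        if (format != FORMAT_BITMAP32 && format != FORMAT_BITMAP64) {
            throw new IllegalArgumentException("unknown format tag: " + format);
        }
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        return new TaggedBitmapPayload(format, version, body);
    }
}
```

A header like this only lets the server detect a format or version it cannot read and fail cleanly. It does not make two incompatible Roaring64Bitmap serializations interoperable, which is the harder problem described above.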

It does look like there's a format extension designed for 64-bits: https://github.com/RoaringBitmap/RoaringFormatSpec?tab=readme-ov-file#extension-for-64-bit-implementations, but that note also says:

Java Roaring bitmaps implementation offers an ART-based 64-bit implementation. It may reach better performances (compression and/or computation). But as of 2022-11, it is not compatible with this Serialization format.
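For context, a portable 64-bit layout can be built on top of 32-bit bitmaps by grouping values on their high 32 bits, which is my understanding of the general idea behind the linked extension (check the spec for the exact byte layout). The sketch below shows only that split and grouping in plain Java; `HighLowSplit` is a hypothetical name, and plain lists stand in for the per-bucket 32-bit bitmaps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating the high/low split; not part of any library.
public final class HighLowSplit {
    // Upper 32 bits of the value, used as the bucket key.
    public static int high(long v) {
        return (int) (v >>> 32);
    }

    // Lower 32 bits of the value, stored inside the bucket.
    public static int low(long v) {
        return (int) v;
    }

    // Rebuilds the original 64-bit value from its two halves.
    public static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    // Groups values by high key; keys are ordered as unsigned 32-bit integers.
    public static Map<Integer, List<Integer>> bucket(long[] values) {
        Map<Integer, List<Integer>> buckets = new TreeMap<>(Integer::compareUnsigned);
        for (long v : values) {
            buckets.computeIfAbsent(high(v), k -> new ArrayList<>()).add(low(v));
        }
        return buckets;
    }
}
```

The ART-based Roaring64Bitmap organizes its keys differently, which is consistent with the note that it is not compatible with this serialization format.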

I'm not totally opposed to 64-bit support, but I worry that the format may not be fully settled yet, which might cause maintenance problems down the road.

kkewwei commented 2 months ago

@msfroh Thank you very much for your detailed reply.

It does not seem suitable to use Roaring64Bitmap for now, so I will close the issue. If there is any progress, I will reopen it.