In-memory columnar representation of any intermediate results from search
Data adjacency for sequential access (scans)
O(1) (constant-time) random access
SIMD and vectorization-friendly
Relocatable without “pointer swizzling”, allowing for true zero-copy access in shared memory.
Interoperable representation of columnar data that can be shared across different engines, for example between OpenSearch and DataFusion, a Rust-based engine.
RPC using bidirectional streams: uses gRPC bidirectional streams to handle backpressure from the client in real time and to produce batches of records on demand. Used both for internode communication (between data nodes and the coordinator) and for communication with the end client.
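To make the "batches on demand" behaviour concrete, here is a minimal sketch in plain Python. It does not use gRPC or Arrow Flight; it only models the demand-driven idea, where the producer does no work until the consumer asks for the next batch. The names `batch_stream` and `produced_log` are hypothetical and exist only for this illustration.

```python
from itertools import islice


def batch_stream(rows, batch_size, produced_log):
    """Lazily yield lists of up to batch_size rows.

    produced_log is a hypothetical list used only to observe when a batch
    is actually built: nothing is appended until the consumer pulls.
    """
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # source exhausted, end of stream
        produced_log.append(len(batch))
        yield batch


log = []
stream = batch_stream(range(10), 4, log)
# No batch has been produced yet: the consumer has not asked for one.
first = next(stream)   # the consumer pulls exactly one batch
rest = list(stream)    # the consumer drains the remaining batches
```

In a real bidirectional stream the pull signal would travel over the network and be subject to transport-level flow control; the generator stands in for that signal so that a slow consumer naturally slows the producer.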
Use cases
Optimize memory overhead, CPU utilization, and performance for -
Apache Arrow will serve as the library for the in-memory columnar representation of any transient results used for retrieval in these use cases. Arrow Flight will be used for streaming RPC.
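The layout properties listed above (data adjacency, O(1) random access, relocation without pointer swizzling) can be illustrated with a small sketch using only the Python standard library. This is not the Arrow API. It mimics the idea behind Arrow's variable-size binary layout, where a column is one contiguous data buffer plus an offsets buffer, and it omits the validity bitmap. `StringColumn` and `relocate` are hypothetical names.

```python
from array import array


class StringColumn:
    """Variable-width column: one contiguous data buffer plus offsets."""

    def __init__(self, values):
        # Value i occupies data[offsets[i]:offsets[i + 1]] (byte offsets).
        self.offsets = array("i", [0])
        data = bytearray()
        for v in values:
            data += v.encode("utf-8")
            self.offsets.append(len(data))
        self.data = bytes(data)

    def __len__(self):
        return len(self.offsets) - 1

    def get(self, i):
        # Two offset lookups and one slice: no scan over earlier values.
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.data[start:end].decode("utf-8")


def relocate(column):
    """Copy the raw buffers to new memory.

    Offsets are relative to the start of the data buffer, not absolute
    addresses, so nothing has to be rewritten after the copy.
    """
    moved = StringColumn([])
    moved.offsets = array("i")
    moved.offsets.frombytes(column.offsets.tobytes())
    moved.data = bytes(column.data)
    return moved


titles = StringColumn(["alpha", "", "gamma"])
moved = relocate(titles)
third = moved.get(2)  # random access on the relocated copy
```

Because the buffers carry no embedded pointers, the same bytes could in principle be placed in shared memory or sent over the wire and read in place, which is the property the zero-copy claim relies on.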
Please describe the end goal of this project
Supporting References
JOINs RFC making use of this integration - https://github.com/opensearch-project/OpenSearch/issues/15185
Issues
getStream() and getFlightInfo() APIs.
ProxyStreamProducer acting as a proxy stream connecting the right data node holding the stream for a given ticket to the client.
Related component
Search