opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

Support gRPC/protobuf for RESTful APIs and node-to-node communication #1287

Status: Open. anasalkouz opened this issue 2 years ago.

anasalkouz commented 2 years ago

Replace the REST/JSON-based API with gRPC/protobuf to improve serialization/deserialization performance. Ref: https://grpc.io/docs/what-is-grpc/introduction/
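
A minimal sketch of why a binary wire format can be smaller than JSON: the snippet below hand-encodes a two-field message in the protobuf wire format (base-128 varints plus length-delimited strings) using only the Python standard library, then compares its size with the equivalent JSON. The field names `doc_id` and `title` and the helper functions are hypothetical and used only for illustration. A real integration would use `protoc`-generated classes rather than hand-rolled encoding.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def encode_field_varint(field_number: int, value: int) -> bytes:
    # Key = (field_number << 3) | wire_type; wire type 0 is VARINT.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_field_string(field_number: int, text: str) -> bytes:
    # Wire type 2 is length-delimited: key, byte length, then raw UTF-8 bytes.
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# Hypothetical message: field 1 = doc_id (integer), field 2 = title (string).
proto_bytes = encode_field_varint(1, 150) + encode_field_string(2, "testing")
json_bytes = json.dumps({"doc_id": 150, "title": "testing"}).encode("utf-8")

print(proto_bytes.hex())  # 089601120774657374696e67
print(len(proto_bytes), len(json_bytes))
```

The binary form carries small numeric field tags instead of field names and stores integers as varints instead of decimal text. That is where most of the size and parsing savings come from; no compression is involved.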

AmiStrn commented 2 years ago

Would this also help with streaming, rather than doc-by-doc or bulk indexing?

penghuo commented 2 years ago

Would gRPC also benefit node-to-node communication?

anasalkouz commented 2 years ago

> Would gRPC also benefit node-to-node communication?

Sure, this should help with both node-to-node communication (internal) and client-to-server communication (external).

dblock commented 2 years ago

@saratvemulapalli This becomes theoretically swappable via the extensions work, doesn't it?

anasalkouz commented 2 years ago

Based on my understanding of how internal communication between nodes works now, it uses a single transport layer, so it should be easy to change how it works. @mch2 please keep me honest.

saratvemulapalli commented 2 years ago

@dblock Extensions are being built over the transport layer (layer 4), while gRPC/protobuf operate at layer 5 or above. Extensions can adapt to new communication mechanisms.

@anasalkouz that is correct. The Netty4 transport module implements transport in OpenSearch using NIO, and it can be swapped for new communication mechanisms. Netty4 also provides layer 7 (HTTP) support.

anasalkouz commented 2 years ago

Should we swap Netty4 for gRPC instead of upgrading to Netty 5?

dblock commented 2 years ago

> Should we swap Netty4 for gRPC instead of upgrading to Netty 5?

Yes! I would enable that as an option/experiment and benchmark it.
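
Following the suggestion to treat this as an experiment and benchmark it, here is a minimal sketch of an A/B harness that registers serializers by name and measures payload size and round-trip time. It assumes a hypothetical fixed-layout record (`doc_id` as uint32, `score` as float64) and uses Python's `struct` module as a stand-in for a schema-based binary format. It is not protobuf or gRPC, and the timings it prints will vary by machine.

```python
import json
import struct
import timeit

# Hypothetical fixed-layout record: little-endian uint32 doc_id + float64 score (12 bytes).
RECORD = struct.Struct("<Id")


def json_codec_roundtrip(records):
    """Serialize to JSON bytes and parse back into (doc_id, score) tuples."""
    payload = json.dumps([{"doc_id": d, "score": s} for d, s in records]).encode("utf-8")
    decoded = [(r["doc_id"], r["score"]) for r in json.loads(payload)]
    return payload, decoded


def binary_codec_roundtrip(records):
    """Pack each record into fixed-size binary and unpack it again."""
    payload = b"".join(RECORD.pack(d, s) for d, s in records)
    decoded = list(RECORD.iter_unpack(payload))
    return payload, decoded


# Codecs selectable by name, so a new format can be added as an experiment.
CODECS = {"json": json_codec_roundtrip, "binary": binary_codec_roundtrip}


def benchmark(records, number=100):
    """Return {codec_name: (payload_size_bytes, seconds_per_round_trip)}."""
    results = {}
    for name, codec in CODECS.items():
        payload, decoded = codec(records)
        assert decoded == records, f"{name} codec did not round-trip"
        seconds = timeit.timeit(lambda: codec(records), number=number)
        results[name] = (len(payload), seconds / number)
    return results


if __name__ == "__main__":
    sample = [(i, i * 0.5) for i in range(1000)]
    for name, (size, per_call) in benchmark(sample).items():
        print(f"{name:>6}: {size} bytes, {per_call * 1e6:.1f} us per round trip")
```

Keeping every codec behind the same round-trip signature lets an experimental format be compared side by side with the current one. It also verifies correctness before any timing is reported.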

saratvemulapalli commented 1 year ago

Opened an RFC for protobuf in OpenSearch: https://github.com/opensearch-project/OpenSearch/issues/6844

navneet1v commented 1 year ago

@dblock is there a separate issue tracking gRPC protocol integration in OpenSearch, or is it already done?

saratvemulapalli commented 1 year ago

@navneet1v there isn't one. We could use this issue to track. cc: @VachaShah

navneet1v commented 1 year ago

@saratvemulapalli if we are using this issue for tracking, can we add more details to the description about what we are doing and list all the tasks being tracked?

Bukhtawar commented 1 year ago

Just to clarify: we are only evaluating data serialization protocols (protobuf, Avro, Ion, etc.), not transport layers like Netty or gRPC.

saratvemulapalli commented 1 year ago

@navneet1v as @Bukhtawar said, nobody is working on the transport protocol (such as gRPC) yet. @VachaShah is exploring different serialization/deserialization mechanisms that we could adopt for OpenSearch.

Do you see a use case for gRPC? What are you looking for?

navneet1v commented 1 year ago

What I am looking for is a protocol that is more lightweight than HTTP. During a search we spend a significant amount of time in the HTTP layer, so I was thinking about how we could replace it to get better query latencies.

Bukhtawar commented 1 year ago

@navneet1v to clarify, we use TCP for node-to-node communication; did you mean HTTP at the REST layer? Either way, it would be good to see some profiling results based on your observations.

anirudha commented 1 year ago

Ideally, I would like to see support for Apache Arrow zero-copy via any HTTP/2 transport.

anirudha commented 1 year ago

@anasalkouz this way you remove the need for serialization/deserialization, but it might require more extensive changes. https://arrow.apache.org/docs/format/Flight.html

Bukhtawar commented 1 year ago

@anirudha Netty (the default HTTP/transport implementation) already supports zero-copy. Let's not complicate this issue further.

dblock commented 1 year ago

I've renamed this issue to "Support gRPC/protobuf for RESTful APIs and node-to-node communication". I think we want it all, including _bulk support.