opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[META] Streaming Indexing API #9065

Open reta opened 1 year ago

reta commented 1 year ago

Is your feature request related to a problem? Please describe. This is the meta issue to track progress on the Streaming Indexing API.

Describe the solution you'd like As an outcome of https://github.com/opensearch-project/OpenSearch/issues/5001 and https://github.com/opensearch-project/OpenSearch/pull/7273, we have outlined how such streaming support could be integrated into OpenSearch.


dblock commented 1 year ago

@reta pretty great stuff - there's other work on improving client/server performance; how do you see the work on Protobuf (@VachaShah) / gRPC fitting in?

reta commented 1 year ago

> @reta pretty great stuff - there's other work on improving client/server performance; how do you see the work on Protobuf (@VachaShah) / gRPC fitting in?

Thanks @dblock, gRPC would definitely benefit from the reactive streaming part. On the other hand, I assume gRPC would be used as the node transport layer (at least initially), so the HTTP reactive layer (the suggested alternative transport for HTTP clients) would benefit enormously from that: we would have an end-to-end reactive processing pipeline.

shwetathareja commented 1 year ago

@reta Looking to collaborate on the Streaming API changes and see when it can make it into an OpenSearch release.

reta commented 1 year ago

> @reta Looking to collaborate on the Streaming API changes and see when it can make it into an OpenSearch release.

@shwetathareja that would be great. The first thing is to get https://github.com/opensearch-project/OpenSearch/pull/9672 merged: the pull request adds a new HTTP transport based on Reactor Netty 4 with streaming support, and it is well on schedule for 2.12. What is left is the testing part: since this transport is experimental and not the default, it needs some ad-hoc testing. I should be able to wrap it up this week (the two back-to-back releases derailed the plans a bit).

Once the transport is there, we could split the work; there are quite a few opportunities for doing that in parallel. Thank you.

shwetathareja commented 1 year ago

@reta sounds good. I will also go through https://github.com/opensearch-project/OpenSearch/pull/9672 to get a better understanding. Let's connect next week to discuss how we can split the remaining work. Looking forward to working together. Thank you!

T-J-L commented 1 month ago

Hi, I'm really interested in testing this feature.

Does the current implementation support bi-directional streaming, i.e. returning responses for each chunk/document?

Currently I'm streaming the request, but OpenSearch appears to wait until the request is complete before sending the response. Not sure whether my setup is wrong, or if this is expected.

Thanks

Edit: this works as expected. My code was the issue!

reta commented 1 month ago

> Does the current implementation support bi-directional streaming, i.e. returning responses for each chunk/document?

Just for visibility: yes, the implementation supports bi-directional streaming. Thanks, @T-J-L!
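
(Illustrative sketch added for readers, not taken from the implementation.) The "response per chunk" behaviour discussed above can be pictured on the client side roughly as follows. The only OpenSearch-specific detail used is the standard newline-delimited bulk body format; the helper names `to_bulk_chunks` and `stream_chunks`, the index name `my-index`, and the `fake_send` transport are all hypothetical stand-ins, and no real client or endpoint is called.

```python
import json
from typing import Callable, Iterable, Iterator, List


def to_bulk_chunks(docs: Iterable[dict], index: str, chunk_size: int) -> Iterator[str]:
    """Group documents into NDJSON bulk payloads of at most chunk_size documents."""
    lines: List[str] = []
    count = 0
    for doc in docs:
        # Bulk format: one action line, then one source line per document.
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        count += 1
        if count == chunk_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Flush the final, possibly smaller, chunk.
        yield "\n".join(lines) + "\n"


def stream_chunks(chunks: Iterable[str], send_chunk: Callable[[str], dict]) -> Iterator[dict]:
    """Yield one response per chunk, without waiting for the whole request to finish."""
    for chunk in chunks:
        yield send_chunk(chunk)


def fake_send(chunk: str) -> dict:
    # Hypothetical transport: pretends the server acknowledged every document.
    # Each document occupies two lines (action + source) in the chunk.
    return {"errors": False, "items": len(chunk.splitlines()) // 2}


docs = [{"id": i} for i in range(5)]
responses = list(stream_chunks(to_bulk_chunks(docs, "my-index", 2), fake_send))
for response in responses:
    print(response)
```

With a real streaming transport, `fake_send` would be replaced by whatever writes a chunk to the open request and reads the matching partial response; that part is deliberately left out here because it depends on the client library in use.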

thomas-long-f3 commented 1 month ago

I didn't want to clutter this issue so created a separate one here, but any help would be appreciated 😄