googleapis / google-cloud-node

Google Cloud Client Library for Node.js
https://cloud.google.com/nodejs
Apache License 2.0

Support for stream in detectText #2138

Closed: ebdrup closed this 7 years ago

ebdrup commented 7 years ago

I would love to be able to stream a file upload from a user straight to the Google Vision API for textAnnotations, to make the OCR run faster.

Unfortunately, the API currently only supports images as buffers, filenames, or URLs, not streams.

We Node.js enthusiasts love streams. This would let us stream a user's file upload to our file storage and to OCR at the same time, making the OCR much faster from the user's perspective.

I would call the REST API myself using request, but I find myself lost in the code for authentication. Is there a guide anywhere for calling the REST API in a simple way without using the SDK?

stephenplusplus commented 7 years ago

I would call the REST API myself using request, but I find myself lost in the code for authentication. Is there a guide anywhere for calling the REST API in a simple way without using the SDK?

The easiest way is probably using google-auto-auth. See the example from the readme, then just plug in request.
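For reference, a minimal sketch of that approach, assuming google-auto-auth's authorizeRequest() as shown in its readme; the request body follows the public images:annotate REST format, and the file path is a placeholder:

var fs = require('fs');
var request = require('request');
var googleAuth = require('google-auto-auth')({
  scopes: ['https://www.googleapis.com/auth/cloud-platform']
});

var reqOpts = {
  method: 'POST',
  uri: 'https://vision.googleapis.com/v1/images:annotate',
  json: {
    requests: [{
      image: { content: fs.readFileSync('image.png').toString('base64') },
      features: [{ type: 'TEXT_DETECTION' }]
    }]
  }
};

// authorizeRequest() attaches the Authorization header; request sends it.
googleAuth.authorizeRequest(reqOpts, function (err, authorizedReqOpts) {
  if (err) throw err;
  request(authorizedReqOpts, function (err, res, body) {
    // body.responses[0].textAnnotations holds the OCR result
  });
});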

this would let us stream a user's file upload to our file storage and to OCR at the same time.

Does this mean you would split a single source stream in multiple directions? This can have some side effects: once a destination stream handles its chunk from the source stream, the source releases it, so the destination streams would have to keep up with each other to handle the same data. But I haven't followed development in this area closely; is there a native/popular solution for this?

We could accept an incoming stream, but I don't believe the response would come back any quicker. We send requests through a generated layer, which uses gRPC beneath it. It doesn't expect any streaming input for this method, so if we received a stream from the user, we would have to buffer the contents into memory before making the request, and then simulate a stream after we got a response by manually emitting the sole response object from the API.

That's how we decide when to offer streams in our API and when not to: we want to stay true to how streams work and their benefits, and avoid simulating them.

ebdrup commented 7 years ago

I have the split stream running in production now. It just buffers in memory if there is backpressure. The microservice that calls Google Vision just collects the stream into a buffer for now and calls Google Vision with that buffer.
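For what it's worth, a minimal sketch of that kind of fan-out, using stream.PassThrough as the tee points (the destinations here are stand-ins); Node pauses a piped source for its slowest destination, so each branch only absorbs pacing differences up to its highWaterMark:

var fs = require('fs');
var PassThrough = require('stream').PassThrough;

function handleUpload(req) {
  // Fan the upload out to two consumers. Each PassThrough holds chunks in
  // its internal buffer while its consumer applies backpressure; beyond
  // its highWaterMark, backpressure propagates back to the source.
  var toStorage = new PassThrough();
  var toOcr = new PassThrough();
  req.pipe(toStorage).pipe(fs.createWriteStream('upload.bin'));
  req.pipe(toOcr); // e.g. pipe onward to the OCR microservice request
}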

What I would potentially do is stream-write the beginning of the JSON to the REST API, then stream the file upload directly as base64 into the JSON, and then write the close of the JSON. That would mean the OCR starts almost at the same moment the file upload finishes.
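A sketch of that hand-rolled request, to make the idea concrete; the endpoint and body shape follow the public images:annotate REST format, the token and file path are placeholders, and the read chunk size is kept a multiple of 3 bytes so the base64 pieces concatenate cleanly:

var fs = require('fs');
var https = require('https');

var accessToken = process.env.ACCESS_TOKEN; // placeholder: OAuth2 token
// highWaterMark is a multiple of 3, so every chunk (except possibly the
// last) base64-encodes without internal padding.
var upload = fs.createReadStream('image.png', { highWaterMark: 3 * 4096 });

var apiReq = https.request({
  method: 'POST',
  hostname: 'vision.googleapis.com',
  path: '/v1/images:annotate',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer ' + accessToken
  }
}, function (res) {
  // collect and JSON.parse the response body here
});

// Open the JSON envelope, stream the image bytes as base64, then close it.
apiReq.write('{"requests":[{"features":[{"type":"TEXT_DETECTION"}],"image":{"content":"');
upload.on('data', function (chunk) {
  apiReq.write(chunk.toString('base64'));
});
upload.on('end', function () {
  apiReq.write('"}}]}');
  apiReq.end();
});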

stephenplusplus commented 7 years ago

We used to do that in various places in this library as well, but it's a little different now with gRPC. I don't believe we have an equivalent way of sending a stream where the generated/gRPC API doesn't expect one.

@jmuk @landrito can you think of a way to get a stream through in the generated layer for the batchAnnotateImages() method? And would it offer any perf benefits (not sure how the gRPC end of this works)?

The goal is: the API request starts as soon as we receive the first chunk of data from the source stream, so that by the time the source stream is drained, the response is practically already back. Compare this to the current behavior, where the API request starts only after the source has been fully buffered.

lukesneeringer commented 7 years ago

The caution I would put on that (we sort of alluded to this already) is that we need to deliver what it says on the tin. Pretending to stream a request when, in fact, it is unidirectional and the response does not start until the entire request completes is pointless.

landrito commented 7 years ago

I don't think the goal you stated can be accomplished without a streaming RPC being added to the proto and the corresponding method being implemented server-side. Maybe @jmuk has an idea?

ebdrup commented 7 years ago

@lukesneeringer Even if you buffer and then do OCR on your side, a streaming option would still be nice. Then we could stream a file from the file system, a web request, etc., and not have to buffer the entire thing in memory on our side. In our experience, these kinds of streaming solutions scale exceptionally well under heavy load.

stephenplusplus commented 7 years ago

I don't think we'd go so far as to fake the stream. The problem with doing it on our side is that it would create the illusion that we're doing something to help the performance of the user's app. It might encourage them to do things they otherwise wouldn't, not realizing it was all a facade until they notice their memory usage going up inexplicably. Eventually, they'd realize what was going on, and at that point they'd probably make us feel pretty bad for doing that to them.

I would recommend buffering on the app's side, even if it is an extra step. concat-stream can make it pretty easy:

var concat = require('concat-stream');
var vision = require('@google-cloud/vision')();
function requestHandler(req, res) {
  // Buffer the whole upload, then hand the complete buffer to the API.
  req.pipe(concat(function(userImageUpload) {
    vision.detectText(userImageUpload, function(err, detections) {
      // handle err / use detections
    });
  }));
}

I'll close the issue for now, since I think we're trapped by the proto files not allowing a streaming version of this method. If @jmuk has any ideas and thinks there is hope here, please re-open.

ebdrup commented 7 years ago

Oh, I didn't mean buffering in the SDK. I meant buffering on the Google servers.