Closed jmuk closed 7 years ago
cc: @jmdobry, @landrito, @bjwatson, @stephenplusplus, @callmehiphop
Overall this LGTM, this is pretty much how we handle pagination in google-cloud-node. My only comments are around streaming with autoPaginate: false
. I don't think it's a good idea to change the data being passed back based on an option. Instead maybe we should create a separate event so users can grab the nextPageToken
api.pageMethodStream(req, { autoPaginate: false })
.on('error', console.error)
.on('data', function(result) {
// invoked for each result
})
.on('response', function(response) {
// make additional queries here.. or not
})
.on('end', function() {});
Note: there might be some concern for building an Array for entire list of elements. It consumes some amount of memory, it won't finish until the API call reaches to the end, etc.
Here are related things in JS which I reviewed but I concluded not fitting into this use cases.
ES6 generators are not yet in NodeJS 0.12 which we are supporting. Also, their APIs are synchronous -- their results need to be fetched synchronously, which isn't our use case.
Reactive Extension (Rx) handles the pattern of asynchronous sequence of elements, which could fit into this, and I like it honestly. However, that isn't a standard nor widely accepted by nodejs community. It needs some training for JS programmers to use that.
It would be great to provide some composable unit so that user can convert the result quickly to RxJS if they prefer, but this won't be the interface everyone will see.
@callmehiphop
Thanks for replying so quickly!
Actually @jmdobry told me about "Stream with autoPaginate: false" pattern as for what is currently in google-cloud-node. I want to know what's in google-cloud-node right now.
I personally don't like the idea of adding another 'response' event for this purpose. 'data' event is good because it's documented in the standard documentation https://nodejs.org/api/stream.html#stream_event_data and can be composed with others (like pipe
method only cares about data
event).
If it is confusing or harder to understand, simply disallowing (or ignoring) autoPaginate
for -Stream method might be acceptable.
It would be great to provide some composable unit so that user can convert the result quickly to RxJS if they prefer, but this won't be the interface everyone will see.
I'm not super familiar with RxJS but it looks like the fromEvent
works with EventEmitter
like objects. I'm guessing something like this could work?
var stream = api.pageMethodStream(req);
Rx.Observable.fromEvent(stream, 'data');
Actually @jmdobry told me about "Stream with autoPaginate: false" pattern as for what is currently in google-cloud-node. I want to know what's in google-cloud-node right now.
IIRC when in stream mode we force autoPaginate
to true. Instead we tell users to call this.end()
if they want to end the stream early.
@callmehiphop
I'm not super familiar with RxJS but it looks like the fromEvent works with EventEmitter like objects. I'm guessing something like this could work?
Yeah, I think so.
For streaming, I'll wait for @jmdobry's reply, but what you wrote looks good to me.
On RxJS, yeah using fromEvent
is good.
On autoPaginate
and streaming:
IIRC when in stream mode we force autoPaginate to true. Instead we tell users to call this.end() if they want to end the stream early.
SGTM, though I'd think pageSize
would be configurable. And, the value passed to the data
event handler should be the same type/structure regardless of options passed to the streaming method.
though I'd think pageSize would be configurable
My mistake, that is totally configurable. By default we autoPaginate
and if the user does not specify the pageSize
or maxResults
we make requests until we run out of results. Where as with pageSize
/maxResults
set we stop making requests once that number has been fulfilled.
Two PRs are created, please take a look if you're interested in the implementation details. The API surface respects the conclusion here.
I'll regenerate files after those PRs are merged.
My main comment is to only use "stream" for gRPC streaming. I think other language teams are calling this "paged iteration" (cc @garrettjonesgoogle, @michaelbausor, @pongad).
The -Stream suffix is used because it returns a "Stream" which is builtin to the NodeJS runtime. I think other suffix won't make much sense here. Otherwise, the term word 'stream' is not used.
I've updated the comment for -Stream suffixed methods.
I see. Are you talking about this mechanism? Are you also using this to support gRPC streaming?
Right, that's that.
And yes, NodeJS gRPC-streaming uses this Stream class, therefore vkit methods share the same returned type. I hope this is less surprising to users, assuming people understand such "gRPC-streaming" happens asynchronously and have multiple returned objects. We may probably need to revise the doc comment about asynchronousness of those methods.
In a sense, the "paged iteration" is also asynchronous, since it might need to go fetch another page from the server. The main difference is how long reads might block.
Another difference is that gRPC also supports streamed writing (for client-side or bidi).
We got some internal feedback about the return value for page-streaming. Current thing (an object Stream) would not be quite intuitive nor easy-to-use for many NodeJS programmers.
Here are suggestions:
Returns a promise which resolves to an Array of results
The promise will resolve to a response object when 'autoPaginate' is specified to false
This means we are going to rename
flattenPages
toautoPaginate
(autoPaginate
is the name google-cloud-node developers chose, and would be better to be consistent with this rather than the name used in other languages).This would also be better that the sample code fetches the next page.
(also -- probably better to specify a bit more call options here, like
pageSize
).Generate another method with -Stream suffix for returning stream
This might be optional, but a different method to return a stream instead of Promise.
Stream method with autoPaginate: false will generate an object Stream of response objects