googleapis / gax-nodejs

Google API Extensions for Node.js
Apache License 2.0
117 stars 87 forks source link

Refine "page-streaming" interface yet again #60

Closed jmuk closed 7 years ago

jmuk commented 7 years ago

We got some internal feedback about the return value for page-streaming. Current thing (an object Stream) would not be quite intuitive nor easy-to-use for many NodeJS programmers.

Here are suggestions:

Returns a promise which resolves to an Array of results

api.pagedMethod(req).then(function(results) {
  for (var i = 0; i < results.length; ++i) {
    var result = results[i];
    ...
  }
});

The promise will resolve to a response object when 'autoPaginate' is specified to false

This means we are going to rename flattenPages to autoPaginate (autoPaginate is the name google-cloud-node developers chose, and would be better to be consistent with this rather than the name used in other languages).

This would also be better that the sample code fetches the next page.

api.pagedMethod(req, {autoPaginate: false}).then(function(response) {
  console.log(response);
  return api.pagedMethod(req, {autoPaginate: false, pageToken: response.nextPageToken});
}).then(function(response) {
  // the next page.
  console.log(response);
});

(also -- probably better to specify a bit more call options here, like pageSize).

Generate another method with -Stream suffix for returning stream

This might be optional, but a different method to return a stream instead of Promise.

api.pagedMethodStream(req)
  .on('data', function(result) {
    // invoked for each result.
  }).on('error', function(err) { ... })
  .on('end', function() { ... });

Stream method with autoPaginate: false will generate an object Stream of response objects

api.pagedMethodStream(req, {autoPaginate: false})
  .on('data', function(response) {
    // invoked for each page.
  }).on('error', function(err) { ... })
  .on('end', function() { ... });
jmuk commented 7 years ago

cc: @jmdobry, @landrito, @bjwatson, @stephenplusplus, @callmehiphop

callmehiphop commented 7 years ago

Overall this LGTM, this is pretty much how we handle pagination in google-cloud-node. My only comments are around streaming with autoPaginate: false. I don't think it's a good idea to change the data being passed back based on an option. Instead maybe we should create a separate event so users can grab the nextPageToken

api.pageMethodStream(req, { autoPaginate: false })
  .on('error', console.error)
  .on('data', function(result) {
    // invoked for each result
  })
  .on('response', function(response) {
    // make additional queries here.. or not
  })
  .on('end', function() {});
jmuk commented 7 years ago

Note: there might be some concern for building an Array for entire list of elements. It consumes some amount of memory, it won't finish until the API call reaches to the end, etc.

Here are related things in JS which I reviewed but I concluded not fitting into this use cases.

iterables and generators

ES6 generators are not yet in NodeJS 0.12 which we are supporting. Also, their APIs are synchronous -- their results need to be fetched synchronously, which isn't our use case.

RxJS

Reactive Extension (Rx) handles the pattern of asynchronous sequence of elements, which could fit into this, and I like it honestly. However, that isn't a standard nor widely accepted by nodejs community. It needs some training for JS programmers to use that.

It would be great to provide some composable unit so that user can convert the result quickly to RxJS if they prefer, but this won't be the interface everyone will see.

jmuk commented 7 years ago

@callmehiphop

Thanks for replying so quickly!

Actually @jmdobry told me about "Stream with autoPaginate: false" pattern as for what is currently in google-cloud-node. I want to know what's in google-cloud-node right now.

I personally don't like the idea of adding another 'response' event for this purpose. 'data' event is good because it's documented in the standard documentation https://nodejs.org/api/stream.html#stream_event_data and can be composed with others (like pipe method only cares about data event).

If it is confusing or harder to understand, simply disallowing (or ignoring) autoPaginate for -Stream method might be acceptable.

callmehiphop commented 7 years ago

It would be great to provide some composable unit so that user can convert the result quickly to RxJS if they prefer, but this won't be the interface everyone will see.

I'm not super familiar with RxJS but it looks like the fromEvent works with EventEmitter like objects. I'm guessing something like this could work?

var stream = api.pageMethodStream(req);

Rx.Observable.fromEvent(stream, 'data');

Actually @jmdobry told me about "Stream with autoPaginate: false" pattern as for what is currently in google-cloud-node. I want to know what's in google-cloud-node right now.

IIRC when in stream mode we force autoPaginate to true. Instead we tell users to call this.end() if they want to end the stream early.

jmuk commented 7 years ago

@callmehiphop

I'm not super familiar with RxJS but it looks like the fromEvent works with EventEmitter like objects. I'm guessing something like this could work?

Yeah, I think so.

For streaming, I'll wait for @jmdobry's reply, but what you wrote looks good to me.

jmdobry commented 7 years ago

On RxJS, yeah using fromEvent is good.

On autoPaginate and streaming:

IIRC when in stream mode we force autoPaginate to true. Instead we tell users to call this.end() if they want to end the stream early.

SGTM, though I'd think pageSize would be configurable. And, the value passed to the data event handler should be the same type/structure regardless of options passed to the streaming method.

callmehiphop commented 7 years ago

though I'd think pageSize would be configurable

My mistake, that is totally configurable. By default we autoPaginate and if the user does not specify the pageSize or maxResults we make requests until we run out of results. Where as with pageSize/maxResults set we stop making requests once that number has been fulfilled.

jmuk commented 7 years ago

Two PRs are created, please take a look if you're interested in the implementation details. The API surface respects the conclusion here.

I'll regenerate files after those PRs are merged.

bjwatson commented 7 years ago

My main comment is to only use "stream" for gRPC streaming. I think other language teams are calling this "paged iteration" (cc @garrettjonesgoogle, @michaelbausor, @pongad).

jmuk commented 7 years ago

The -Stream suffix is used because it returns a "Stream" which is builtin to the NodeJS runtime. I think other suffix won't make much sense here. Otherwise, the term word 'stream' is not used.

I've updated the comment for -Stream suffixed methods.

bjwatson commented 7 years ago

I see. Are you talking about this mechanism? Are you also using this to support gRPC streaming?

jmuk commented 7 years ago

Right, that's that.

And yes, NodeJS gRPC-streaming uses this Stream class, therefore vkit methods share the same returned type. I hope this is less surprising to users, assuming people understand such "gRPC-streaming" happens asynchronously and have multiple returned objects. We may probably need to revise the doc comment about asynchronousness of those methods.

bjwatson commented 7 years ago

In a sense, the "paged iteration" is also asynchronous, since it might need to go fetch another page from the server. The main difference is how long reads might block.

Another difference is that gRPC also supports streamed writing (for client-side or bidi).