zemirco / couchdb

CouchDB client in Go
MIT License

Iterator for ViewResponse.Rows #7

Open ryanjyoder opened 7 years ago

ryanjyoder commented 7 years ago

It would be great if ViewResponse (or the View?) could handle the logic of paginating over a view. The ViewResponse could return a chan of Rows that loops over the view automatically. This would help when reading a view with a large number of rows.

I'd be willing to submit a PR, if this sounds like a feature you would like.

zemirco commented 7 years ago

Why don't you use query parameters for pagination? Like here http://docs.couchdb.org/en/2.0.0/couchapp/views/pagination.html

ryanjyoder commented 7 years ago

I would like that logic abstracted away, instead of making multiple calls to view.Get. Something like this:

viewIter, err := db.View("design-view").Iter("view", params)
if err != nil {
    // handle error
}
for row := range viewIter {
    // do something with row
}

Maybe I'm missing something simple, but calling Get several times seems error-prone, and it is a common use case.

zemirco commented 7 years ago

I'm afraid I don't understand. Let's say you've got 100 documents.

You can either get 10 documents 10 times, i.e. making 10 GET requests.

Or you can make a single request to get all 100 documents at once. Then you can loop over your result array.

Where do you want to have your iterator? CouchDB doesn't stream view query results or anything like that. It's always request -> response.
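
For reference, the two options above can be sketched as a plain loop over skip and limit. This is only an illustration: Row, Page, fetchPage and walk are hypothetical stand-ins, not part of this library, and fetchPage slices an in-memory list where a real client would send a GET request.

```go
package main

import "fmt"

// Row and Page are simplified, hypothetical stand-ins for a view row
// and a view response. They are not the library's own types.
type Row struct {
	ID  string
	Key string
}

type Page struct {
	TotalRows int
	Rows      []Row
}

// fetchPage is a hypothetical stand-in for one view request with skip
// and limit. It slices an in-memory list instead of calling CouchDB.
func fetchPage(all []Row, skip, limit int) Page {
	if skip > len(all) {
		skip = len(all)
	}
	end := skip + limit
	if end > len(all) {
		end = len(all)
	}
	return Page{TotalRows: len(all), Rows: all[skip:end]}
}

// walk reads the whole view pageSize rows at a time and reports how
// many rows it saw and how many requests that took.
func walk(all []Row, pageSize int) (rows, requests int) {
	for skip := 0; ; {
		page := fetchPage(all, skip, pageSize)
		requests++
		rows += len(page.Rows)
		skip += len(page.Rows)
		// Stop on the last page, or on an empty page so the loop cannot spin.
		if len(page.Rows) == 0 || skip >= page.TotalRows {
			return rows, requests
		}
	}
}

func main() {
	all := make([]Row, 100)
	for i := range all {
		all[i] = Row{ID: fmt.Sprintf("doc-%03d", i), Key: fmt.Sprintf("key-%03d", i)}
	}
	fmt.Println(walk(all, 10))  // 100 rows in 10 requests
	fmt.Println(walk(all, 100)) // 100 rows in 1 request
}
```

Either way the caller ends up writing the skip bookkeeping and the stop condition by hand.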

ryanjyoder commented 7 years ago

Yes, and since CouchDB doesn't support streaming, that is exactly why this feature would be so helpful. Something like this is what I have in mind:

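// Iter returns a chan of Rows, fetching the view one page at a time.
func (v *View) Iter(name string, params QueryParameters) (chan *Row, error) {
    rows := make(chan *Row)
    // Start at the caller's skip, if one was given.
    i := 0
    if params.Skip != nil {
        i = *params.Skip
    }
    // Page size; defaults to 5 when the caller sets no limit.
    limit := 5
    if params.Limit != nil {
        limit = *params.Limit
    }
    params.Limit = &limit
    params.Skip = &i
    resp, err := v.Get(name, params)
    if err != nil {
        return nil, err
    }
    go func() {
        defer close(rows)
        for {
            for _, row := range resp.Rows {
                rows <- row
                i = i + 1
            }
            // Stop after the last row, or on an empty page so the loop cannot spin.
            if len(resp.Rows) == 0 || i >= resp.TotalRows {
                return
            }
            // params.Skip still points at i, so the next Get resumes after the rows already sent.
            resp, err = v.Get(name, params)
            if err != nil {
                // The channel cannot report a mid-iteration error; stop instead of reusing a failed response.
                return
            }
        }
    }()

    return rows, nil
}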

I basically have to duplicate this logic every time I read a view. The iterator could also be a bit smarter, for example by choosing better values for limit, or by paging with 'startkey' instead of skip.
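
As a rough sketch of the 'startkey' idea: each request asks for one row more than the page size, and the key of that extra row becomes the startkey of the next request. Row, query and pages are hypothetical names used only for this illustration, and query searches a sorted in-memory slice rather than calling CouchDB. The sketch also assumes every key is unique; with duplicate keys the row's document ID would have to be sent along as well (CouchDB's startkey_docid parameter).

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a simplified, hypothetical stand-in for a view row.
type Row struct {
	ID  string
	Key string
}

// query is a hypothetical stand-in for a view request with startkey and
// limit. rows must be sorted by Key; an empty startKey means "from the start".
func query(rows []Row, startKey string, limit int) []Row {
	i := sort.Search(len(rows), func(j int) bool { return rows[j].Key >= startKey })
	end := i + limit
	if end > len(rows) {
		end = len(rows)
	}
	return rows[i:end]
}

// pages walks the view pageSize rows at a time. Every request fetches one
// extra row; that row is held back and only supplies the next startkey.
func pages(rows []Row, pageSize int) [][]Row {
	if pageSize < 1 {
		pageSize = 1
	}
	var out [][]Row
	startKey := ""
	for {
		batch := query(rows, startKey, pageSize+1)
		if len(batch) <= pageSize {
			// No extra row came back, so this is the last page.
			if len(batch) > 0 {
				out = append(out, batch)
			}
			return out
		}
		out = append(out, batch[:pageSize])
		startKey = batch[pageSize].Key
	}
}

func main() {
	var rows []Row
	for _, k := range []string{"a", "b", "c", "d", "e", "f", "g"} {
		rows = append(rows, Row{ID: "doc-" + k, Key: k})
	}
	for n, page := range pages(rows, 3) {
		fmt.Println("page", n+1, "has", len(page), "rows, first key", page[0].Key)
	}
}
```

With seven rows and a page size of 3 this yields pages of 3, 3 and 1 rows. Unlike skip, the server can jump straight to the start key instead of walking past all the rows that were already read.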

zemirco commented 7 years ago

What do you want to do with your rows that come out of the channel?

And why do you want to have a channel? There is no need for async actions. You already have all the results available, so you don't need to pipe them through a channel. You're basically converting an array of results into a channel of results. Why not simply loop over the array of results?

And you're now using limit and skip on the client side. Let the server, i.e. CouchDB, do the hard work of limiting and skipping.

Edit: I think I understand your problem. Let me think about it 😄

ryanjyoder commented 7 years ago

Maybe it wasn't clear from the code, but one major advantage of sending rows over a chan is that the full data set is never in memory. If I read 10k rows, for example, I will never have more than 5 in memory at once. At the same time, if I only consume 5 rows and stop reading, the iterator will only make a single REST call.
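
A minimal sketch of that lazy behaviour, with two additions that are not part of the proposal above: a done channel so a consumer that stops early can release the goroutine, and an error channel for failures after the first page. Row, fetchFunc, iterate and fakeView are hypothetical names for illustration only; fakeView generates synthetic rows and counts requests in place of a real view.

```go
package main

import "fmt"

// Row is a simplified, hypothetical stand-in for a view row.
type Row struct{ ID string }

// fetchFunc is a hypothetical stand-in for one paged view request.
type fetchFunc func(skip, limit int) (rows []Row, total int, err error)

// iterate streams rows one page at a time. Closing done stops the goroutine.
// The error channel carries at most one error and is closed on exit.
func iterate(done <-chan struct{}, fetch fetchFunc, pageSize int) (<-chan Row, <-chan error) {
	out := make(chan Row)
	errc := make(chan error, 1)
	go func() {
		defer close(errc)
		defer close(out)
		for skip := 0; ; {
			rows, total, err := fetch(skip, pageSize)
			if err != nil {
				errc <- err
				return
			}
			for _, r := range rows {
				select {
				case out <- r:
					skip++
				case <-done:
					return // consumer stopped reading
				}
			}
			if len(rows) == 0 || skip >= total {
				return
			}
		}
	}()
	return out, errc
}

// fakeView serves total synthetic rows and counts requests in *requests.
func fakeView(total int, requests *int) fetchFunc {
	return func(skip, limit int) ([]Row, int, error) {
		*requests++
		var rows []Row
		for i := skip; i < skip+limit && i < total; i++ {
			rows = append(rows, Row{ID: fmt.Sprintf("doc-%02d", i)})
		}
		return rows, total, nil
	}
}

func main() {
	requests := 0
	done := make(chan struct{})
	rows, errc := iterate(done, fakeView(12, &requests), 5)
	for i := 0; i < 3; i++ {
		fmt.Println((<-rows).ID)
	}
	close(done) // stop early
	if err := <-errc; err != nil {
		fmt.Println("error:", err)
	}
	fmt.Println("requests:", requests)
}
```

Reading three rows from a page of five and then closing done leaves the request count at 1. In this sketch a consumer that stops exactly on a page boundary may still trigger one more request, because the goroutine fetches the next page before it checks done again.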